Acknowledgements: A lot of this material is based upon the fine introductory book: "An Engineering Approach to Computer Networking," by Srinivasan Keshav, Addison-Wesley. See the link to it on the suggested texts area on the course homepage. ================================================================== 1. What is routing ? - Routing is the process of finding a path from a source to every destination in the network. - Operationally, it means a protocol to set up forwarding tables at all intermediate nodes in the network. More carefully defined, it is the function of setting up: - (layer 3) forwarding tables at routers (intermediate nodes) such that, - with a high probability, a (loop free) path from any source to any destination can be found - by concatenation of links and forwarding (intra-router links), - and such a path satisfies a well-defined optimization criterion ================================================================== 2. How does it differ from forwarding ? - Given a packet at a node, finding which output port it needs to go to is called "forwarding." - It is a per-node function. You need a forwarding table for any non trivial forwarding. Forwarding can also be imagined as setting up a link from one port to another within a node. ================================================================== 3. Can you explain the "careful definition" of routing even more carefully ? Sure ! - We instinctively think of a "path" as having multiple hops. Graph theory tells us that a "path" cannot have a loop (nor can a tree for that matter) - Routing is trivial if it is a single hop -- forwarding is enough. For example - routing is trivial on a broadcast LAN or a fully connected mesh. Observe that you do not need even "intermediate nodes" in this topology. Routing is also trivial in a ring topology - esp a unidirectional one. - The first time you require control intelligence is when you require a forwarding table - a multi hop network. But "learning bridges" provide simple ways of constructing a forwarding table. But this is layer-2 forwarding. We know layer-2 ends at the boundary of the broadcast domain -- so if you want to cross that boundary -- you need layer 3 forwarding (forwarding tables and hence routing). - Note that you could get by without routing and a forwarding table by forwarding a packet to a random output port and praying that it will reach the destination. But this is obviously inefficient, and you might run into loops. Routing aims for loop-free paths with a high probability. - Routing should first minimize the probability of looping, and then should build paths such that some optimization criterion is satisfied -- eg a least # of hops path is built by the Dijkstra's algorithm. [More specifically, forwarding is a "local" operation whereas the forwarding table must be setup based upon knowledge of the global topology. So a routing protocol must communicate global topological information to each forwarding node to allow it to make a local routing decision (input port to output port)] Miscl: - By definition, routing is a "control path" function, not a "data path" function. - a "path from source to destination" may be setup dynamically on a per-connection basis (eg: virtual circuit routing) or in the background using a distributed algorithm. Random routing is example of per-packet routing -- not efficient. ================================================================== 4. What are the design constraints in routing ? - Minimizing routing table space: the routing table should grow more slowly than the number of destinations in the network. Since hashing and tree structures are used for route lookup, the lookup complexity is dependent upon the size of the table. This is why we need aggregation - so that worst case route table sizes become proportional to the smaller address space. - Minimizing control messages: The amount of control messages also depends upon the number of destinations (and the route table size). This is one of the more pressing scalability reasons why aggregation is needed. - Robustness: the probability of routing loops, routing oscillations and misrouting ("black holes") should be minimal. Authentication, checksumming, other consistency checks may be required to protect against corrupted routing tables, corrupted packets, fake peers etc. - Optimal paths: routing protocols may try to optimize several metrics (distance, cost, resources used for QoS, policy) etc. It is important that the routers which cooperate to achieve routing have consistent views of the topology in terms of these metrics and use consistent interpretations to setup forwarding tables. ================================================================== 5. Ok, so what are the types of routing out there ? a) Centralized vs distributed: - For small domains which are centrally administered, a server could collect topology and find routes and send the tables to each node. - Centralized routing does not scale and is a single point of failure. Most routing in reasonably large domains and internets is distributed. b) Source-based vs hop-by-hop: - Source-based: A source may specify entire route. Also called explicit routing. Loose-source routing => source specifies partial route through which packet must pass through. Supported in IPv4 and v6 as options. - source based routing suffers from unbounded header size; and requires the source to be aware of topological changes in the network. - Internet routing is hop by hop where the packet carries the destination address and the routers at every hop determine forwarding based upon the destination address and routing information setup by a routing protocol. - But source routing is not unreasonable if the route is setup once (eg in VCs), and short labels (instead of long addresses) are assigned and used at every hop (eg ATM). In fact, explicit routing is desired by large ISPs who would like to pin routes for delivering QoS, define large traffic aggregates called "trunks" and map them to specific paths -- this area is known as "traffic engineering" c) Stochastic vs deterministic: - Deterministic => forwards packets to destination (or packets of a given flow/trunk) along exactly one path. - Stochastic => randomly choose among a set of paths to forward packets. While this may provide some load balancing, the benifit is outweighed by the common case problem of out-of-order delivery. As a result modern networks use deterministic routing. d) Single vs multiple path: - In multiple-path (typically in telephony and soon to be introduced in core domains of the Internet), traffic trunks may go through a primary path or a set of alternative paths. This approach requires more routing table space and Internet routing so far has used single path routing. e) Dynamic vs Static (aka State-dependent vs State independent): - State dependent = "dynamic" routing. Do not ignore network state. Can find better routes than static routing in the presence of changes in the network state, but can have oscillatory problems. - State independent = static: preconfigured routes f) Distance vector vs link state: - Distance vector and link state are two types of algorithms used in packet-switched networks. The reason for their popularity is that they allow a router to find *global* routing information by exchanging routing information with only its *neighbors*. In distance vector you exchange distances to every destination; whereas in link state you exchange distances to every neighbor (but this information is broadcast to every node in the network). - Distance vector routing is like setting up signposts at intersections of roadways (equivalent of a router in computer networks). You set up a signpost with an initial value and then update it by looking at your neighbors signposts. A common updation algorithm is called Bellman-Ford algorithm. - Link state routing is like getting an entire map at every intersection (i.e. a router) and plotting optimal routes based on that map, and then sending the packet along the next hop of the optimal route calculated. A common shortest path algorithm is called Dijkstra's algorithm. - Distance vector may converge slowly after link failures (called the "count to infinity problem"), but boasts minimal per-router state and computation complexity. Distance vectors approach using path-vectors (where the entire path to the destination is maintained) converges much faster, but maintains more state and has increased computational complexity. - Link state does not have the looping & convergence problems, but has a lot of complexity associated with disseminating the "network map" to all routers in a robust manner. It allows maintainence of multiple maps based upon different metrics, but the consistent use of maps during forwarding has to be managed through some other means. [More about distance vector vs link state is in the class handouts] ================================================================== 5. How does telephony routing work ? - Telephone routing is SIMPLE. - This is because historically the electromechanical relays used for network control could not execute sophisticated programs. Instead, the switches and links were made to be extremely reliable. - Further, the national telephone network is structured as a fairly rigid three-level hierarchy, where the levels represent subscriber telephones or modems, central office switches and long-distance or toll switching offices. The network of toll switches is called the "core". - Specifically, subscribers are directly connected to central office switches (eg: Lucent's 5ESS switches) through the local loop. Many central offices exist in a Local Exchange Carrier (LEC) such as Bell Atlantic. Every central office switch logically connects to one (or rarely, two) nodes at the core. - The switches in the core are fully connected (a mesh topology) at great cost! (in comparison Internet cores are sparsely connected). This full core connectivity combined with reliable components means that routing is not needed if you just desire connectivity ! - The reason routing is still needed is because a) a call also requires resources to be reserved, b) users expect low blocking probability - These tasks are also alleviated by the fact that telephone traffic is very predictable - arrivals are poisson and call duration has a 3 min average. Moreover, call patterns during every interval of every day of the year is known and they repeat with striking consistency. - So routes are precomputed and downloaded into switches for different intervals in a day using this knowledge ! - The default route in the core is always a one-hop logical path. The only complexity is to allow alternate routes to obtain a low blocking probability. To do this, the telephone system computes an ordered set of two-hop paths which are tried in the predefined order if the one-hop path is full! This is known as "Dynamic nonhierarchical routing (DNHR)." The process in which a call is rejected on a primary path and is retried on an alternative path is called "crankback" - which represents the only "dynamism" in the routing protocol ! - DNHR suffers from a problem known as "metastability" where, under heavy load, all calls alternate between using a two-hop path or a one hop path. To counter this a scheme called "Trunk status map routing (TSMR)" was developed where a centralized server would make minor modifications based upon measured dynamics if necessary. The latest in the series of such algorithms is "Real-time network routing (RTNR)" where the routing changes are more distributed (controlled by the ingress toll switch). RTNR is very effective: AT&T's network carries upto 260 million calls on busy days and only about one or two are blocked at the core !! Perspective: ============ A hundred billion $$ question is to use newer switching systems which use sparser interconnection, and reduce stringent requirements for connectivity and component reliability, but provide the features of low blocking probability, very high reliability and global coverage! ATM provides some of these facilities with flexible QoS routing. But what is dragging back operators is the complex switching, billing and signaling systems and not the routing algorithms. New long distance companies like Qwest (and some transforming companies like MCI Worldcom/UUnet) have the opportunity to use advanced Internet routing techniques to bypass the requirements mentioned above. But the challenge there is to develop the same level of availability and manageability. By throwing bandwidth at the problem (2 Tb/s according to Qwest using new fiber and WDM), several of these problems are simplified. The only hitch now, however, is that revenue emphasis is moving from voice to data services, i.e. the need is to provide QoS for data (which is not at all predictable!!). Diff-serv and MPLS are now coming to the rescue ... Note that in other countries, telephone is a monopoly, and the cost of the telephone system includes both the local loop and the long distance, and 80% of the total system cost is in the local loop, and 90% of that cost is in the labor of installation !!