Interdomain Routing and BGP : Interdomain Routing and BGP The slides here were created by Timothy G. Griffin of AT&T and presented at a SIGCOMM 2001 Tutorial Session.
The slides have been edited by me (D. Szajda). Any errors introduced as a result of these minor changes are my responsibility.
An Introduction to Interdomain Routing and BGP : An Introduction to Interdomain Routing and BGP Timothy G. Griffin
griffin@research.att.com
http://www.research.att.com/~griffin/interdomain.html
SIGCOMM 2001 Tutorial Session
August 28, 2001
Caveat : Caveat These slides have been edited by me. Any errors introduced as a result of these minor changes are my responsibility.
Acknowledgements : Acknowledgements Thanks to Jay Borkenhagen, Randy Bush,
Anja Feldmann, Matt Grossglauser,
Madan Musuvathi, Jennifer Rexford,
Shubho Sen, and Jia Wang for many
helpful comments My opinions should not be taken to
represent AT&T policy Errors are my own
BGP Generalities : BGP Generalities BGP is a path-vector protocol
Generally, vectoring refers to creating routes without having global knowledge of the tolopology
Based on neighbors reported paths, you choose preferred path and report that to neighbors
BGP implements policy-based routing
talk of “minimum paths” is ill defined.
configure routers with route preferences
Policy Based Routing : Policy Based Routing Ways to configure routers that effect which routes get computed
Route preferences: e.g. don’t use any path going through AS X
Which destinations should not be reported to which neighbors?
How a path should be edited when passed to a particular neighbor
E.G. maybe add several AS numbers into path to discourage, but not prevent, others from using path
Policy Based Routing : Policy Based Routing Is there a better way?
R. Perlman supports link-state system plus source specified routes.
Link-state gives complete information
Sources can then determine routes that meet their specifications
Instead of influencing neighbors with advertisements, simply drop traffic you don’t want to forward
Common View of the Telco Network : Common View of the Telco Network Brick
Common View of the IP Network (Layer 3) : Common View of the IP Network (Layer 3)
What This Tutorial Is About : What This Tutorial Is About
Goal : Goal Understand how layer 3 connectivity
is maintained in the global Internet This tutorial will not say much about
the applications that exploit this connectivity.
It will be restricted to IPv4 unicast routing. Part I : The basics of interdomain routing and BGP
Part II : BGP in practice: Issues of Scale
Outline Part I : Outline Part I Forwarding vs. Routing
IP addressing
Autonomous Systems (basic units of interdomain routing)
The Border Gateway Protocol (BGP)
BGP fundamentals
BGP route attributes
Implementing policy with BGP
A wee bit of theory
Outline Part II : Outline Part II Scaling internal BGP
BGP table growth
Address aggregation vs. Multihoming
Growth in number of autonomous systems
Dynamics of BGP
Route flapping
BGP convergence
Rates of BGP updates
Best Effort Connectivity : Best Effort Connectivity This is the fundamental service provided
by Internet Service Providers (ISPs) All other IP services depend on connectivity:
DNS, email, VPNs, Web Hosting, … IP traffic 135.207.49.8 192.0.2.153
Routing vs. Forwarding : Routing vs. Forwarding R R R A B C D R1 R2 R3 R4 R5 E Net Nxt Hop R4
R3
R3
R4
Direct
R4 Net Nxt Hop A
B
C
D
E
default R2
R2
Direct
R5
R5
R2 Net Nxt Hop A
B
C
D
E
default R1
Direct
R3
R1
R3
R1 Default to
upstream
router A
B
C
D
E
default Forwarding: determine next hop
Routing: establish end-to-end paths Forwarding always works
Routing can be badly broken
How Are Forwarding Tables Populated to implement Routing? : How Are Forwarding Tables Populated to implement Routing? Statically Dynamically Routers exchange network reachability information using ROUTING PROTOCOLS. Routers use this to compute best routes Administrator
manually configures
forwarding table entries In practice : a mix of these.
Static routing mostly at the “edge” + More control
+ Not restricted to
destination-based
forwarding
- Doesn’t scale
- Slow to adapt to
network failures + Can rapidly adapt to changes
in network topology
+ Can be made to scale well
- Complex distributed algorithms
- Consume CPU, Bandwidth, Memory
- Debugging can be difficult
- Current protocols are destination-based
Routers Talking to Routers : Routers Talking to Routers Routing info Routing info Routing computation is distributed among routers within a routing domain
Computation of best next hop based on routing information is the most CPU/memory intensive task on a router
Routing messages are usually not routed, but exchanged via layer 2 between physically adjacent routers (internal BGP and multi-hop external BGP are exceptions)
Before We Go Any Further … : Before We Go Any Further … IP ROUTING PROTOCOLS DO NOT
DYNAMICALLY ROUTE AROUND
NETWORK CONGESTION IP traffic can be very bursty
Dynamic adjustments in routing typically operate more slowly than fluctuations in traffic load
Dynamically adapting routing to account for traffic load can lead to wild, unstable oscillations of routing system
Autonomous Routing Domains : Autonomous Routing Domains A collection of physical networks glued together
using IP, that have a unified administrative
routing policy. Campus networks
Corporate networks
ISP Internal networks
…
Autonomous Systems (ASes) : Autonomous Systems (ASes) An autonomous system is an autonomous routing domain
that has been assigned an Autonomous System Number (ASN).
AS Numbers (ASNs) : AS Numbers (ASNs) ASNs are 16 bit values. 64512 through 65535 are “private” Genuity: 1
MIT: 3
Harvard: 11
UC San Diego: 7377
AT&T: 7018, 6341, 5074, …
UUNET: 701, 702, 284, 12199, …
Sprint: 1239, 1240, 6211, 6242, …
… ASNs represent units of routing policy Currently over 11,000 in use.
Architecture of Dynamic Routing : Architecture of Dynamic Routing AS 1 AS 2 BGP EGP = Exterior Gateway Protocol IGP = Interior Gateway Protocol Metric based: OSPF, IS-IS, RIP,
EIGRP (cisco) Policy based: BGP The Routing Domain of BGP is the entire Internet OSPF EIGRP
Slide23 : Topology information is flooded within the routing domain
Best end-to-end paths are computed locally at each router.
Best end-to-end paths determine next-hops.
Based on minimizing some notion of distance
Works only if policy is shared and uniform
Examples: OSPF, IS-IS Each router knows little about network topology
Only best next-hops are chosen by each router for each destination network.
Best end-to-end paths result from composition of all next-hop choices
Does not require any notion of distance
Does not require uniform policies at all routers
Examples: RIP, BGP Link State Vectoring Technology of Distributed Routing
The Gang of Four : The Gang of Four
Many Routing Processes Can Run on a Single Router : Many Routing Processes Can Run on a Single Router Forwarding Table OSPF
Domain RIP
Domain BGP OS kernel Forwarding Table Manager
IPv4 Addresses are 32 Bit Values : IPv4 Addresses are 32 Bit Values IPv6 addresses have 128 bits
Classful Addresses : Classful Addresses 0nnnnnnn 10nnnnnn nnnnnnnn nnnnnnnn nnnnnnnn 110nnnnn hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh n = network address bit h = host identifier bit Class A Class C Class B Leads to a rigid, flat, inefficient use of address space …
RFC 1519: Classless Inter-Domain Routing (CIDR) : RFC 1519: Classless Inter-Domain Routing (CIDR) IP Address : 12.4.0.0 IP Mask: 255.254.0.0 Use two 32 bit numbers to represent a network.
Network number = IP address + Mask Usually written as 12.4.0.0/15
Which IP Addresses are Covered by a Prefix? : Which IP Addresses are Covered by a Prefix? 12.4.0.0/15 12.5.9.16 12.7.9.16 12.5.9.16 is covered by prefix 12.4.0.0/15 12.7.9.16 is not covered by prefix 12.4.0.0/15
CIDR = Hierarchy in Addressing : CIDR = Hierarchy in Addressing
Classless Forwarding : Classless Forwarding Destination =12.5.9.16
-------------------------------
payload even better OK better best!
IP Address Allocation and Assignment: Internet Registries : IP Address Allocation and Assignment: Internet Registries IANA
www.iana.org
RFC 2050 - Internet Registry IP Allocation Guidelines
RFC 1918 - Address Allocation for Private Internets
RFC 1518 - An Architecture for IP Address Allocation with CIDR
ARIN
www.arin.org APNIC
www.apnic.org RIPE
www.ripe.org Allocate to National and local registries and ISPs
Addresses assigned to customers by ISPs
Nontransit vs. Transit ASes : Nontransit vs. Transit ASes ISP 1 ISP 2 Nontransit AS
might be a corporate
or campus network.
Could be a “content
provider” NET A Traffic NEVER
flows from ISP 1
through NET A to ISP 2
(At least not intentionally!) Internet Service
providers (often)
have transit
networks
Selective Transit : Selective Transit NET B NET C NET A provides transit
between NET B and NET C
and between NET D
and NET C NET A NET D NET A DOES NOT
provide transit
Between NET D
and NET B Most transit networks transit in a selective manner…
Customers and Providers : Customers and Providers Customer pays provider for access to the Internet provider customer
Customers Don’t Always Need BGP : Customers Don’t Always Need BGP provider customer Nail up default routes 0.0.0.0/0
pointing to provider. Nail up routes 192.0.2.0/24
pointing to customer 192.0.2.0/24 Static routing is the most common way of connecting an
autonomous routing domain to the Internet.
This helps explain why BGP is a mystery to many …
Customer-Provider Hierarchy : Customer-Provider Hierarchy IP traffic provider customer
The Peering Relationship : The Peering Relationship Peers provide transit between
their respective customers
Peers do not provide transit
between peers
Peers (often) do not exchange $$$ traffic
allowed traffic NOT
allowed
Peering Provides Shortcuts : Peering Provides Shortcuts Peering also allows connectivity between
the customers of “Tier 1” providers.
Peering Wars : Peering Wars Reduces upstream transit costs
Can increase end-to-end performance
May be the only way to connect your customers to some part of the Internet (“Tier 1”) You would rather have customers
Peers are usually your competition
Peering relationships may require periodic renegotiation
Peering struggles are by far the most
contentious issues in the ISP world!
Peering agreements are often confidential. Peer Don’t Peer
BGP-4 : BGP-4 BGP = Border Gateway Protocol
Is a Policy-Based routing protocol
Is the de facto EGP of today’s global Internet
Relatively simple protocol, but configuration is complex and the entire world can see, and be impacted by, your mistakes. 1989 : BGP-1 [RFC 1105]
Replacement for EGP (1984, RFC 904)
1990 : BGP-2 [RFC 1163]
1991 : BGP-3 [RFC 1267]
1995 : BGP-4 [RFC 1771]
Support for Classless Interdomain Routing (CIDR)
BGP Operations (Simplified) : BGP Operations (Simplified) Establish session on
TCP port 179 Exchange all
active routes Exchange incremental
updates AS1 AS2 While connection
is ALIVE exchange
route UPDATE messages BGP session
Four Types of BGP Messages : Four Types of BGP Messages Open : Establish a peering session.
Keep Alive : Handshake at regular intervals.
Notification : Shuts down a peering session.
Update : Announcing new routes or withdrawing previously announced routes. announcement
=
prefix + attributes values
BGP Attributes : BGP Attributes
Value Code Reference
----- --------------------------------- ---------
1 ORIGIN [RFC1771]
2 AS_PATH [RFC1771]
3 NEXT_HOP [RFC1771]
4 MULTI_EXIT_DISC [RFC1771]
5 LOCAL_PREF [RFC1771]
6 ATOMIC_AGGREGATE [RFC1771]
7 AGGREGATOR [RFC1771]
8 COMMUNITY [RFC1997]
9 ORIGINATOR_ID [RFC2796]
10 CLUSTER_LIST [RFC2796]
11 DPA [Chen]
12 ADVERTISER [RFC1863]
13 RCID_PATH / CLUSTER_ID [RFC1863]
14 MP_REACH_NLRI [RFC2283]
15 MP_UNREACH_NLRI [RFC2283]
16 EXTENDED COMMUNITIES [Rosen]
...
255 reserved for development
From IANA: http://www.iana.org/assignments/bgp-parameters This
tutorial
will cover
these
attributes Not all attributes
need to be present in
every announcement
Attributes are Used to Select Best Routes : Attributes are Used to Select Best Routes 192.0.2.0/24
pick me! 192.0.2.0/24
pick me! 192.0.2.0/24
pick me! 192.0.2.0/24
pick me! Given multiple
routes to the same
prefix, a BGP speaker
must pick at most
one best route
(Note: it could reject
them all!)
Two Types of BGP Neighbor Relationships : Two Types of BGP Neighbor Relationships External Neighbor (eBGP) in a different Autonomous System
Internal Neighbor (iBGP) in the same Autonomous System AS1 AS2 eBGP iBGP iBGP is routed (using IGP!)
iBGP Peers Must be Fully Meshed : iBGP Peers Must be Fully Meshed iBGP neighbors do not announce
routes received via iBGP to other iBGP
neighbors. iBGP is needed to avoid routing loops within an AS
Injecting external routes into IGP does not scale and causes BGP policy information to be lost
BGP does not provide “shortest path” routing
Is iBGP an IGP? NO!
BGP Next Hop Attribute : BGP Next Hop Attribute Every time a route announcement crosses an AS boundary, the Next Hop attribute is changed to the IP address of the border router that announced the route. AS 6431 AT&T Research 135.207.0.0/16
Next Hop = 12.125.133.90 AS 7018 AT&T AS 12654 RIPE NCC
RIS project 12.125.133.90 135.207.0.0/16
Next Hop = 12.127.0.121 12.127.0.121
Join EGP with IGP For Connectivity : Forwarding Table Forwarding Table Join EGP with IGP For Connectivity AS 1 AS 2 192.0.2.1 135.207.0.0/16 10.10.10.10 10.10.10.10 192.0.2.0/30 destination next hop 135.207.0.0/16
Next Hop = 192.0.2.1 192.0.2.0/30 135.207.0.0/16 destination next hop 10.10.10.10 + 192.0.2.0/30 10.10.10.10 Router IP address
Network address
(for AS 2) Arrows show direction
of route advertisement
Next Hop Often Rewritten to Loopback : Forwarding Table Forwarding Table Next Hop Often Rewritten to Loopback AS 1 AS 2 192.0.2.1 135.207.0.0/16 10.10.10.10 EGP 127.22.33.44 135.207.0.0/16 destination next hop 10.10.10.10 127.22.33.44 destination next hop 135.207.0.0/16 destination next hop 10.10.10.10 + 127.22.33.44 10.10.10.10 127.22.33.44 135.207.0.0/16
Next Hop = 192.0.2.1 135.207.0.0/16
Next Hop = 127.22.33.44 Loopback refers to
directing traffic back
to advertising router
Implementing Customer/Provider and Peer/Peer relationships : Implementing Customer/Provider and Peer/Peer relationships Enforce transit relationships
Outbound route filtering
Enforce order of route preference
provider < peer < customer
Two parts:
Import Routes : Import Routes From
peer From
peer From
provider From
provider From
customer From
customer
Export Routes : Export Routes To
peer To
peer To
customer To
customer To
provider From
provider provider route customer route peer route ISP route
How Can Routes be Colored?BGP Communities! : How Can Routes be Colored? BGP Communities! Very powerful
BECAUSE it
has no (predefined) meaning Community Attribute = a list of community values.
(So one route can belong to multiple communities) RFC 1997 (August 1996) Used for signaling
within and between
ASes
Communities Example : Communities Example 1:100
Customer routes
1:200
Peer routes
1:300
Provider Routes To Customers
1:100, 1:200, 1:300
To Peers
1:100
To Providers
1:100 AS 1 Import Export
Blackholes : 192.0.2.0/24 192.0.2.0/24 Accidental or malicious
announcement of your prefix
can blackhole your destinations
in large part of the Internet Need Filter Here! legitimate not legitimate Blackholes
Mars Attacks! : Mars Attacks! 0.0.0.0/0: default
10.0.0.0/8: private
172.16.0.0/12: private
192.168.0.0/16: private
127.0.0.0/8: loopbacks
128.0.0.0/16: IANA reserved
192.0.2.0/24: test networks
224.0.0.0/3: classes D and E
….. Martian list often includes Martian list gives
addresses that should
be considered suspect
in an advertisement
(See RFC 1149)
Import Routes (Revisited) : Import Routes (Revisited) From
peer From
peer From
provider From
provider From
customer From
customer provider route customer route peer route ISP route xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx Customer
address
filters cccccc cccccc cccccc potential
blackhole Martian
So Many Choices : So Many Choices Which route should
Frank pick to 13.13.0.0./16? AS 1 AS 2 AS 4 AS 3 13.13.0.0/16 Frank’s
Internet Barn
BGP Route Processing : BGP Route Processing Best Route
Selection Apply Import
Policies Best Route
Table Apply Export
Policies Install forwarding
Entries for best
Routes. Receive
BGP
Updates Best
Routes Transmit
BGP
Updates Apply Policy =
filter routes &
tweak attributes Based on
Attribute
Values IP Forwarding Table Apply Policy =
filter routes &
tweak attributes Open ended programming.
Constrained only by vendor configuration language
Tweak Tweak Tweak : Tweak Tweak Tweak For inbound traffic
Filter outbound routes
Tweak attributes on outbound routes in the hope of influencing your neighbor’s best route selection
For outbound traffic
Filter inbound routes
Tweak attributes on inbound routes to influence best route selection outbound
routes inbound
routes inbound
traffic outbound
traffic In general, an AS has more
control over outbound traffic
Route Selection Summary : Route Selection Summary Highest Local Preference Shortest ASPATH Lowest MED (Multi-Exit Discriminator attribute) i-BGP < e-BGP Lowest IGP cost
to BGP egress Lowest router ID traffic engineering Enforce relationships Throw up hands and
break ties
Back to Frank … : Back to Frank … AS 1 AS 2 AS 4 AS 3 13.13.0.0/16 local pref = 80 local pref = 100 local pref = 90 Higher Local
preference values
are more preferred Local preference only used in iBGP
Implementing Backup Links with Local Preference (Outbound Traffic) : Implementing Backup Links with Local Preference (Outbound Traffic) Forces outbound traffic to take primary link, unless link is down. AS 1 primary link backup link Set Local Pref = 100
for all routes received
from AS 1 AS 65000 Set Local Pref = 50
for all routes received
from AS 1 We’ll talk about inbound traffic soon …
Multihomed Backups (Outbound Traffic) : Multihomed Backups (Outbound Traffic) Forces outbound traffic to take primary link, unless link is down. AS 1 primary link backup link Set Local Pref = 100
for all routes received
from AS 1 AS 2 Set Local Pref = 50
for all routes received
from AS 3 AS 3 provider provider
ASPATH Attribute : ASPATH Attribute AS7018 135.207.0.0/16
AS Path = 6341 AS 1755 Ebone AT&T AS 3549 Global Crossing 135.207.0.0/16
AS Path = 7018 6341 135.207.0.0/16
AS Path = 3549 7018 6341 AS 6341 135.207.0.0/16 AT&T Research Prefix Originated AS 12654 RIPE NCC
RIS project AS 1129 Global Access 135.207.0.0/16
AS Path = 7018 6341 135.207.0.0/16
AS Path = 1239 7018 6341 135.207.0.0/16
AS Path = 1755 1239 7018 6341 135.207.0.0/16
AS Path = 1129 1755 1239 7018 6341
Interdomain Loop Prevention : Interdomain Loop Prevention BGP at AS YYY will never accept a route with ASPATH containing YYY. AS 7018 12.22.0.0/16
ASPATH = 1 333 7018 877 Don’t Accept! AS 1
Traffic Often Follows ASPATH : Traffic Often Follows ASPATH AS 4 AS 3 AS 2 AS 1 135.207.0.0/16 135.207.0.0/16
ASPATH = 3 2 1 IP Packet
Dest =
135.207.44.66
… But It Might Not : … But It Might Not AS 4 AS 3 AS 2 AS 1 135.207.0.0/16 135.207.0.0/16
ASPATH = 3 2 1 IP Packet
Dest =
135.207.44.66 AS 5 135.207.44.0/25
ASPATH = 5 135.207.44.0/25 AS 2 filters all
subnets with masks
longer than /24 135.207.0.0/16
ASPATH = 1 From AS 4, it
may look like this
packet will take
path 3 2 1, but it
actually takes
path 3 2 5
Shorter Doesn’t Always Mean Shorter : In fairness:
could you do
this “right” and
still scale?
Exporting internal
state would
dramatically
increase global
instability and
amount of routing
state Shorter Doesn’t Always Mean Shorter AS 4 AS 3 AS 2 AS 1 Mr. BGP says that
path 4 1 is better
than path 3 2 1 Duh!
Shedding Inbound Traffic with ASPATH Padding Hack : Shedding Inbound Traffic with ASPATH Padding Hack Padding will (usually)
force inbound
traffic from AS 1
to take primary link AS 1 192.0.2.0/24
ASPATH = 2 2 2 customer AS 2 provider 192.0.2.0/24 backup primary 192.0.2.0/24
ASPATH = 2
Padding May Not Shut Off All Traffic : Padding May Not Shut Off All Traffic AS 1 192.0.2.0/24
ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 customer AS 2 provider 192.0.2.0/24 192.0.2.0/24
ASPATH = 2 AS 3 provider AS 3 will send
traffic on “backup”
link because it prefers
customer routes and local
preference is considered
before ASPATH length!
Padding in this way is often
used as a form of load
balancing backup primary
COMMUNITY Attribute to the Rescue! : COMMUNITY Attribute to the Rescue! AS 1 customer AS 2 provider 192.0.2.0/24 192.0.2.0/24
ASPATH = 2 AS 3 provider backup primary 192.0.2.0/24
ASPATH = 2
COMMUNITY = 3:70 Customer import policy at AS 3:
If 3:90 in COMMUNITY then
set local preference to 90
If 3:80 in COMMUNITY then
set local preference to 80
If 3:70 in COMMUNITY then
set local preference to 70 AS 3: normal
customer local
pref is 100,
peer local pref is 90
Hot Potato Routing: Go for the Closest Egress Point : Hot Potato Routing: Go for the Closest Egress Point 192.44.78.0/24 15 56 IGP distances egress 1 egress 2 This Router has two BGP routes to 192.44.78.0/24. Hot potato: get traffic off of your network as
Soon as possible. Go for egress 1!
Getting Burned by the Hot Potato : Getting Burned by the Hot Potato 15 56 17 2865 High bandwidth
Provider backbone Low bandwidth
customer backbone Many customers want
their provider to
carry the bits! tiny http request huge http reply SFF NYC San Diego
Cold Potato Routing with MEDs(Multi-Exit Discriminator Attribute) : Cold Potato Routing with MEDs (Multi-Exit Discriminator Attribute) 15 56 17 2865 192.44.78.0/24 192.44.78.0/24
MED = 15 192.44.78.0/24
MED = 56 This means that MEDs must be considered BEFORE
IGP distance! Prefer lower
MED values Note1 : some providers will not listen to MEDs Note2 : MEDs need not be tied to IGP distance
Route Selection Summary : Route Selection Summary Highest Local Preference Shortest ASPATH Lowest MED i-BGP < e-BGP Lowest IGP cost
to BGP egress Lowest router ID traffic engineering Enforce relationships Throw up hands and
break ties This is somewhat simplified. Hey, what happened to ORIGIN??
Policies Can Interact Strangely(“Route Pinning” Example) : Policies Can Interact Strangely (“Route Pinning” Example) backup Disaster strikes primary link
and the backup takes over Primary link is restored but some
traffic remains pinned to backup 1 2 3 4 Install backup link using community customer
News At 11:00 : News At 11:00 BGP is not guaranteed to converge on a stable routing. Policy interactions could lead to “livelock” protocol oscillations. See “Persistent Route Oscillations in Inter-domain Routing” by K. Varadhan, R. Govindan, and D. Estrin. ISI report, 1996
Corollary: BGP is not guaranteed to recover from network failures.
What Problem is BGP solving? : What Problem is BGP solving? aid in the design of policy analysis algorithms and heuristics
aid in the analysis and design of BGP and extensions
help explain some BGP routing anomalies
provide a fun way of thinking about the protocol X could A Wee Bit of Theory
Separate dynamic and static semantics : Separate dynamic and static semantics static
semantics dynamic
semantics See [Griffin, Shepherd, Wilfong]
An instance of the Stable Paths Problem (SPP) : 1 An instance of the Stable Paths Problem (SPP) 2 A graph of nodes and edges,
Node 0, called the origin,
For each non-zero node, a set of permitted paths to the origin. This set always contains the “null path”.
A ranking of permitted paths at each node. Null path is always least preferred. (Not shown in diagram) When modeling BGP : nodes represent
BGP speaking routers, and 0 represents
a node originating some address block most preferred
…
least preferred (not null) Yes, the translation
gets messy!
A Solution to a Stable Paths Problem : 1 A Solution to a Stable Paths Problem 2 2 1 0
2 0 1 3 0
1 0 3 0 4 2 0
4 3 0 node u’s assigned path is either the null path or is a path uwP, where wP is assigned to node w and {u,w} is an edge in the graph,
each node is assigned the highest ranked path among those consistent with the paths assigned to its neighbors.
There may be many paths consistent with those assigned to neighbors. This condition limits which may be chosen A Solution need not represent
a shortest path tree, or
a spanning tree. A solution is an assignment of
permitted paths to each node
such that red path is
the assigned
path
An SPP may have multiple solutions : An SPP may have multiple solutions First solution 1 2 0
1 0 2 1 0
2 0 1 2 0
1 0 2 1 0
2 0 1 2 0
1 0 2 1 0
2 0 Second solution DISAGREE
BAD GADGET : No Solution : BAD GADGET : No Solution Why bad? Well…
if 1 uses (1,3,0) then 2 must use path (2,0) (it can’t use (2,1,0)) and 3 must use (3,0), in which case 3 ends up assigned a consistent path that is NOT the highest ranked among consistent paths.
if 1 uses (1,0), then 2 can use either (2,1,0) or (2,0). If 2 uses (2,1,0) then 3 must use (3,0), which means that the route (1,0) of 1 is consistent but not highest ranked. If 2 instead uses (2,0), then 3 will use (3,2,0), so the route (2,1,0) of 2 will be consistent with assignments to neighbors, but is not assigned to 2, creating a violation of the solution properties.
SURPRISE : Beware of Backup Policies : SURPRISE : Beware of Backup Policies 2 0 3 1 2 1 0
2 0 1 3 0
1 0 3 4 2 0
3 0 4 4 0
4 2 0
4 3 0 Becomes a BAD GADGET if link
(4, 0) goes down. BGP is not robust :
it is not guaranteed
to recover from
network failures.
PRECARIOUS : PRECARIOUS Has a solution, but can get “trapped”
Part II : Part II Issues of scale for BGP in the real world
Big and Getting Bigger : Big and Getting Bigger Scaling the iBGP mesh
Confederations
Route Reflectors
BGP Table Growth
Address aggregation (CIDR)
Address allocation
AS number allocation and use
Dynamics of BGP
Inherent vs. accidental oscillation
Rate limiting and route flap dampening
Lots and lots of noise
Slow convergence time Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
iBGP Mesh Does Not Scale : iBGP Mesh Does Not Scale eBGP update N border routers means N(N-1)/2 peering sessions
Each router must have N-1 iBGP sessions configured
The addition of a single iBGP speaker requires configuration changes to all other iBGP speakers
Size of iBGP routing table can be order N larger than number of best routes (remember alternate routes!)
Each router has to listen to update noise from each neighbor
Currently four solutions:
(0) Buy bigger routers!
Break AS into smaller ASes
BGP Route reflectors
BGP confederations
Route Reflectors : Route reflectors can pass on iBGP updates to clients
Each RR passes along ONLY best routes
ORIGINATOR_ID and CLUSTER_LIST attributes are needed to avoid loops RR RR RR Route Reflectors
BGP Confederations : BGP Confederations AS 65501 AS 65502 AS 65503 AS 65504 AS 65500 AS 1 From the outside, this looks like AS 1 Confederation eBGP (between member ASes) preserves
LOCAL_PREF, MED, and BGP NEXTHOP.
BGP Table Growth : BGP Table Growth Thanks to Geoff Huston. http://www.telstra.net/ops/bgptable.html on August 8, 2001
Large BGP Tables Considered Harmful : Large BGP Tables Considered Harmful Routing tables must store best routes and alternate routes
Burden can be large for routers with many alternate routes (route reflectors for example)
Routers have been known to die
Increases CPU load, especially during session reset Moore’s Law may save us in theory. But
in practice it means spending money to upgrade
equipment …
Deaggregation Due to Multihoming May be a Leading Cause : Deaggregation Due to Multihoming May be a Leading Cause AS 1 customer AS 2 provider 12.0.0.0/8 AS 3 provider 12.2.0.0/16 12.2.0.0/16 12.2.0.0/16 If AS 1 does
not announce the
more specific prefix,
then most traffic
to AS 2 will go
through AS 3
because it is a
longer match AS 2 is
“punching a hole” in
The CIDR block of AS 1
How Many ASNs are there? : How Many ASNs are there? Thanks to Geoff Huston. http://www.telstra.net/ops on June 23, 2001
When will we run out of ASNs? : 64,511 2005? 2007? When will we run out of ASNs?
What is to be done? : What is to be done? Make ASNs larger than 16 bits
How about 32 bits?
See Internet Draft: “BGP support for four-octet AS number space” (draft-ietf-idr-as4bytes-03.txt)
Requires protocol change and wide deployment
Change the way ASNs are used
Allow multihomed, non-transit networks to use private ASNs
Use ASE (AS number Substitution on Egress )
See Internet Draft: “Autonomous System Number Substitution on Egress” (draft-jhaas-ase-00.txt)
Works at edge, requires protocol change (for loop prevention)
Makes some kinds of debugging harder!
Multihomed and “Private”! (draft-jhaas-ase-00.txt) : Multihomed and “Private”! (draft-jhaas-ase-00.txt) In fairness:
could you “do this
right” and still scale? AS 2 AS 1 AS 65535 63.63.63.0/24 AS 3 Replace
private ASN Choice of “private” ASN requires a bit of additional coordination between providers A non-transit network ASE-ORIGINATOR is a new attribute needed for “sender side loop detection” at AS 1 and 2
BGP Routing Tables : BGP Routing Tables
Use “whois” queries to associate an ASN with “owner” (for example, http://www.arin.net/whois/arinwhois.html)
7018 = AT&T Worldnet, 701 =Uunet, 3561 = Cable & Wireless, …
Hey, we can use these paths to draw cool graphs!
show ip bgp
BGP table version is 111849680, local router ID is 203.62.248.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
. . .
*>i192.35.25.0 134.159.0.1 50 0 16779 1 701 703 i
*>i192.35.29.0 166.49.251.25 50 0 5727 7018 14541 i
*>i192.35.35.0 134.159.0.1 50 0 16779 1 701 1744 i
*>i192.35.37.0 134.159.0.1 50 0 16779 1 3561 i
*>i192.35.39.0 134.159.0.3 50 0 16779 1 701 80 i
*>i192.35.44.0 166.49.251.25 50 0 5727 7018 1785 i
*>i192.35.48.0 203.62.248.34 55 0 16779 209 7843 225 225 225 225 225 i
*>i192.35.49.0 203.62.248.34 55 0 16779 209 7843 225 225 225 225 225 i
*>i192.35.50.0 203.62.248.34 55 0 16779 3549 714 714 714 i
*>i192.35.51.0/25 203.62.248.34 55 0 16779 3549 14744 14744 14744 14744 14744 14744 14744 14744 i
. . . Thanks to Geoff Huston. http://www.telstra.net/ops on July 6, 2001
AS Graphs Can Be Fun : AS Graphs Can Be Fun The subgraph showing all ASes that have more than 100 neighbors in full
graph of 11,158 nodes. July 6, 2001. Point of view: AT&T route-server
AS Graphs Depend on Point of View : AS Graphs Depend on Point of View This explains why there is no UUNET (701) Sprint (1239) link on previous slide! peer peer customer provider 5 4 6 1 3 5 4 2 6 1 3 2
AS Graphs Do Not Show Topology! : AS Graphs Do Not Show Topology! BGP was designed to
throw away information!
BGP Dynamics : BGP Dynamics How many updates are flying around the Internet?
How long Does it take Routes to Change? The goals of
(1) fast convergence
(2) minimal updates
(3) path redundancy
are at odds
Daily Update Count : Daily Update Count
What is the Sound of One Route Flapping? : What is the Sound of One Route Flapping?
A Few Bad Apples … : A Few Bad Apples … Thanks to Madanlal Musuvathi for this plot. Data source: RIPE NCC Typically, 80% of
the updates are
for less than 5%
Of the prefixes. Most prefixes are
stable most of the time.
On this day, about 83% of the prefixes were not updated. Percent of BGP table prefixes
Two BGP Mechanisms for Squashing Updates : Two BGP Mechanisms for Squashing Updates Rate limiting on sending updates
Send batch of updates every MinRouteAdvertisementInterval seconds (+/- random fuzz)
Default value is 30 seconds
A router can change its mind about best routes many times within this interval without telling neighbors
Route Flap Dampening
Punish routes for misbehaving Effective in
dampening
oscillations
inherent in the
vectoring
approach Must be turned on
with configuration
30 Second Bursts : 30 Second Bursts
How Long Does BGP Take to Adapt to Changes? : How Long Does BGP Take to Adapt to Changes? Thanks to Abha Ahuja and Craig Labovitz for this plot.
Two Main Factors in Delayed Convergence : Two Main Factors in Delayed Convergence Rate limiting timer slows everything down
BGP can explore many alternate paths before giving up or arriving at a new path
No global knowledge in vectoring protocols
Why is Rate Limiting Needed? : Why is Rate Limiting Needed? Updates
to convergence MinRouteAdvertisementInterval 0 Time
to convergence MinRouteAdvertisementInterval 0 SSFNet (www.ssfnet.org) simulations, T. Griffin and B.J. Premore.
To appear in ICNP 2001. Rate limiting dampens some of the
oscillation inherent in a vectoring protocol. Current interval (30 seconds) was picked “out of the blue sky” (yet “impact on BGP convergence time is tremendous”)
Route Flap Dampening (RFC 2439) : Route Flap Dampening (RFC 2439) Routes are given a penalty for changing.
If penalty exceeds suppress limit, the
route is dampened. When the route is not changing,
its penalty decays exponentially. If the penalty goes
below reuse limit, then it is announced again. Can dramatically reduce the number of BGP updates
Requires additional router resources
Applied on eBGP inbound only
Route Flap Dampening Example : Route Flap Dampening Example penalty for each flap = 1000
Q: Why All the Updates? : Q: Why All the Updates? Networks come, networks go
There’s always a router rebooting somewhere
Hardware failure, flaky interface cards, backhoes digging, floods in Houston, … This is “normal” --- exactly what
dynamic routing is designed for…
Q: Why All the Updates? : Q: Why All the Updates? Misconfiguration
Route flap dampening not widely used
BGP exploring many alternate paths
Software bugs in implementation of routing protocols
BGP session resets due to congestion or lack of interoperability: BGP sessions are brittle. One malformed update is enough to reset session and flap 100K routes. (Consequence of incremental approach)
IGP instability exported by use of MEDs or IGP tie breaker
Sub-optimal vendor implementation choices
Secret sauce routing algorithms attempting fancy-dancy tricks
Weird policy interactions (MED oscillation, BAD GADGETS??)
Gnomes, sprites, and fairies
….
A: NO ONE REALLY KNOWS …
IGP Tie Breaking Can Export Internal Instability to the Whole Wide World : IGP Tie Breaking Can Export Internal Instability to the Whole Wide World 15 56 192.44.78.0/24 AS 4 AS 3 AS 2 AS 1 10 FLAP FLAP FLAP FLAP 192.44.78.0/24
ASPATH = 4 2 1 192.44.78.0/24
ASPATH = 4 3 1
MEDs Can Export Internal Instability : MEDs Can Export Internal Instability 15 17 2865 192.44.78.0/24 192.44.78.0/24
MED = 15 192.44.78.0/24
MED = 56 OR 10 56 10 FLAP FLAP FLAP FLAP FLAP
FLAP
Implementation Does Matter! : Implementation Does Matter! Thanks to Abha Ahuja and Craig Labovitz for this plot. stateless withdraws
widely deployed stateful withdraws
widely deployed
How Long Will Interdomain Routing Continue to Scale? : How Long Will Interdomain Routing Continue to Scale? ... the existing interdomain routing
infrastructure is rapidly nearing the
end of its useful lifetime. It
appears unlikely that mere tweaks
of BGP will stave off fundamental
scaling issues, brought on by growth,
multihoming and other causes. A quote from some recent email: Is this true or false? How can we tell? Research required…
Summary : Summary
BGP is a fairly simple protocol …
… but it is not easy to configure
BGP is running on more than 100K routers (my estimate), making it one of world’s largest and most visible distributed systems
Global dynamics and scaling principles are still not well understood
Addressing and ASN RFCs : Addressing and ASN RFCs
RFC 1380 IESG Deliberations on Routing and Addressing (1992)
RFC 1517Applicability Statement for the Implementation of Classless Inter- Domain Routing (CIDR) (1993)
RFC 1518 An Architecture for IP Address Allocation with CIDR (1993)
RFC 1519 Classless Inter-Domain Routing (CIDR) (1993)
RFC 1467 Status of CIDR Deployment in the Intrenet (1983)
RFC 1520 Exchanging Routing Information Across Provider Boundaries in the CIDR Environment (1993)
RFC 1817 CIDR and Classful routing (1995)
RFC 1918 Address Allocation for Private Internets (1996)
RFC 2008 Implications of Various Address Allocation Policies for Internet Routing (1996)
RFC 2050 Internet Registry IP Allocation Guidelines (1996)
RFC 2260 Scalable Support for Multi-homed Multi-provider Connectivity (1998)
RFC 2519 A Framework for Inter-Domain Route Aggregation (1999)
RFC 1930 Guidelines for creation, selection, and registration of an Autonomous System (AS)
RFC 2270 Using a Dedicated AS for Sites Homed to a Single Provider
Selected BGP RFCs : Selected BGP RFCs IDR : http://www.ietf.org/html.charters/idr-charter.html
RFC 1771 A Border Gateway Protocol 4 (BGP-4)
Latest draft rewrite: draft-ietf-idr-bgp4-12.txt
RFC 1772 Application of the Border Gateway Protocol in the Internet
RFC 1773 Experience with the BGP-4 protocol
RFC 1774 BGP-4 Protocol Analysis
RFC 2796 BGP Route Reflection An alternative to full mesh IBGP
RFC 3065 Autonomous System Confederations for BGP
RFC 1997 BGP Communities Attribute
RFC 1998 An Application of the BGP Community Attribute in Multi-home Routing
RFC 2439 Route Flap Dampening
Internet Engineering Task Force (IETF) http://www.ietf.org
Titles of Some Recent Internet Drafts : Titles of Some Recent Internet Drafts Dynamic Capability for BGP-4
Application of Multiprotocol BGP-4 to IPv4 Multicast Routing
Graceful Restart mechanism for BGP
Cooperative Route Filtering Capability for BGP-4
Address Prefix Based Outbound Route Filter for BGP-4
Aspath Based Outbound Route Filter for BGP-4
Architectural Requirements for Inter-Domain Routing in the Internet
BGP support for four-octet AS number space
Autonomous System Number Substitution on Egress
BGP Extended Communities Attribute
Controlling the redistribution of BGP routes
BGP Persistent Route Oscillation Condition
Benchmarking Methodology for Basic BGP Convergence
Terminology for Benchmarking External Routing Convergence Measurements
BGP is a moving target …
Selected Bibliography on Routing : Selected Bibliography on Routing Internet Routing Architectures. Bassam Halabi. Second edition Cisco Press, 2000
BGP4: Inter-domain Routing in the Internet. John W. Stewart, III. Addison-Wesley, 1999
Routing in the Internet. Christian Huitema. 2000
ISP Survival Guide: Strategies for Running a Competitive ISP. Geoff Huston. Wiley, 1999.
Interconnection, Peering and Settlements. Geoff Huston. The Internet Protocol Journal. March and June 1999.
BGP Stability and Convergence : BGP Stability and Convergence The Impact of Internet Policy and Topology on Delayed Routing Convergence. Craig Labovitz, Abha Ahuja, Roger Wattenhofer, Srinivasan Venkatachary. INFOCOM 2001
An Experimental Study of BGP Convergence. Craig Labovitz, Abha Ahuja, Abhijit Abose, Farnam Jahanian. SIGCOMM 2000
Origins of Internet Routing Instability. C. Labovitz, R. Malan, F. Jahanian. INFOCOM 1999
Internet Routing Instability. Craig Labovitz, G. Robert Malan and Farnam Jahanian. SIGCOMM 1997
Analysis of Interdomain Routing : Analysis of Interdomain Routing Cooperative Association for Internet Data Analysis (CAIDA)
http://www.caida.org/
Tools and analyses promoting the engineering and maintenance of a robust, scalable global Internet infrastructure
Internet Performance Measurement and Analysis (IPMA)
http://www.merit.edu/ipma/
Studies the performance of networks and networking protocols in local and wide-area networks
National Laboratory for Applied Network Research (NLANR)
http://www.nlanr.net/
Analysis, tools, visualization.
IRTF Routing Research Group (IRTF-RR)
http://puck.nether.net/irtf-rr/
Internet Route Registries : Internet Route Registries Internet Route Registry
http://www.irr.net/
Routing Policy Specification Language (RPSL)
RFC 2622 Routing Policy Specification Language (RPSL)
RFC 2650 Using RPSL in Practice
Internet Route Registry Daemon (IRRd)
http://www.irrd.net/
RAToolSet
http://www.isi.edu/ra/RAToolSet/
Some BGP Theory : Some BGP Theory Persistent Route Oscillations in Inter-Domain Routing. Kannan Varadhan, Ramesh Govindan, and Deborah Estrin. Computer Networks, Jan. 2000. (Also USC Tech Report, Feb. 1996)
Shows that BGP is not guaranteed to converge
An Architecture for Stable, Analyzable Internet Routing. Ramesh Govindan, Cengiz Alaettinoglu, George Eddy, David Kessens, Satish Kumar, and WeeSan Lee. IEEE Network Magazine, Jan-Feb 1999.
Use RPSL to specify policies. Store them in registries. Use registry for conguration generation and analysis.
An Analysis of BGP Convergence Properties. Timothy G. Griffin, Gordon Wilfong. SIGCOMM 1999
Model BGP, shows static analysis of divergence in policies is NP complete
Policy Disputes in Path Vector Protocols. Timothy G. Griffin, F. Bruce Shepherd, Gordon Wilfong. ICNP 1999
Define Stable Paths Problem and develop sufficient condition for “sanity”
A Safe Path Vector Protocol. Timothy G. Griffin, Gordon Wilfong. INFOCOM 2001
Dynamic solution for SPVP based on histories
Stable Internet Routing without Global Coordination. Lixin Gao, Jennifer Rexford. SIGMETRICS 2000
Show that if certain guidelines are followed, then all is well.
Inherently safe backup routing with BGP. Lixin Gao, Timothy G. Griffin, Jennifer Rexford. INFOCOM 2001
Use SPP to study complex backup policies
Thank You! : Thank You! Companion links:
http://www.research.att.com/~griffin/interdomain.html