
Troubleshooting BGP - PFS Internet Development


Contents

1. © 2006 Cisco Systems, Inc. All rights reserved.

   Troubleshooting Connectivity: Example IV
   • L2: an upstream somewhere has poor connectivity between itself and the rest of the Internet
       The only real solution is to impress upon the upstream that this isn't good enough, and get them to fix it
       Or change upstreams

   Troubleshooting Connectivity: Example IV
   • Route Flap Damping
       Many ISPs implement route flap damping
       Many ISPs simply use the vendor defaults
       Vendor defaults are generally far too severe
       There is real concern that even the more lenient RIPE-229 values are too severe
       Opinion is growing that flap damping does more harm than good, e.g. www.cs.berkeley.edu/~zmao/Papers/sig02.pdf
   • Again, Looking Glasses come to the operator's assistance

   [Screenshot: Sprintlink looking glass, Query Results]
   sl-bb20-sj> sh ip bgp flap
   NOTE: This command will be deprecated soon. Please use 'show ip bgp dampening {dampened-paths | flap-statistics}'
   BGP table version is 87689246, local router ID is 144.228.241.64
   Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, r RIB-failure, S Stale
   Origin codes: i - IGP, e - EGP, ? - incomplete
      Network            From          Flaps  Duration  Reuse  Path
   h  12.44.243.0/24     144.232.9.2   1      00:13:12         701 26144
   h  12.1
2. • What do the logs say?
       Problems are usually caused because BGP keepalives are lost
       No keepalive: the local router assumes the remote has gone down, so it tears down the BGP session
       Then it tries to re-establish the session, which succeeds
       Then it tries to exchange UPDATEs: this fails, keepalives get lost, and the session falls over again
       WHY?

   Flapping Peer: Diagnosis and Solution
   • Diagnosis
       Keepalives get lost because they get stuck in the router's queue behind BGP update packets
       BGP update packets are packed to the size of the MTU; keepalives and BGP OPEN packets are not packed to the size of the MTU
       Path MTU problems
       Use ping with different size packets to confirm the above: 100-byte ping succeeds, 1500-byte ping fails = MTU problem somewhere
   • Solution
       Pass the problem to the L2 folks, but be helpful: try to pinpoint, using ping, where in the network the problem might be

   Flapping Peer: Other Common Problems
   • Remote router rebooting continually (typical with a 3-5 minute BGP peering cycle time)
   • Remote router BGP process unstable, restarting
   • Traffic Shaping & Rate Limiting parameters
   • MTU incorrectly set on links, PMTU discovery disabled on router
   • For non-ATM/FR links, instability in the L2 point-to-point circuits
       Faulty MUXes, bad connectors, interoperability problem
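
The "ping with different size packets" diagnosis above can be narrowed down with a binary search over packet sizes. The following is a minimal sketch, not part of the original slides: `probe` is a hypothetical callable standing in for "send one ping of this size with fragmentation disallowed and report whether it came back", and `fake_path` simulates a link that drops anything above a given size.

```python
from typing import Callable


def largest_passing_size(probe: Callable[[int], bool], low: int = 100, high: int = 1500) -> int:
    """Binary-search the largest packet size in [low, high] that still gets through.

    `probe(size)` is a hypothetical callable returning True when a ping of
    `size` bytes succeeds.
    """
    if not probe(low):
        raise ValueError("even the smallest probe fails: not an MTU problem")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid passes, so the limit is at or above mid
        else:
            high = mid - 1  # mid fails, so the limit is below mid
    return low


def fake_path(limit: int) -> Callable[[int], bool]:
    """Simulated path that silently drops anything larger than `limit` bytes."""
    return lambda size: size <= limit
```

Running the same search from several points along the path is one way to "be helpful" to the L2 team: the hop where the largest passing size drops is a good candidate for the faulty link.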
3. (202.10.0.102)  331.707 ms  322.102 ms  322.023 ms
   21  gigabitethernet0-1.cor2.bri.connect.com.au (203.63.11.82)  322.028 ms  323.343 ms  323.508 ms
   22  DWES1351845-8.gy.connect.com.au (210.8.13.61)  325.219 ms  323.865 ms  323.619 ms
   23  gi0-1.bri-lns1.qld.westnet.com.au (202.173.144.82)  323.118 ms  323.777 ms  323.458 ms
   24  dsl-202-173-147-216.qld.westnet.com.au (202.173.147.216)  337.079 ms  337.940 ms

   Troubleshooting Connectivity: Example II
   • Help is at hand: RouteViews
   • The RouteViews router has BGP feeds from around 60 peers
       www.routeviews.org explains the project
       Gives access to a real router, and allows any provider to find out how their prefixes are seen in various parts of the Internet
       Complements the Looking Glass facilities
   • Anyway, back to our problem...

   Troubleshooting Connectivity: Example II
   • Checklist:
       Does AS3's upstream send it to AS3?
         We are checking eBGP configuration on AS3's upstream
         There may be a configuration error with as-path filters, or prefix-lists, or communities such that only local prefixes get out
         This needs AS3's assistance
       Does AS3 see any of AS1's originated prefixes?
         We are checking eBGP configuration on R3
         Maybe AS3 does not know to expect the prefix from AS1 in the peering with its upstream, or maybe it has some errors in as-path or prefix or community filters
4. 176.0/20           144.232.9.2   1      00:33:24         701 22351 4755 9829
   >  61.1.192.0/19   144.232.9.2   1      00:33:24         701 22351 4755 9829
   >  61.2.208.0/20   144.232.9.2   1      00:33:24         701 22351 4755 9829
   >  61.3.224.0/20   144.232.9.2   1      00:33:24         701 22351 4755 9829
      62.24.32.0/22   144.232.9.2   2      00:33:45         701 22351
      62.24.36.0/24   144.232.9.2   2      00:33:45         701 22351

   Troubleshooting Connectivity: Example IV
   • Most Looking Glasses allow operators to check the flap or damped status of their announcements
       Many oscillating connectivity issues are caused by L2 problems
       Route flap damping will cause connectivity to persist via alternative paths even though primary paths have been restored
       Quite often, the exponential back-off of the flap damping timer will give rise to bizarre routing
       A common symptom is that bizarre routing will often clear away by itself

   Troubleshooting: Summary
   • Most troubleshooting is about:
   • Experience
       Recognising the common problems
   • Not panicking
   • Logical approach
       Check configuration first
       Check locally first before blaming the peer
       Troubleshoot layer 1, then layer 2, then layer 3, etc.

   Troubleshooting: Summary
   • Most troubleshooting is about:
   • Using the available tools
       The debugging tools on the router hardware
       Internet Looking Glasses
       Colleagues and their knowledge
       Public
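
To illustrate why damped routes "clear away by themselves", here is a small sketch of the exponential decay behind route flap damping. It is not from the slides: the 15-minute half-life and the reuse threshold of 750 are illustrative assumptions only, and real values depend on the vendor and on local configuration.

```python
import math


def decayed_penalty(penalty: float, minutes: float, half_life: float = 15.0) -> float:
    """Penalty remaining after `minutes` of exponential decay."""
    return penalty * 0.5 ** (minutes / half_life)


def minutes_until_reuse(penalty: float, reuse: float = 750.0, half_life: float = 15.0) -> float:
    """Minutes until the penalty decays to the reuse threshold (0 if already at or below it)."""
    if penalty <= reuse:
        return 0.0
    return half_life * math.log2(penalty / reuse)
```

Under these assumed values, a prefix whose accumulated penalty is four times the reuse threshold stays suppressed for two half-lives after the last flap, even if the underlying link recovered immediately.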
5. [Screenshot: RIPE NCC RIS Looking Glass (Multi-Router Looking Glass, written by John Fraizer, EnterZone Inc). The RRC box list includes RRC00 Amsterdam, RRC02 SFINX, RRC03 AMS-IX, RRC04 CIXP, RRC05 VIX, RRC06 NSPIXP2, RRC07 Netnod, RRC12 DE-CIX and RRC14 PAIX. Available queries: show ip bgp, show ip bgp summary, show bgp neighbors, show ip bgp regexp, show ipv6 bgp, show ipv6 bgp summary, show ipv6 bgp regexp, show version, traceroute, ping.]

   [Screenshot: RIS Looking Glass at www.ris.ripe.net, RRC box RRC01 (LINX), query "show ip bgp", argument 202.173.147.0]

   BGP routing table entry for 202.173.144.0/21
   Paths: (4 available, best #3, table Default-IP-Routing-Table)
     Not advertised to any peer
     13237 1668 4648 2764 9543
       195.66.224.99 from 195.66.224.99 (82.197.136.1)
         Origin IGP, localpref 100, valid, external
         Community: 1668:31000 13237:44088 135237 46861
         Last update: Fri Jan 14 01:46:12 2005
6. • RFC says that MED is not always compared
   • As a result, the ordering of the paths can affect the decision process
   • For example, the default in Cisco IOS is to compare the prefixes in order of arrival (most recent to oldest)
       This can result in inconsistent route selection
       Symptom is that the best path chosen after each BGP reset is different

   Inconsistent: Example
   • Inconsistent route selection may cause problems
       Routing loops
       Convergence loops, i.e. the protocol continuously sends updates in an attempt to converge
       Changes in traffic patterns
   • Difficult to catch and troubleshoot
       In Cisco IOS, the deterministic-med configuration command is used to order paths consistently
       Enable it on all the routers in the AS
       The bestpath is recalculated as soon as the command is entered

   Symptom I: Diagram
   [Diagram: paths to 10.0.0.0/8 arriving at RouterA]
   • RouterA will have three paths
   • MEDs from AS 3 will not be compared with MEDs from AS 1
   • RouterA will sometimes select the path from R1 as best, but may also select the path from R3 as best

   Deterministic MED: Operation
   • The paths are ordered by Neighbour AS
   • The bestpath for each Neighbour AS group is selected
   • The overall bestpath results from comparing the winners
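
The difference between order-of-arrival comparison and the deterministic MED operation can be sketched as below. This is a simplified illustration, not Cisco's implementation: the three paths, their MED values, and the use of a single `igp_cost` field as the only other tie-break are all assumptions made for the example.

```python
from collections import defaultdict


def better(a: dict, b: dict) -> dict:
    """Pairwise comparison: MED only counts between paths from the same neighbour AS."""
    if a["neighbor_as"] == b["neighbor_as"] and a["med"] != b["med"]:
        return a if a["med"] < b["med"] else b
    return a if a["igp_cost"] <= b["igp_cost"] else b  # simplified tie-break (assumption)


def sequential_best(paths: list) -> dict:
    """Order-dependent selection: compare the running best with each path in turn."""
    best = paths[0]
    for path in paths[1:]:
        best = better(best, path)
    return best


def deterministic_best(paths: list) -> dict:
    """Group by neighbour AS, pick each group's winner, then compare the winners."""
    groups = defaultdict(list)
    for path in paths:
        groups[path["neighbor_as"]].append(path)
    winners = [min(group, key=lambda p: (p["med"], p["igp_cost"]))
               for _, group in sorted(groups.items())]
    return min(winners, key=lambda p: p["igp_cost"])


# Hypothetical paths: two from neighbour AS 3 and one from neighbour AS 1.
PATHS = [
    {"name": "A", "neighbor_as": 3, "med": 30, "igp_cost": 20},
    {"name": "B", "neighbor_as": 1, "med": 20, "igp_cost": 30},
    {"name": "C", "neighbor_as": 3, "med": 20, "igp_cost": 40},
]
```

Feeding `PATHS` to `sequential_best` in different arrival orders yields different winners, which is the symptom described above; `deterministic_best` returns the same path whatever the arrival order.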
7. was down as there was a typo in the shorthand, resulting in the incorrect configuration being used

   Troubleshooting Tips
   • Use configuration shorthand both for efficiency and to avoid making policy errors within the iBGP mesh
       This is especially true for full iBGP mesh networks
       But be careful not to introduce typos into the names of these subroutines: a common problem
   • Use route reflectors to avoid accidentally missing iBGP peers, especially as the mesh grows in size
       But stick to the route reflector rules and the defaults in the implementation: changing defaults and ignoring BCP techniques introduces complexity and causes problems

   Local Configuration Problems
   • Peer Establishment
   • Missing Routes
   • Inconsistent Route Selection
   • Loops and Convergence Issues

   Inconsistent Route Selection
   • Two common problems with route selection
       Inconsistency
       Appearance of an incorrect decision
   • RFC 1771 defined the decision algorithm
   • Every vendor has tweaked the algorithm
       http://www.cisco.com/warp/public/459/25.shtml
   • Route selection problems can result from oversights by RFC 1771
   • RFC 1771 is now obsoleted by RFC 4271
       Hopefully compliance with RFC 4271 will help avoid future issues

   Inconsistent: Example
8. 04.113.0/24        144.232.9.2   1      00:45:12         701 27358 27358 27358 27358 27358
   h  12.104.114.0/24 144.232.9.2   1      00:45:12         701 27358 27358 27356 27356 27356
   h  12.108.254.0/24 144.232.9.2   1      00:26:32         701 6389 6197 26829
   h  15.130.192.0/20 144.232.9.2   1      00:52:38         701 1273 1889
   h  15.195.176.0/20 144.232.9.2   1      00:52:28         701 1273 1889
   h  15.197.192.0/18 144.232.9.2   1      00:52:28         701 1273 1889
   h  15.198.0.0/17   144.232.9.2   1      00:52:38         701 1273 1889
   h  15.203.128.0/18 144.232.9.2   1      00:52:38         701 1273 1889
   h  15.204.96.0/19  144.232.9.2   1      00:52:28         701 1273 1889
   h  15.204.128.0/17 144.232.9.2   1      00:52:28         701 1273 1889
   h  16.0.0.0/12     144.232.9.2   1      00:52:28         701 1273 1889 1889
   h  16.6.0.0/15     144.232.9.2   1      00:52:38         701 1273 1889
   h  16.8.0.0/15     144.232.9.2   1      00:52:38         701 1273 1889
   h  16.14.0.0/15    144.232.9.2   1      00:52:38         701 1273 1889
      59.81.0.0/18    144.232.9.2   20     05:01:05         701 9800 17773
      59.81.64.0/18   144.232.9.2   20     05:01:05         701 9800 17773
      59.81.126.0/18  144.232.9.2   20     05:01:05         701 9800 17773
      59.81.192.0/18  144.232.9.2   20     05:01:05         701 9800 17773
      59.82.0.0/18    144.232.9.2   20     05:01:05         701 9500 17773
      59.82.64.0/18   144.232.9.2   20     05:01:05         701 9800 17773
      59.82.128.0/18  144.232.9.2   20     05:01:05         701 9800 17773
      9.82.192.0/18   144.232.9.2   20     05:01:05         701 9800 17773
      59.83.0.0/18    144.232.9.2   20     05:01:05         701 9800 17773
      59.83.64.0/18   144.232.9.2   20     05:01:05         701 9800 17773
      59.83.128.0/18  144.232.9.2   20     05:01:05         701 9800 17773
      59.83.192.0/18  144.232.9.2   20     05:01:05         701 9800 17773
   >  61.1
9. 2's network blocks, then is AS2 announcing the prefix to its upstreams?
       If they claim they are, ask them to ask their upstream for their BGP table, or use a Looking Glass to check

   Troubleshooting Connectivity: Example III
   • A light flow of traffic from AS2, but 50% less than from AS3
   • Looking Glass comes to the rescue
       LG will let you see what AS2, or AS2's upstreams, are announcing
       AS1 may choose this as primary path, but AS2's relationship with its upstream may decide otherwise
   • NetFlow comes to the rescue
       Allows AS1 to see what the origins are, and with the LG, helps AS1 to find where the prefix filtering culprit might be

   Troubleshooting Connectivity: Example IV
   • Symptom: AS1 is loadsharing between its upstreams, but the traffic load swings randomly between AS2 and AS3

   Troubleshooting Connectivity: Example IV
   • Checklist:
       Assume AS1 has done everything in this tutorial so far
         All the configurations look fine, the Looking Glass outputs look fine, life is wonderful...
         Apart from those annoying traffic swings every hour or so
       L2 problem? Route Flap Damping?
         Since BGP is configured fine, and the net has been stable for so long, it can only be an L2 problem or a Route Flap Damping side-effect
10. Cisco Systems
    Troubleshooting BGP
    Philip Smith <pfs@cisco.com>
    NZNOG 2006, 22-24 Mar 2006, Wellington

    Presentation Slides
    • Slides are at:
        ftp://ftp-eng.cisco.com/pfs/seminars/NZNOG2006-BGP-part4.pdf
        And on the NZNOG 2006 website
    • Feel free to ask questions any time

    Assumptions
    • Presentation assumes working knowledge of BGP
        Beginner and Intermediate experience of protocol
    • If in any doubt, please ask!

    • Fundamentals of Troubleshooting
    • Local Configuration Problems
    • Internet Reachability Problems

    Fundamentals: Problem Areas
    • First step is to recognise what causes the problem
    • Possible Problem Areas:
        Misconfiguration
          Configuration errors caused by bad documentation, misunderstanding of concepts, poor communication between colleagues or departments
        Human error
          Typos, using wrong commands, accidents, poorly planned maintenance activities

    Fundamentals: Problem Areas
    • More Possible Problem Areas:
        "Feature behaviour"
          Or "it used to do this with Release X.Y(a) but Release X.Y(b) does that"
        Interoperability issues
          Differences in interpretation of RFC1771 and its developments
        Those be
11. HOPs are known by the IGP, whether OSPF/ISIS, static or connected routes
          If the NEXT_HOP is also in iBGP, ensure the iBGP distance is longer than the IGP distance
    • Don't carry external NEXT_HOPs in your network
        Use the "next-hop-self" concept on all the edge BGP routers
    • Two simple solutions

    Troubleshooting Tips
    • High CPU utilisation in the BGP process is normally a sign of a convergence problem
    • Find a prefix that changes every minute
    • Troubleshoot/debug that one prefix

    Troubleshooting Tips
    • BGP routing loop?
        First check for IGP routing loops to the BGP NEXT_HOPs
    • BGP loops are normally caused by
        Not following physical topology in RR environment
        Multipath with confederations
        Lack of a full iBGP mesh
    • Get the following from each router in the loop path
        The routing table entry
        The BGP table entry
        The route to the NEXT_HOP

    Convergence Problems: Example
    • Route reflector with 250 route reflector clients
    • 100k routes
    • BGP will not converge
    • Logs show that neighbour hold times have expired
    • The BGP router summary shows peers establishing, dropping, re-establishing
        And it's not the MTU problem we saw earlier!

    Convergence Problems: Example
    • We are either missing hellos or o
12. Rs
    • R3 and R4 are RRCs
    • R4 is advertising 7.0.0.0/8
        R2 has it
        R1 and R3 do not

    Missing Routes: Example II
    • R1 is not accepting the route when R2 sends it on
        If R1 sees its own router ID in the cluster-ID attribute in any received prefix, it will reject that prefix
        How a route reflector avoids redundant information
    • Reason
        Early documentation claimed that RRC redundancy should be achieved by dual route reflectors in the same cluster
        This is fine and good, but then ALL clients must peer with both RRs, otherwise examples like this will occur
    • Solution
        Use overlapping RRCs for redundancy, stick to defaults

    Missing Routes
    • Route Origination
    • UPDATE Exchange
    • Filtering
    • iBGP mesh problems

    Update Filtering
    • Type of filters
        Prefix filters
        AS_PATH filters
        Community filters
        Policy/Attribute manipulation
    • Applied incoming and/or outgoing

    Update Filtering
    • If you suspect a filtering problem, become familiar with the router tools to find out what BGP filters are applied
    • Tip: don't cut and paste!
        Many filtering errors and diagnosis problems result from cut and paste buffer problems on the client, the connection, and even the router

    Update Filtering: Common Prob
13. 286 209 1239 4648 2764 9543
      195.66.224.54 from 195.66.224.54 (134.222.86.174)
        Origin IGP, localpref 100, valid, external
        Last update: Wed Jan 5 13:52:52 2005
    5511 10026 4648 2764 9543
      195.66.224.83 from 195.66.224.83 (193.251.245.1)
        Origin IGP, localpref 100, valid, external, best
        Last update: Mon Jan 17 02:15:07 2005
    8342 702 701 1239 4645 2764 9543
      195.66.224.90 from 195.66.224.90 (195.161.1.152)
        Origin IGP, localpref 100, valid, external
        Last update: Wed Dec 29 00:13:04 2004
    Multi-Router Looking Glass

    Troubleshooting Connectivity: Example II
    • Hmmm...
    • Looking Glass can see 202.173.144.0/21
        This includes 202.173.147.0/24
        So the problem must be with AS3, or AS3's upstream
    • A traceroute confirms the connectivity

    [Screenshot: RIS Looking Glass at www.ris.ripe.net, RRC box RRC01 (LINX), query "traceroute", argument 202.173.147.216]

    Traceroute from RRC01 to 202.173.147.216
    traceroute to 202.173.147.216 (202.173.147.216), 30 hops max, 36 byte packets
     1  collector.linx.net (195.66.225.254)  0.752 ms  0.487 ms  0.567 ms
     2  fa2-1-112.transit1.t
14. from each group
    • The bestpath will be consistent because paths will be placed in a deterministic order

    Solution: Diagram
    [Diagram: RouterA and 10.0.0.0/8]

    Inconsistent: Example II
    • The bestpath changes every time the peering is reset
    • By default, the "oldest" external is the bestpath
        All other attributes are the same
        Stability Enhancement in Cisco IOS
    • The BGP sub-command "bgp bestpath compare-routerid" will disable this enhancement

    Inconsistent: Example III
    • Path 1 has higher localpref but path 2 is better???
    • This appears to be incorrect
    • It's because Cisco IOS has "synchronization" on by default, and if a prefix is not synchronized (synchronized meaning it appears in the IGP as well as in BGP), its path won't be included in the bestpath process

    Inconsistent Path Selection
    • Summary:
        RFC1771 wasn't perfect when it came to path selection; years of operational experience have shown this
        Vendors and ISPs have worked to put in stability enhancements (now reflected in RFC4271)
        But these can lead to interesting problems
        And of course some defaults linger much longer than they ought to, so never assume that an out-of-the-box default configuration will be perfect for
15. ge this to permit multiple hops
    • Some ISPs won't even allow their customers to use eBGP multihop, due to the potential for problems

    Peer Establishment: eBGP Problems
    • eBGP multihop problems
        IP connectivity to the remote address
          Is there a route in the local routing table?
          Is there a route in the remote routing table?
        Check this using "ping", including the extended options that it has in most implementations
    • Filters in the path?
        If this crosses multiple providers, this needs their cooperation

    Peer Establishment: Passwords
    • Using passwords on iBGP and eBGP sessions
        Link won't come up
        Been through all the previous troubleshooting steps
    • Common problems
        Missing password: needs to be on both ends
        Cut and paste errors: don't!
        Typographical errors: capitalisation, extra characters, white space
    • Common solutions
        Check for symptoms/messages in the logs
        Re-enter passwords from scratch: don't cut & paste

    Flapping Peer: Common Symptoms
    • Symptoms: the eBGP session flaps
    • eBGP peering establishes, then drops, re-establishes, then drops...

    Flapping Peer: Common Symptoms
    • Ensure logging is enabled: no logs, no clue
16. hn.linx.net (195.66.248.226)  0.641 ms  0.778 ms  0.745 ms
     3  demon.transit.thn.linx.net (195.66.248.26)  0.654 ms  0.643 ms  0.518 ms
     4  tele-border-2-g1-0-0.router.demon.net (194.70.98.162)  0.981 ms  1.082 ms  1.212 ms
     5  sl-gw22-lon-2-2.sprintlink.net (213.206.156.49)  0.945 ms  1.105 ms  0.946 ms
     6  sl-bb21-lon-9-0.sprintlink.net (213.206.128.938)  1.117 ms  0.933 ms  1.030 ms
     7  sl-bb21-tuk-10-0.sprintlink.net (144.232.19.69)  73.652 ms  73.803 ms  73.570 ms
     8  sl-bb20-tuk-15-0.sprintlink.net (144.232.20.132)  682.147 ms  61.515 ms  73.875 ms
     9  sl-bb21-rly-14-0.sprintlink.net (144.232.20.115)  61.549 ms  61.799 ms  61.536 ms
    10  sl-bb22-rly-13-0.sprintlink.net (144.232.7.254)  61.302 ms  61.898 ms  61.616 ms
    11  sl-bb22-sj-10-0.sprintlink.net (144.232.20.186)  143.283 ms  143.680 ms  143.041 ms
    12  144.232.20.47 (144.232.20.47)  164.656 ms  145.663 ms  148.485 ms
    13  sl-newzeal-1-0.sprintlink.net (144.223.243.138)  151.380 ms  151.648 ms  151.394 ms
    14  p5-1.sjbr1.global-gateway.net.nz (202.37.245.229)  306.191 ms  307.392 ms  305.750 ms
    15  p1-5.sybr3.global-gateway.net.nz (202.37.247.81)  306.225 ms  306.216 ms  306.239 ms
    16  con2.sybr3.global-gateway.net.nz (202.537.246.242)  306.370 ms  307.952 ms  306.693 ms
    17  so-3-0-3.cre1.syd.connect.com.au (202.10.4.11)  308.144 ms  306.429 ms  307.262 ms
    18  so-3-0-2.cre1.hay.connect.com.au (202.10.0.63)  306.027 ms  306.267 ms  307.442 ms
    19  so-1-1-0.cre1.for.connect.com.au (202.10.0.34)  322.587 ms  327.149 ms  325.830 ms
    20  so-0-0-1.dst2.bri.connect.com.au
17. Troubleshooting Connectivity: Example II
    • Troubleshooting across the Internet is harder
        But tools are available
    • Looking Glasses, offering traceroute, ping and BGP status, are available all over the globe
        Most connectivity problems seem to be found at the edge of the network, rarely in the transit core
        Problems with the transit core are usually intermittent and short-term in nature

    Troubleshooting Connectivity: Example III
    • Symptom: AS1 is trying to loadshare between its upstreams, but has trouble getting traffic through the AS2 link

    Troubleshooting Connectivity: Example III
    • Checklist:
        What does "trouble" mean?
    • Is outbound traffic loadsharing okay?
        Can usually fix this by selectively rejecting prefixes, and using local preference
        Generally easy to fix: a local problem, simple application of policy
    • Is inbound traffic loadsharing okay?
        Errummm, bigger problem if not
        Need to do some troubleshooting if configuration with communities, AS-PATH prepends, MEDs and selective leaking of subprefixes doesn't seem to help

    Troubleshooting Connectivity: Example III
    • Checklist:
        AS1 announces, but does AS2 see it?
          We are checking eBGP filters on R1 and R2
          Remember that R2 access will re
18. king eBGP filters on R1 and R2
          Remember that R2 access will require cooperation and assistance from your peer
        Does AS2 see it over entire network?
          We are checking iBGP across AS2's network (unneeded step in this case, but usually the next consideration)
          Quite often iBGP is misconfigured: lack of full mesh, problems with RRs, etc.

    Troubleshooting Connectivity: Example I
    • Checklist:
        Does AS2 send it to AS3?
          We are checking eBGP configuration on R2
          There may be a configuration error with as-path filters, or prefix-lists, or communities such that only local prefixes get out
        Does AS3 see all of AS2's originated prefixes?
          We are checking eBGP configuration on R3
          Maybe AS3 does not know to expect prefixes from AS1 in the peering with AS2, or maybe it has similar errors in as-path or prefix or community filters

    Troubleshooting Connectivity: Example I
    • Troubleshooting connectivity beyond immediate peers is much harder
        Relies on your peer to assist you: they have the relationship with their BGP peers, not you
        Quite often connectivity problems are due to the private business relationship between the two neighbouring ASNs

    Troubleshooting Connectivity: Example II
    [Diagram: AS1, 202.173.147.0]
    • Sym
19. lems
    • Typos in regular expressions
        Extra characters, missing characters, white space, etc.
        In regular expressions every character matters, so accuracy is highly important
    • Typos in prefix filters
        Watch the router CLI and the filter logic: it may not be as obvious as you think, or as simple as the manual makes out
        Watch netmask confusion, and 255 profusion: easy to muddle 255 with 0 and 225!

    Update Filtering: Common Problems
    • Communities
        Each implementation has different defaults for when communities are sent
          Some don't send communities by default
          Others do for iBGP and not for eBGP by default
          Others do for all BGP peers by default
        Watch how your implementation handles communities
          There may be implicit filtering rules
        Each ISP has different policies: never assume that because communities exist, people will use them or pay attention to the ones you send

    Missing Routes: General Problems
    • Make, and then stick to, simple policy rules:
        Most implementations have particular rules for filtering of prefixes and AS-paths, and for manipulating BGP attributes
        Try not to mix these rules
        Rules for manipulating attributes can also be used for filtering prefixes and ASNs: they can be very powerful but can also become very confusing
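
One way to catch the "255 with 0 and 225" muddle before a filter is applied is to check each mask mechanically. The sketch below is an illustration only, using Python's standard `ipaddress` module; the function name `classify_mask` is hypothetical and the check is independent of any router CLI.

```python
import ipaddress


def classify_mask(mask: str) -> str:
    """Return 'netmask', 'wildcard' or 'invalid' for a dotted-quad mask."""
    try:
        value = int(ipaddress.IPv4Address(mask))
    except ipaddress.AddressValueError:
        return "invalid"                      # not even a dotted quad
    inverted = ~value & 0xFFFFFFFF
    if (inverted & (inverted + 1)) == 0:      # ones then zeros, e.g. 255.255.255.0
        return "netmask"
    if (value & (value + 1)) == 0:            # zeros then ones, e.g. 0.0.0.255
        return "wildcard"
    return "invalid"                          # non-contiguous bits, e.g. 255.225.0.0
```

A mask such as 255.225.0.0 is a syntactically valid dotted quad, so a CLI may accept it silently; classifying it as neither netmask nor wildcard flags the likely typo.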
20. Missing Routes
    • Route Origination
    • UPDATE Exchange
    • Filtering
    • iBGP mesh problems

    Missing Routes: iBGP
    • Symptom: customer complains about patchy Internet access
        Can access some, but not all, sites connected to backbone
        Can access some, but not all, of the Internet

    Missing Routes
    • Customer connected to R1 can see AS3, but not AS2
    • Also complains about not being able to see sites connected to R5
    • No complaints from other customers

    Missing Routes: iBGP
    • Diagnosis: this is the classic iBGP mesh problem
        The full mesh isn't complete: how do we know this?
    • Customer is connected to R1
        Can't see AS2: R3 is somehow not passing routing information about AS2 to R1
        Can't see R5: R5 is somehow not passing routing information about sites connected to R5
        But can see rest of the Internet: his prefix is being announced to some places, so not an iBGP origination problem

    Missing Routes: iBGP
    • When using full mesh iBGP, check on every iBGP speaker that it has a neighbour relationship with every other iBGP speaker
        In this example, R3 peering with R1 is down, as R1 isn't seeing any of the routes connected through R3
    • Try to use configuration shorthand if available in your implementation
        Peering between R1 and R5
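
The "check on every iBGP speaker" step can be automated once the configured neighbours of each router have been collected. The sketch below assumes that collection has already happened; the `MESH` dictionary is hypothetical data loosely modelled on the example above, and the function only checks configuration, not whether a configured session is actually up.

```python
from itertools import combinations


def missing_sessions(neighbors: dict) -> list:
    """Return router pairs lacking an iBGP session configured in both directions."""
    missing = []
    for a, b in combinations(sorted(neighbors), 2):
        if b not in neighbors[a] or a not in neighbors[b]:
            missing.append((a, b))
    return missing


# Hypothetical five-router AS: R1 has no sessions to R3 or R5.
MESH = {
    "R1": {"R2", "R4"},
    "R2": {"R1", "R3", "R4", "R5"},
    "R3": {"R2", "R4", "R5"},
    "R4": {"R1", "R2", "R3", "R5"},
    "R5": {"R2", "R3", "R4"},
}
```

For this data, `missing_sessions(MESH)` reports the R1 to R3 and R1 to R5 pairs, matching the two gaps the customer's symptoms point to.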
21. mailing lists where appropriate

    Closing Comments
    • Presentation has covered the most common troubleshooting techniques used by ISPs today
    • Once these have been mastered, more complex or arcane problems are easier to solve
    • Feedback and input for future improvements is encouraged and very welcome

    Cisco Systems
    Troubleshooting BGP
    Philip Smith <pfs@cisco.com>
    NZNOG 2006, 22-24 Mar 2006, Wellington
22. must exist in another routing process too, typically:
        Static route pointing to customer (for customer routes into your iBGP)
        Static route pointing to Null (for aggregates you want to put into your eBGP)

    Missing Routes
    • Route Origination
    • UPDATE Exchange
    • Filtering
    • iBGP mesh problems

    Missing Routes: Update Exchange
    • Ah, Route Reflectors...
        Such a nice solution to help scale BGP
        But why do people insist on breaking the rules all the time?!
    • Common issues
        Clashing router IDs
        Clashing cluster IDs

    Missing Routes: Example
    • Two RR clusters
    • R1 is a RR for R3
    • R2 is a RR for R4
    • R4 is advertising 7.0.0.0/8
        R2 has the route, but R1 and R3 do not

    Missing Routes: Example
    • R1 is not accepting the route when R2 sends it on
        Clashing router ID!
        If R1 sees its own router ID in the originator attribute in any received prefix, it will reject that prefix
        How a route reflector attempts to avoid routing loops
    • Solution
        Do NOT set the router ID by hand unless you have a very good reason to do so, and have a very good plan for deployment
        Router ID is usually calculated automatically by the router

    Missing Routes: Example II
    • One RR cluster
    • R1 and R2 are R
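
The two clashes listed above (router IDs and cluster IDs) both come down to loop checks a router applies to reflected routes. The sketch below models those checks in the spirit of the route reflection specification; the `ReflectedRoute` class, its field names and the example IDs are assumptions made for illustration, not any vendor's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class ReflectedRoute:
    prefix: str
    originator_id: str                                 # router ID of the originator inside the AS
    cluster_list: list = field(default_factory=list)   # cluster IDs the route has passed through


def accept_reflected(route: ReflectedRoute, my_router_id: str, my_cluster_id: str) -> bool:
    """Apply the two route-reflection loop checks to a received iBGP route."""
    if route.originator_id == my_router_id:
        return False  # looks like our own route coming back to us
    if my_cluster_id in route.cluster_list:
        return False  # the route has already passed through our cluster
    return True
```

With a hand-set router ID that duplicates another router's, the first check rejects a perfectly good route; with two reflectors sharing a cluster ID, the second check does the same, which is the behaviour behind both missing-route examples.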
23. outers tools
        Is logging of the BGP process enabled?
        And is it captured/recorded off the router?
    • Are you familiar with the BGP debug process and commands (if available)?
        Check vendor documentation before switching on full BGP debugging: you might get fewer surprises

    Fundamentals: Tools
    • Traffic and traffic flow measurement in the network
        Unexplained change in traffic levels on an interface, a connection, a peering...
    • Correlation of customer feedback on network or connectivity issues...

    Agenda
    • Fundamentals
    • Local Configuration Problems
    • Internet Reachability Problems

    Local Configuration Problems
    • Peer Establishment
    • Missing Routes
    • Inconsistent Route Selection
    • Loops and Convergence Issues

    Peer Establishment: ACLs and Connectivity
    • Routers establish a TCP session
        Port 179: permit in interface packet filters
        IP connectivity (route from IGP)
    • OPEN messages are exchanged
        Peering addresses must match the TCP session
        Local AS configuration parameters

    Peer Establishment: Common Problems
    • Sessions are not established
        No IP reachability
        Incorrect configuration
    • Peers are flapping
        Layer 2 problems
24. pered because the relationship between routing information flow and traffic flow is forgotten

    Internet Reachability Problems: BGP Path Selection Process
    • Each vendor has "tweaked" the path selection process
        Know it for your router equipment: saves time later
        Especially applies to networks with more than one BGP implementation present
        Best policy is to use the supplied "knobs" to ensure consistency, and avoid steps in the process which can lead to inconsistency

    Internet Reachability Problems: MED Confusion
    • Default MED on Cisco IOS is ZERO
        It may not be this on your router, or your peer's router
    • Best not to rely on MEDs for multihoming on multiple links to upstream
        Their default might be 2^32-1, resulting in your hoped-for best path being their worst path!
        Workaround, i.e. current good practice, is to use communities rather than MEDs

    Internet Reachability Problems
    • Community confusion
        "set community" does just that: it overwrites any other community set on the prefix
          Use the additive keyword to add the community to the existing list
        Use Internet format for community (AS:xx), not the 32-bit IETF format
        Cisco IOS never sends community by default
        Other implementations may send community by defaul
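
The overwrite-versus-additive behaviour and the two community notations can be modelled in a few lines. This is a sketch of the semantics only, not router code; the function names are hypothetical and the community values in any usage are made up for illustration.

```python
def set_community(existing: list, new: list, additive: bool = False) -> list:
    """Model 'set community': replace the existing list unless additive is requested."""
    if not additive:
        return list(new)                                      # plain set overwrites everything
    return existing + [c for c in new if c not in existing]   # additive appends


def to_32bit(community: str) -> int:
    """Convert 'AS:value' notation to the single 32-bit number form."""
    asn, value = (int(part) for part in community.split(":"))
    return (asn << 16) | value


def to_as_value(number: int) -> str:
    """Convert a 32-bit community number back to 'AS:value' notation."""
    return f"{number >> 16}:{number & 0xFFFF}"
```

Forgetting the additive form on an inbound policy is an easy way to strip communities that a later policy, or a downstream peer, was relying on.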
25. ptom: AS1 announces 202.173.147.0/24 to its upstreams, but AS3 cannot see the network

    Troubleshooting Connectivity: Example II
    • Checklist:
        AS1 announces, but do its upstreams see it?
          We are checking eBGP filters on R1 and upstreams
          Remember that upstreams will need to be able to help you with this
        Is the prefix visible anywhere on the Internet?
          We are checking if the upstreams are announcing the network to anywhere on the Internet
          See next slides on how to do this

    Troubleshooting Connectivity: Example II
    • Help is at hand: the Looking Glass
    • Many networks around the globe run Looking Glasses
        These let you see the BGP table and often run simple ping or traceroutes from their sites
        www.traceroute.org for IPv4
        Some IPv6 Looking Glasses listed at www.bgp4.as/looking-glasses
    • Some ISPs, especially those with large and diverse networks, run their own internal Looking Glass to aid internal troubleshooting
    • Next slides have some examples of a typical looking glass in action

    [Screenshot: RIPE NCC Routing Information Service (RIS) Looking Glass page, http://www.ris.ripe.net/cgi-bin/lg/index.cgi]
26. quire cooperation and assistance from your peer.
  - Does AS2 see it over its entire network? We are checking iBGP across AS2's network. Quite often iBGP is misconfigured (lack of full mesh, problems with RRs, etc.).

Troubleshooting Connectivity: Example III
• Checklist:
  - Does AS2 send it to its upstream? We are checking the eBGP configuration on R2. There may be a configuration error with as-path filters, prefix-lists or communities such that only local prefixes get out.
  - Does the Internet see all of AS2's originated prefixes? We are checking the eBGP configuration on other Internet routers. This means using looking glasses, and trying to find one as close to AS2 as possible.

Troubleshooting Connectivity: Example III
• Checklist:
  - Repeat all of the above for AS3.
• Stopping here and resorting to a huge prepend towards AS3 won't solve the problem.
• There are many common problems, listed on the next slide, and tools to help decipher the problem.

Troubleshooting Connectivity: Example III
• No inbound traffic from AS2:
  - AS2 is not seeing AS1's prefix, or is blocking it in inbound filters.
• A trickle of inbound traffic:
  - Switch on NetFlow (if the router has it) and check the origin of the traffic. If it is just from AS
27. s, PPP problems, satellite or radio problems, weather, etc. The list is endless; your L2 folks should know how to solve them. For you, ping is the tool to use.

Flapping Peer: Fixed
[Diagram: Small Packets, Large Packets]
• Large packets are OK now.
• BGP session is stable.

Local Configuration Problems
• Peer Establishment
• Missing Routes
• Inconsistent Route Selection
• Loops and Convergence Issues

Quick Review
• Once the session has been established, UPDATEs are exchanged.
  - All the locally known routes.
  - Only the bestpath is advertised.
• Incremental UPDATE messages are exchanged afterwards.

Quick Review
• Bestpath received from eBGP peer:
  - Advertise to all peers.
• Bestpath received from iBGP peer:
  - Advertise only to eBGP peers.
  - A full iBGP mesh must exist.

Missing Routes
• Route Origination
• UPDATE Exchange
• Filtering
• iBGP mesh problems

Missing Routes: Route Origination
• Common problem occurs when putting prefixes into the BGP table.
• The BGP table is NOT the RIB.
  - The BGP table (as with the OSPF table, ISIS table, static routes, etc.) is used to feed the RIB, and hence the FIB.
• To get a prefix into BGP it
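The advertisement rules from the Quick Review above, and the reason a full iBGP mesh is needed, can be sketched as follows. This is a simplified Python model under stated assumptions: it ignores route reflectors, confederations and any policy filtering, and the names `peers_to_advertise` and `missing_ibgp_sessions`, like the router names, are hypothetical.

```python
from itertools import combinations


def peers_to_advertise(learned_from, peers):
    """Return the peers a bestpath may be advertised to.

    learned_from: "ebgp" or "ibgp", the session type the bestpath arrived on.
    peers: mapping of peer name -> session type ("ebgp" or "ibgp").
    """
    if learned_from == "ebgp":
        # Bestpath from an eBGP peer goes to all peers.
        return sorted(peers)
    if learned_from == "ibgp":
        # Bestpath from an iBGP peer goes only to eBGP peers.
        return sorted(name for name, kind in peers.items() if kind == "ebgp")
    raise ValueError("learned_from must be 'ebgp' or 'ibgp'")


def missing_ibgp_sessions(routers, sessions):
    """List router pairs that lack a direct iBGP session (mesh gaps)."""
    configured = {frozenset(pair) for pair in sessions}
    return [
        pair
        for pair in combinations(sorted(routers), 2)
        if frozenset(pair) not in configured
    ]


peers = {"upstream": "ebgp", "core1": "ibgp", "core2": "ibgp"}
from_ebgp = peers_to_advertise("ebgp", peers)
from_ibgp = peers_to_advertise("ibgp", peers)
gaps = missing_ibgp_sessions(["R1", "R2", "R3"], [("R1", "R2"), ("R2", "R3")])
```

Because an iBGP-learned bestpath is not passed on to other iBGP peers, the gap between R1 and R3 reported in `gaps` means those two routers never hear each other's routes, which is one way routes go missing.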
28. t for iBGP and/or eBGP.
• Never assume that your neighbouring AS will honour your no-export community; ask first!

Internet Reachability Problems
• AS-PATH prepends:
  - 20 prepends won't lessen the priority of your path any more than 10 prepends will; check it out at a Looking Glass.
  - The Internet is on average only 5 ASes deep; the maximum AS prepend most ISPs have to use is around this too.
  - Know your BGP path selection algorithm.
  - Some ISPs use bgp maxas-limit 15 to drop prefixes with ridiculously long AS paths.

Internet Reachability Problems
• Private ASes should not ever appear in the Internet.
• The Cisco IOS remove-private-AS command does not remove every instance of a private AS.
  - For example, it won't remove a private AS appearing in the middle of a path, surrounded by public ASNs.
  - www.cisco.com/warp/public/459/32.html
• Apparent non-removal of private ASNs may not be a bug, but a configuration error somewhere else.

Troubleshooting Connectivity: Example I
[Diagram: AS1, 192.168.1.0/24]
• Symptom: AS1 announces 192.168.1.0/24 to AS2, but AS3 cannot see the network.

Troubleshooting Connectivity: Example I
• Checklist:
  - AS1 announces, but does AS2 see it? We are chec
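Two of the checks mentioned above, spotting private ASNs left in an AS path and spotting paths longer than a configured limit, are easy to run against paths copied from a looking glass. The following is a minimal Python sketch. The helper names are hypothetical, the AS numbers are documentation and private values chosen for illustration, and the 32-bit private range is an addition from current practice that the slides do not mention. The limit check assumes a path is dropped only when its length exceeds the limit.

```python
# Private-use AS number ranges (upper bounds are exclusive in range()).
PRIVATE_16BIT = range(64512, 65535)            # 64512..65534
PRIVATE_32BIT = range(4200000000, 4294967295)  # 4200000000..4294967294


def private_asns(as_path):
    """Return the private ASNs found anywhere in the path, in order."""
    return [
        asn for asn in as_path
        if asn in PRIVATE_16BIT or asn in PRIVATE_32BIT
    ]


def exceeds_maxas_limit(as_path, limit=15):
    """True if the path is longer than the limit (assumed drop condition)."""
    return len(as_path) > limit


# A private ASN in the middle of a path, surrounded by public ASNs.
path = [64496, 65001, 64497]
leaked = private_asns(path)

# A path carrying 20 prepends of the origin AS.
prepended = [64498] + [64496] * 21
too_long = exceeds_maxas_limit(prepended)
```

In this example `leaked` contains 65001, the case the slide describes as not being removed, and `too_long` is true because the prepended path has more than 15 entries.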
29. Peer Establishment
• Is the local AS configured correctly?
• Is the remote-as assigned correctly?
• Verify with your diagram or other documentation!

Peer Establishment: iBGP Summary
• Assume that IP connectivity has been checked.
  - Including IGP reachability between peers.
• Check TCP to find out what connections we are accepting.
  - Check the ports and source/destination addresses.
  - Do they match the configuration?
• Common problem:
  - iBGP is run between loopback interfaces on the routers (for stability), but the configuration is missing from the router, so iBGP fails to establish.
  - Remember that the source address is the IP address of the outgoing interface unless otherwise specified.

Peer Establishment: eBGP Problems
• eBGP is, by and large, problem free for single point-to-point links.
  - Source address is that of the outbound interface.
  - Destination address is that of the outbound interface on the remote router, and is directly connected (TTL is set to 1 for eBGP peers).
  - Filters permit TCP/179 in both directions.

Peer Establishment: eBGP Problems
• Load balancing over multiple links and/or use of eBGP multihop gives potential for so many problems.
  - IP connectivity to the remote address.
  - Filters somewhere in the path.
  - eBGP by default sets TTL to 1, so you need to chan
30. ur peers are not sending them.
• Check for interface input drops.
  - If the number is large, and the interface counters show recent history, then this is probably the cause of the peers going down.
• Large drops are usually due to the input queue being too small.
  - Large numbers of peers can easily overflow the queue, resulting in lost hellos.
• Solution is to increase the size of the input queues to be considerably larger than the number of peers.

Convergence Problems: Example II
• BGP converges in 25 minutes for 250 peers and 100k routes.
  - Seems like a long time.
  - What is TCP doing?
• Check the MSS size.
  - And enable Path MTU discovery on the router if it is not on by default.
  - MSS of 536 means that the router needs to send almost three times the amount of packets compared with an MSS of 1460.
• Result:
  - Should see BGP converging in about half the time, which is respectable for 250 peers and 100k routes.

Agenda
• Fundamentals
• Local Configuration Problems
• Internet Reachability Problems

Internet Reachability Problems
• BGP Attribute Confusion
  - To control traffic in: send MEDs and AS-PATH prepends on outbound announcements.
  - To control traffic out: attach local-preference to inbound announcements.
• Troubleshooting of multihoming and transit is often ham
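The "almost three times" figure in the MSS discussion above follows from simple arithmetic, shown here as a short Python sketch. The payload size is a hypothetical value picked for illustration, and the model counts only full-sized TCP segments, ignoring headers, acknowledgements and retransmissions.

```python
import math


def segments_needed(payload_bytes, mss):
    """Number of TCP segments needed to carry payload_bytes at a given MSS."""
    return math.ceil(payload_bytes / mss)


# Hypothetical volume of BGP UPDATE data to send to one peer.
payload = 10_000_000

small_mss_segments = segments_needed(payload, 536)
large_mss_segments = segments_needed(payload, 1460)

# Ratio of packets sent with MSS 536 versus MSS 1460.
ratio = small_mss_segments / large_mss_segments
print(f"MSS 536 needs {ratio:.2f}x the segments of MSS 1460")
```

The ratio is close to 1460/536, roughly 2.7, regardless of the payload size chosen, which is why checking the MSS and enabling Path MTU discovery is worth doing before looking for more exotic causes of slow convergence.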
31. yond your control.
  - Upstream ISP or peers make a change which has an unforeseen impact on your network.

Fundamentals: Working on Solutions
• Next step is to try and fix the problem.
  - And this is not about diving into the network and trying random commands on random routers just to see what difference this makes.
• Before we begin, troubleshooting is about:
  - Not panicking
  - Creating a checklist
  - Working to that checklist
  - Starting at the bottom and working up

Fundamentals: Checklists
• This presentation will have references in the later stages to checklists.
  - They are the best way to work to a solution.
  - They are what many NOC staff follow when diagnosing and solving network problems.
  - It may seem daft to start with simple tests when the problem looks complex, but quite often the apparently complex can be solved quite easily.

Fundamentals: Tools
• Use system and network logs as an aid.
• Record keeping:
  - Good and detailed system logs
  - Last known good configuration
  - History trail of working configurations and all intermediate changes
  - Record of commands entered on routers and other network devices

Fundamentals: Tools
• Familiarise yourself with the r
32. your network.

Local Configuration Problems
• Peer Establishment
• Missing Routes
• Inconsistent Route Selection
• Loops and Convergence Issues

Route Oscillation: Symptom
• One of the most common problems!
• Main symptom is that traffic exiting the network oscillates every minute between two exit points.
  - This is almost always caused by the BGP NEXT_HOP being known only by BGP.
  - Common problem in ISP networks, but if you have never seen it before, it can be a nightmare to debug and fix.
• Other symptom is high CPU utilisation for the BGP router process.

Route Oscillation: Diagram
• R3 prefers routes via AS 4 one minute.
• 1 minute later, R3 prefers routes via AS 12.
• And 1 minute after that, R3 prefers AS 4 again.

Route Oscillation: Cause
• BGP nexthop is known via BGP.
  - This is an illegal recursive lookup.
• Scanner will notice, drop this path, and install the other path in the RIB.
• Route to the nexthop is now valid.
• Scanner will detect this and re-install the other path.
• Routes will oscillate forever.
  - One-minute cycle in Cisco IOS, as the scanner runs every minute.

Route Oscillation: Solution
• Make sure that all the BGP NEXT_
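The cause described above, a BGP next hop that is itself resolved through a BGP route, can be checked mechanically against a dump of the routing table. This is a minimal Python sketch under stated assumptions: the routing table is modelled as a plain list of (prefix, protocol) pairs, the function names are hypothetical, and the prefixes come from documentation address space. It illustrates the check only and does not parse real router output.

```python
import ipaddress


def resolving_route(next_hop, rib):
    """Longest-prefix match for next_hop in rib.

    rib is a list of (prefix, protocol) tuples, for example
    ("192.0.2.0/24", "bgp"). Returns (network, protocol) or None.
    """
    address = ipaddress.ip_address(next_hop)
    best = None
    for prefix, protocol in rib:
        network = ipaddress.ip_network(prefix)
        if address in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, protocol)
    return best


def next_hop_known_only_via_bgp(next_hop, rib):
    """True if the best route covering the next hop was learned from BGP."""
    match = resolving_route(next_hop, rib)
    return match is not None and match[1] == "bgp"


# Next hop covered only by a BGP-learned prefix: the oscillation case.
bad_rib = [("192.0.2.0/24", "bgp")]
oscillates = next_hop_known_only_via_bgp("192.0.2.1", bad_rib)

# Same next hop, now also covered by a more specific IGP route.
good_rib = bad_rib + [("192.0.2.0/30", "ospf")]
stable = not next_hop_known_only_via_bgp("192.0.2.1", good_rib)
```

Running the check over every next hop seen in the BGP table gives a quick list of candidates to investigate when exit points flip on a one-minute cycle.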
