Last night it turned out that one of my upstream links had started behaving like a roly-poly toy.
The BGP session went down, came back up, went down again, came back up, and so on without end.
The BGP log showed:
Sep 9 00:26:59.219845 RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer XX.XX.XX.141 (External AS XXXX) changed state from Established to Idle (event RecvUpdate)
Sep 9 00:27:03.460183 bgp_read_v4_update:8189: NOTIFICATION sent to XX.XX.XX.141 (External AS XXXX): code 3 (Update Message Error) subcode 11 (AS path attribute problem)
Sep 9 00:27:39.219859 RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer XX.XX.XX.141 (External AS XXXX) changed state from OpenConfirm to Established (event RecvKeepAlive)
In messages:
Sep 9 00:27:33 rpd[1153]: XX.XX.XX.141 (External AS XXXX) Received BAD update for family inet-unicast(1), prefix 212.118.142.0/24
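To see at a glance what the router is complaining about, the log lines quoted above can be picked apart with a couple of regular expressions. This is only a rough sketch: the patterns are written against these three sample lines, and the names `summarize`, `STATE_RE` and `NOTIF_RE` are made up for the illustration; other Junos releases may format these messages differently.

```python
import re

# The three rpd log lines quoted above, with the masked peer address kept as-is.
LOG = """\
Sep 9 00:26:59.219845 RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer XX.XX.XX.141 (External AS XXXX) changed state from Established to Idle (event RecvUpdate)
Sep 9 00:27:03.460183 bgp_read_v4_update:8189: NOTIFICATION sent to XX.XX.XX.141 (External AS XXXX): code 3 (Update Message Error) subcode 11 (AS path attribute problem)
Sep 9 00:27:39.219859 RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer XX.XX.XX.141 (External AS XXXX) changed state from OpenConfirm to Established (event RecvKeepAlive)
"""

# Hypothetical patterns, fitted to the sample lines only.
STATE_RE = re.compile(
    r"BGP peer (\S+) .*changed state from (\w+) to (\w+) \(event (\w+)\)"
)
NOTIF_RE = re.compile(
    r"NOTIFICATION sent to (\S+) .*code (\d+) \(.*?\) subcode (\d+)"
)


def summarize(log_text):
    """Count drops out of Established and collect (code, subcode) pairs."""
    drops = 0
    notifications = []
    for line in log_text.splitlines():
        state = STATE_RE.search(line)
        if state and state.group(2) == "Established":
            drops += 1  # the session left the Established state
        notif = NOTIF_RE.search(line)
        if notif:
            notifications.append((int(notif.group(2)), int(notif.group(3))))
    return drops, notifications


DROPS, NOTIFICATIONS = summarize(LOG)
print(DROPS, NOTIFICATIONS)
```

Run over a whole night of logs, the same loop would give the number of flaps and show whether every drop comes with the same code/subcode pair.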
I went digging. Here is what the reference says:
BGP Notification Message Error Codes and Error Subcodes
Error Code Value | Code Name | Description
1 | Message Header Error | A problem was detected either with the contents or length of the BGP header. The Error Subcode provides more details on the nature of the problem.
2 | Open Message Error | A problem was found in the body of an Open message. The Error Subtype field describes the problem in more detail. Note that authentication failures or inability to agree on a parameter such as hold time are included here.
3 | Update Message Error | A problem was found in the body of an Update message. Again, the Error Subtype provides more information. Many of the problems that fall under this code are related to issues detected in the routing data or path attributes sent in the Update message, so these messages provide feedback about such problems to the device sending the erroneous data.
4 | Hold Timer Expired | A message was not received before the hold time expired. See the description of the Keepalive message for details on this timer.
5 | Finite State Machine Error | The BGP finite state machine refers to the mechanism by which the BGP software on a peer moves from one operating state to another based on events (see the TCP finite state machine description for some background on this concept). If an event occurs that is unexpected for the state the peer is currently in, it will generate this error.
6 | Cease | Used when a BGP device wants to break the connection to a peer for a reason not related to any of the error conditions described by the other codes.
Error Subcode Value | Subcode Name | Description (grouped by Error Type)
Message Header Error (Error Code 1)
1 | Connection Not Synchronized | The expected value in the Marker field was not found, indicating that the connection has become unsynchronized. See the description of the Marker field.
2 | Bad Message Length | The message was less than 19 bytes, greater than 4096 bytes, or not consistent with what was expected for the message type.
3 | Bad Message Type | The Type field of the message contains an invalid value.
Open Message Error (Error Code 2)
1 | Unsupported Version Number | The device does not “speak” the version number its peer is trying to use.
2 | Bad Peer AS | The router doesn’t recognize the peer’s autonomous system number or is not willing to communicate with it.
3 | Bad BGP Identifier | The BGP Identifier field is invalid.
4 | Unsupported Optional Parameter | The Open message contains an optional parameter that the recipient of the message doesn’t understand.
5 | Authentication Failure | The data in the Authentication Information optional parameter could not be authenticated.
6 | Unacceptable Hold Time | The router refuses to open a session because the proposed hold time its peer specified in its Open message is unacceptable.
Update Message Error (Error Code 3)
1 | Malformed Attribute List | The overall structure of the message’s path attributes is incorrect, or an attribute has appeared twice.
2 | Unrecognized Well-Known Attribute | One of the mandatory well-known attributes was not recognized.
3 | Missing Well-Known Attribute | One of the mandatory well-known attributes was not specified.
4 | Attribute Flags Error | An attribute has a flag set to a value that conflicts with the attribute’s type code.
5 | Attribute Length Error | The length of an attribute is incorrect.
6 | Invalid Origin Attribute | The Origin attribute has an undefined value.
7 | AS Routing Loop | A routing loop was detected.
8 | Invalid Next_Hop Attribute | The Next_Hop attribute is invalid.
9 | Optional Attribute Error | An error was detected in an optional attribute.
10 | Invalid Network Field | The Network Layer Reachability Information field is incorrect.
11 | Malformed AS_Path | The AS_Path attribute is incorrect.
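For quick lookups, the reference above can be folded into a small table in code. The sketch below copies the names straight from that reference; the function name `describe` and the dictionary layout are just my choice for the illustration.

```python
# Error codes and subcodes as listed in the reference above.
ERROR_CODES = {
    1: "Message Header Error",
    2: "Open Message Error",
    3: "Update Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

SUBCODES = {
    1: {
        1: "Connection Not Synchronized",
        2: "Bad Message Length",
        3: "Bad Message Type",
    },
    2: {
        1: "Unsupported Version Number",
        2: "Bad Peer AS",
        3: "Bad BGP Identifier",
        4: "Unsupported Optional Parameter",
        5: "Authentication Failure",
        6: "Unacceptable Hold Time",
    },
    3: {
        1: "Malformed Attribute List",
        2: "Unrecognized Well-Known Attribute",
        3: "Missing Well-Known Attribute",
        4: "Attribute Flags Error",
        5: "Attribute Length Error",
        6: "Invalid Origin Attribute",
        7: "AS Routing Loop",
        8: "Invalid Next_Hop Attribute",
        9: "Optional Attribute Error",
        10: "Invalid Network Field",
        11: "Malformed AS_Path",
    },
}


def describe(code, subcode):
    """Return 'code name / subcode name' for a NOTIFICATION, or 'unknown'."""
    code_name = ERROR_CODES.get(code, "unknown")
    subcode_name = SUBCODES.get(code, {}).get(subcode, "unknown")
    return f"{code_name} / {subcode_name}"


# The pair from the log: code 3, subcode 11.
print(describe(3, 11))
```

For the NOTIFICATION in my log this gives "Update Message Error / Malformed AS_Path", which matches the router's own wording, "AS path attribute problem".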
Well… here we are…
With my upstream's help I found out that it really was Saudi Telecom, with the prefix 212.118.142.0/24 it announces, that had messed things up for me and for many, many others.
The evidence: my own logs, my upstream's account of the same problem hitting its other customers, and this thread, "Saudi Telecom sending route with invalid attributes 212.118.142.0/24":
anyone else getting a route for 212.118.142.0/24 with invalid
attributes? Seems this is (again) causing problems with some (older)
routers/software.
Announcement bits (4): 0-KRT 3-KRT 5-Resolve tree 1 6-Resolve tree 2
AS path: 6453 39386 25019 I Unrecognized Attributes: 39 bytes
AS path: Attr flags e0 code 80: 00 00 fd 88 40 01 01 02 40 02 04 02 01 5b a0 c0 11 04 02 01 fc da 80 04 04 00 00 00 01 40 05 04 00 00 00 64
Accepted Multipath
-Jonas
The reply:
Exactly the same here.
Sep 8 20:24:04 BBD-RC02 rpd[1334]: Received BAD update from 94.228.128.57 (External AS 41887), aspath_attr():3472 PA4_TYPE_ATTRSET(128) => 1 times IGNORED, family inet-unicast(1), prefix 212.118.142.0/24
Bye,
Raymond.
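The hex dump in the first message can be unpacked by hand or with a few lines of code. The sketch below assumes that attribute code 0x80 (128, the PA4_TYPE_ATTRSET from the reply) is the ATTR_SET attribute of RFC 6368, whose value is a 4-byte origin AS followed by ordinary path attributes encoded as flags, type, length, value. It also assumes the dump shows only the attribute value and was copied without loss. The attribute names are the RFC 4271 / AS4 type codes; `decode_attr_set` is a name I made up.

```python
# Value bytes of the attribute with "flags e0 code 80" from the message above.
DUMP = (
    "00 00 fd 88 40 01 01 02 40 02 04 02 01 5b a0 c0 11 04 02 01 fc da "
    "80 04 04 00 00 00 01 40 05 04 00 00 00 64"
)

EXTENDED_LENGTH = 0x10  # flag bit: the length field is two bytes, not one

# Path attribute type codes that occur in this dump.
ATTR_NAMES = {
    1: "ORIGIN",
    2: "AS_PATH",
    4: "MULTI_EXIT_DISC",
    5: "LOCAL_PREF",
    17: "AS4_PATH",
}


def decode_attr_set(hex_dump):
    """Split an ATTR_SET value into its origin AS and embedded attributes."""
    data = bytes.fromhex(hex_dump)
    origin_as = int.from_bytes(data[:4], "big")
    attrs = []
    pos = 4
    while pos < len(data):
        flags, type_code = data[pos], data[pos + 1]
        if flags & EXTENDED_LENGTH:
            length = int.from_bytes(data[pos + 2:pos + 4], "big")
            pos += 4
        else:
            length = data[pos + 2]
            pos += 3
        value = data[pos:pos + length]
        attrs.append((flags, type_code, value.hex()))
        pos += length
    return origin_as, attrs


ORIGIN_AS, ATTRS = decode_attr_set(DUMP)
print("origin AS:", ORIGIN_AS)
for flags, type_code, value in ATTRS:
    name = ATTR_NAMES.get(type_code, "unknown")
    print(f"flags={flags:#04x} type={type_code} ({name}) value={value}")
```

Under these assumptions the 36 value bytes split cleanly into an origin AS and five embedded attributes (types 1, 2, 17, 4 and 5), so the TLV framing itself is consistent. Whatever upsets the older software must therefore be in how this attribute or its contents are handled, not in a truncated dump.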
Let's look at what RIPE has on it:
% Information related to '212.118.140.0/22AS25019'
route: 212.118.140.0/22
descr: Saudi Arabia backbone and local registry address space / STC
remarks: for any Abuse or Spamming Please send an e-mail to abuse@saudi.net.sa
origin: AS25019
mnt-by: saudinet-stc
source: RIPE # Filtered
% Information related to '212.118.140.0/22AS39891'
route: 212.118.140.0/22
descr: Saudi Arabia backbone and local registry address space / STC
origin: AS39891
mnt-by: saudinet-stc
source: RIPE # Filtered
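The same check can be done mechanically: collect the `origin:` values for each `route:` object and confirm that the announced /24 falls inside the registered /22. A minimal sketch over the whois text quoted above; the function name `origins_by_route` is my own, and the parser only understands the simple `key: value` layout shown here.

```python
import ipaddress

# The whois output quoted above.
WHOIS = """\
% Information related to '212.118.140.0/22AS25019'
route: 212.118.140.0/22
descr: Saudi Arabia backbone and local registry address space / STC
remarks: for any Abuse or Spamming Please send an e-mail to abuse@saudi.net.sa
origin: AS25019
mnt-by: saudinet-stc
source: RIPE # Filtered
% Information related to '212.118.140.0/22AS39891'
route: 212.118.140.0/22
descr: Saudi Arabia backbone and local registry address space / STC
origin: AS39891
mnt-by: saudinet-stc
source: RIPE # Filtered
"""


def origins_by_route(whois_text):
    """Map each route prefix to the list of origin ASes registered for it."""
    origins = {}
    route = None
    for line in whois_text.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "route":
            route = value
        elif key == "origin" and route is not None:
            origins.setdefault(route, []).append(value)
    return origins


ORIGINS = origins_by_route(WHOIS)
ANNOUNCED = ipaddress.ip_network("212.118.142.0/24")
COVERED = any(
    ANNOUNCED.subnet_of(ipaddress.ip_network(route)) for route in ORIGINS
)
print(ORIGINS, COVERED)
```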
Well, well: the covering 212.118.140.0/22 is registered to two origin ASes at once. Nice work, what can I say.
Once I had filtered this prefix out for good, the BGP session immediately stopped dropping. In the evening the same fate befell my other upstream.
By then I was thoroughly fed up, so I simply filtered out both ASes, AS39891 and AS25019, by as-path. Saudi Telecom can take a hike and finally learn to use BGP filters.
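The matching logic of the two filters boils down to something like the sketch below. It is not Junos policy syntax and says nothing about where in the router the check runs; it only illustrates the two conditions, with `accept` being a name invented for the example and the test routes other than the Saudi one being made up.

```python
import ipaddress

# First filter: the offending prefix. Second filter: both ASes by as-path.
BLOCKED_PREFIX = ipaddress.ip_network("212.118.142.0/24")
BLOCKED_ASNS = {39891, 25019}


def accept(prefix, as_path):
    """Return False if a route matches either filter, True otherwise."""
    network = ipaddress.ip_network(prefix)
    if network.subnet_of(BLOCKED_PREFIX):
        return False  # the prefix itself, or anything more specific
    if BLOCKED_ASNS & set(as_path):
        return False  # one of the blocked ASes appears anywhere in the path
    return True


# The route from the mailing-list post, plus two made-up routes.
print(accept("212.118.142.0/24", [6453, 39386, 25019]))
print(accept("198.51.100.0/24", [6453, 39891]))
print(accept("192.0.2.0/24", [64500, 64501]))
```

The as-path variant is the blunter of the two: it drops everything that passes through either AS, not just the one broken announcement.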
That is how one clueless operator can cause headaches for a lot of people all over the world.
P.S. Junos:
- JUNOS Base OS Software Suite [9.4R1.8]
JUNOS Kernel Software Suite [9.4R1.8]
Junos versions 9.4 and below are definitely affected. A Junos upgrade awaits me, and not only me 🙂