What is BGP

Border Gateway Protocol (BGP) is regards as the most influential network protocols as it is backbone of the internet today. BGP is a Path Vector Routing Protocol, that unlike other routing protocols uses TCP (port 179, as its transport layer) to establish connectivity before exchanging routing information with another BGP speaker (peer). BGP communication can be done between same and/or different networks, these networks are known as Autonomous Systems (AS) with an AS being a set of Routers that are managed by single entity, business and/or company. BGP uses routing information to maintain a BGP Routing Information Base (RIB) of Network Layer Reachability Information (NLRI) which it will exchange with other BGP peer or Peer ASs. BGP is a classless protocol, it can support any IP prefix regardless of class, this is both for IPv4 and IPv6. It is important to note that requires TCP connection first before building BGP connection, without that first established session a BGP peering never happen, however once that session is connected it will not have to made again unless a change is made. BGP uses Keepalive messages to ensure reliability of the session as it does not use any transport protocol-based keep-alive mechanism to determine if peers are reachable.

BGP Usage

BGP is largely (but not exclusively) used in large enterprises and data centre hosting environments where the need for single or multihomed to multiple Internet Service Providers (ISPs) connections are needed, this is known as Exterior BGP (eBGP). BGP is extensively used with Service Provider environments. BGP allows a large range of the policy based controls for an AS to influence and/or manipulate routed inbound and outbound traffic to help optimise the movement of traffic for their own needs. Additionally BGP can be used between BGP routers within the same AS to advertise internal routes with the same level of control as eBGP, with some small however important difference, this is known as Interior BGP (iBGP).

eBGP vs iBGP

There are some Key Differences between eBGP and iBGP that are important to note:

eBGP

  • eBGP session is between BGP peers with different AS numbers
  • Inter-AS communication is by via eBGP
  • eBGP respects the AS_Path Path Attribute
  • Routes learnt via eBGP will be advertised to other eBGP and iBGP peers

iBGP

  • iBGP session is between BGP peers with the same AS number
  • Intra-AS communication can be by via iBGP
  • iBGP commonly uses an IGP for network reachability and to establish BGP TCP session via Loopback address
  • Routes learnt via iBGP will not be advertised to other iBGP peers however will advertise routes to an eBGP peer

The above isn’t the full differences but just some of the main difference that need to remember. Additionally there are situations where some of these rules may need to be manipulated and can be done in design and/or configuration however that is for later

BGP Peering States

When establishing a BGP session there are 6 states that need to be completed before peering session comes up. The first 3 states are to ensure the TCP transport layer connectivity is there, once this has been completed then BGP connectivity is established with the final 3 states:

BGP State Connect Description
Idle TCP This is when all BGP connections will be refused. An Idle state occurs when the BGP session hasn’t been configured on the other BGP peer or BGP has isn’t enabled at all. Commonly, a start event is required from the other peer to prepare the TCP connectivity.
Connect TCP The router listening for TCP connections and is waiting for the TCP 3-way handshake to be completed: (a) If this is completed, then an Open message is sent and is transitioned into the OpenSent State. (b) If TCP connections fails, the BGP peer restarts the ConnectRetryTimer and waits for the remote peer to initiate a TCP connection and transitions into an Active State.
Active TCP This is when the BGP peer is trying to establish TCP connection.
OpenSent BGP When in the OpenSent, an open message has been sent by the BGP peer however has not received by the local peer: (a) Once the message has been received, checked and has no errors, the local peer will send a Keepalive message. (b) If a message is received, checked and an error is found then state is transitioned back to an Idle state.
OpenConfirm BGP When in the OpenConfirm, the BGP is waiting on a Keepalive or Notification message: (a) Once the peer receives a Keepalive message, it will move into the Established state. (b) If the local peer does not receive a Keepalive within the negotiated session hold timer, it will send a Notification message and transition back to Idle state. The same will occur if the local peer sends out a Notification message.
Established BGP Having received the Keepalive message, the BGP session is fully Established. The peers are now able to exchange Update, Notification and Keepalive messages

BGP Message Types

As shown above, there are a number of different messages sent between BGP peers to Establish a session and when even the peering has been established, messages are used to ensure that both peers have synchronized routing information. BGP can only process a message after the entire message has been received, the maximum message size is 4096 bytes with 19 bytes being the smallest message size, this would be just be a header with no data. Each message type uses a fixed header size of 19 bytes with BGP Keepalives not include any data after the header, so they will always use the minimum size.

Each Message would be include the following:

Message Type Description
Open Once TCP connection has been completed both peers will send out an Open Message. This message starts the peering session, it provides details about the remote peer, in addition to details about supported and optional options. These details are included: BGP version (normally version 4), AS number, Hold Time, Router ID, Par-Len and Optional Parameters
Update An Update Message sends a list of new, withdrawn or types of routes from the remote peer. Depending on the routing policy of remote peer these may or not be entered into the Routing Table. These details are included: Unfeasible Route Length, Withdrawn Routes, Path Attribute Length and Path Attribute and Network Length Reachability Information (NLRI)
Keepalive It is important to always remember that Keepalive Messages are not used to ensure the TCP connection between peers is kept. They are used to ensure that BGP Hold Timers do not expire keeping alive the route exchange.
Notification Notification Message is used to inform a peer that there is an error with the BGP session. There are 6 Error code numbers: Message Header Error, Open Message Error, Update Message Error, Hold Timer Expired, Finite State Machine Error and Cease. In addition to 17 Sub Error codes (6 Open Message Errors and 11 Update Message Errors). These can found in RFC4271
Refresh Normally BGP can not readvertise routes that have already been acknowledged by a peer, if the BGP peer has been configured to soft clear of BGP sessions then peers will be able to exchange Refresh Messages. Some vendors you have to explicitly configure this, in Cisco you need to configure soft-reconfiguration whereas with Juniper it is set by default within JunOS.

Notes

Par-Len If this set, it informs the peer that optional parameters should be expected.


Optional Parameters This is where negotiable parameters are indicated, these would be authentication and capability extension such as Multiprotocol Extensions and route refresh


Unfeasible Route Length If this is set, it will tell the peer, the length of withdraw routes


Withdrawn Routes Lists IP prefixes that have been removed as they are no longer deemed as reachable


Path Attribute Length This indicates the total length of Path Attributes field. Its value allows the length of the Network Layer Reachability field to be determined. A value of 0 determines that neither Path Attribute and NLRI is present in the update.


Path Attribute The following properties for a route is included: Origin, AS Path, Next-Hop, Multi-Exit Discriminator (MED) and Local Preference


Network Length Reachability Information (NLRI) Lists IP prefixes that will be advertised as reachable via the AS


BGP Attributes

Unlike other Routing Protocols, BGP primary function is to find the best path to a destination and not the shortest path. BGP uses a number of attributes to calculate the best path for any given destination prefix. These attributes can be broken down into 4 types:

Attribute Types Information
Well known Mandatory These attributes must be known and understood by all BGP speakers. Additionally must exist within the BGP update messages.Attributes classed as Well Known Attributes: Origin, AS Path, Next-Hop
Well known Optional These attributes must be known and understood by all BGP speakers. However they don’t have to exist within a BGP update message.Attributes classed as Well Known Optional Attributes: Local Preference and Atomic Aggregator
Optional Transitive Attributes don’t need to be understood by a BGP speaker however the set flag(s) will need to be passed onto other neighbours. Attributes classed as Optional Transitive: Aggregator, Community and Extended Community
Optional Non-Transitive These attributes don’t need to be understood by a BGP speaker and the set flag(s) will not be passed onto other neighbours. Multi Exit Discriminator, Originator ID, Cluster List, Multiprotocol Reachable NLRI and Multiprotocol Unreachable NLRI

A BGP Update message could include some, if not all, of the following attributes:

Message Information
Origin (Attribute Code 1) The Origin Attribute confirms the source of the route aka where the route was learnt from. The Origin of a route can either be: I: Internal (0) The Route is learnt from IGP, E: External (1) The Route is learnt from EGP, ?: Incomplete (2) The Route is learnt by something that isn’t by Internal or External methods. The rule used for Origin is that: Internal is better than External which is better than Incomplete
AS Path (Attribute Code 2) AS Path is a list of AS numbers that are between the source AS router to the our own AS. The AS Path is primary usages are to prevent Routing Loops, assist in the Path Selection and Policy Based Routing (PBR). BGP router will drop any routes received where it can see its own AS number within the AS Path this is how Routing Loops are prevented. The path enables the router to make policy decisions based on the presence of certain AS’s within the path. Additionally routes with a shorter AS Path are preferred over routes with longer AS Path
Next-Hop (Attribute Code 3) This Attribute contains the IP address of the BGP peer that advertises the route. The Next-Hop is used for reachability and reliable of for the BGP session. For eBGP it is usually the peering address associated with the physical link with another AS. iBGP works differently as you can have situations where due to rules with iBGP the next-hop address isn’t reachable due to learning the route from another iBGP peer, in this situation the Next-Hop can be changed by policy.
Multi Exit Discriminator (Attribute Code 4) Multi Exit Discriminator (MED) is used when there are more than one route to the same upstream AS. The route with the lowest MED value is always preferred by default.
Local Preference (Attribute Code 5) Local Preference is an important attribute as it is the first attribute evaluated in the Path Selection Process. Local Preference is used for Infra-AS traffic communications for BGP session. As the name, suggests is only used to influence traffic within an AS. Oddly BGP prefers routes with the Highest Local Preference.
Atomic Aggregator (Attribute Code 6) Atomic Aggregator attribute is a notification that tells other BGP speakers within the AS-Path that some information has been lost and/or changed due to route aggregation. This may affect the best path selection because a less specific route was selected over more specific route.
Aggregator (Attribute Code 7) Aggregator attribute is set when an advertised route has been aggregated. This attribute contains the AS number and Router-ID of the Router that has performed the aggregation
Communities (Attribute Code 8) Community attribute is tag that is use to modify, filter and/or influence a common group of IP Prefix(es) to act in a user defined way. Communities uses 4-octets of space to represent its value. Communities are used in conjunction with PBR. A community is 32-bit value, that is common defined as AS/IP-address:User-defined ie 100:1 or 192.168.100.1:1. 100 would be the AS or 192.168.100.1 being the device loopback address with 1 being a value significant within AS100.
Originator ID (Attribute Code 9) Originator attribute is a loop prevention mechanism used within iBGP network using a Route Reflector. The Route Reflector attaches if own Router-ID to routes, so if it receives a route with its own Router-ID it will ignore the route.
Cluster List (Attribute Code 10) Cluster List similar to the Originator ID attribute is a loop prevention mechanism however if an iBGP network is used clustered set of Route Reflectors then routes have the Route Reflectors Cluster ID attached to the advertised routes.
Multi-Protocol Reachable NLRI (Attribute Code 14) Multi-Protocol Reachable NLRI has two main functions as defined in RFC 4760: (a) Negotiates what non IPv4 unicast families will be announced between two BGP peers. (b) Defines the Network Layer Address of the router that should be the next-hop of the destination families. Ie if you have advertised l2vpn bgp family the next-hop for this bgp family will be defined within this attribute. When this attribute is used in a BGP Update message, the Origin and AS Path attributes have to be included. Local Preference attribute is additionally added to Update messages for iBGP peering sessions.
Multi-Protocol Unreachable NLRI (Attribute Code 15) Multi-Protocol Unreachable NLRI attribute is used to withdraw any BGP families that are no longer being advertised between BGP peers.
Extended Communities (Attribute Code 16) Extended Communities are the same as Community attribute however it has 8 octets of space to represent the community compared to 4 octets with normal communities. This allows 64-bit value, it can be represented as Type:Global-Administrator:Local-Administrator. It is important to note that you have set amount of bits you can use. You will have 16 bits for the Type, 16 bits, for the Global-Administrator (commonly the ASN/IP address) and 32 bits, for the Local-Administrator (commonly user defined).

BGP Path Selection

When a destination prefix reached by multiple routes via BGP by default only one path will be advertised into the Routing Table. With this in mind BGP has used its Route Selection Algorithm to determine what path will be installed into the Routing Table. The algorithm uses the following steps:

  • Prefer the highest Local Preference Value

  • Checks what path has shortest AS Path

  • The Route with the Lowest Origin Value

  • If the route has a Lower MED

  • If the Prefix is learnt via eBGP is preferred over being learnt via iBGP

  • The path with the better exit out of the local AS. This means that the underlying IGP metric cost is taken into consideration, the path with the lowest IGP is preferred

  • The eBGP route that has the longest uptime or prefer the routes from the peer with lowest Router ID

  • Prefer routes with the shortest Cluster List Length. This is when you use a Route Reflector within your iBGP peering session

  • Prefer routes from a peer with the lowest IP Address

Some vendors have their own vendor specific additions to the path selection algorithm. Cisco use Weight before checking Local Preference and Juniper verify that the Next-Hop is reachable before checking Local Preference. With JunOS, if the Next-Hop isn’t verified then the route is set as a Hidden route and will need investigating.

Share on LinkedIn
Share on Reddit