FIB Compression - Optimizing your Routing Tables
Using FIB compression in routers to handle the growth of the Internet routing table.
Since Junos 21.2, it has been enabled by default in some of our routers, and more will use this feature in 2023. Learn how it works, its limitations, the supported platforms, and how to verify its efficiency in this video.
0:37 Key concepts (FIB, Current BGP table size and growth, ...)
2:04 FIB compression principles
4:58 Examples
6:38 Products supporting FIB compression
7:03 Support and limitations
7:40 Hardware implementation and processes involved
8:05 Best case scenario with a unique Next-Hop address for all public routes
8:57 Efficiency of the compression in live networks
You’ll learn
How Juniper is using FIB compression in some routers to handle the growth of the Internet routing table
How FIB compression works
Transcript
0:20 what are the key concepts behind FIB compression technology, what kind of product can enable it,
0:26 what could be the limitations, how efficient it is in the best case for a public Internet
0:31 table and finally we will verify the performance of the algorithm in real production networks.
Key concepts (FIB, Current BGP table size and growth, ...)
0:37 The concept of a FIB, or Forwarding Information Base, should be very familiar to most
0:42 network engineers. In a nutshell, it's a table containing the routes (that is, prefixes,
0:47 subnet masks and next-hop details) required by the Packet Forwarding Engine during the lookup phase.
0:53 The PFE compares the destination address to the table entries and decides where the packet should be
0:58 sent. In early 2023, if a router contains a full Internet table, and that's the case for
1:04 multiple roles like peering, we have to install in hardware more than 942,000 IPv4 and 173,000
1:12 IPv6 routes. It's a large quantity of prefixes. A rough approximation leads to the idea that an
1:19 IPv6 route occupies twice the space of an IPv4 one, simply because the vast majority of the
1:25 public table is made of /24s for IPv4 and /48s for IPv6. With this logic, "Internet" uses 1.3
1:34 million FIB entries and we are not even counting the more specific routes advertised via private
1:39 peering and the ones coming from the IGP. Depending on the network, it could represent tens of
1:45 thousands of extra entries. The growth of the Internet table doesn't seem to slow down and we will
1:50 need to create routers that will support this scale in 10 years or more. That's why it's not
1:55 unreasonable today to propose hardware supporting a minimum of 4 million FIB entries.
2:01 To handle this large quantity of information, multiple approaches have been investigated.
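As a quick refresher on what the lookup phase does, here is a minimal longest-prefix-match sketch in Python; the prefixes and next-hop labels (NH1, NH2, NH-default) are made up for illustration, and this is not how a PFE actually implements the lookup:

```python
import ipaddress

# Toy FIB: prefix -> next-hop label (illustrative values only).
fib = {
    ipaddress.ip_network("0.0.0.0/0"): "NH-default",
    ipaddress.ip_network("192.0.2.0/24"): "NH1",
    ipaddress.ip_network("192.0.2.128/25"): "NH2",
}

def lookup(destination: str) -> str:
    """Return the next-hop of the most specific prefix covering the address."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    return fib[max(matches, key=lambda net: net.prefixlen)]

print(lookup("192.0.2.200"))   # NH2 (the /25 wins over the /24 and the default)
print(lookup("198.51.100.7"))  # NH-default
```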
FIB compression principles
2:07 FIB compression is one of them, and it is not necessarily orthogonal to the others. Depending on their organization, routes
2:12 can be aggregated and we can reduce the size they occupy in the ASIC memory, if they respect
2:17 certain conditions. It's interesting to note that aggregation is slightly different in the FIB than
2:23 from a BGP perspective. For example, in these "CIDR report aggregation summary" pages, the aggregated
2:29 routes represent the blocks allocated to the different companies present on the Internet.
2:34 Basically, the routes coming from a specific Autonomous System. With FIB compression, we don't
2:39 really care about BGP information like AS-path. We will only consider: prefix, subnet mask and
2:45 next-hop address. By the way, I invite you to read this pretty interesting article published
2:49 by Sharada about longest prefix matching in networking chipsets. It's really good; you
2:55 will find the link in the description. The key concepts behind FIB compression are very basic.
3:00 The first rule is the shadowing principle. If I receive a route with a "forwarding behavior"
3:05 similar to an existing superset, it doesn't bring any useful information. I don't program it. For
3:10 example, I received two routes, 192.0.2.4/30 and 192.0.2.6/31, both pointing to NH1. I don't need
3:23 to program the /31. It will occupy space for nothing. The second principle is the compression
3:29 itself. If many routes can be aggregated into a superset, just program this one and not the
3:35 contributing prefixes. Imagine we received four host routes, represented here in light blue. So we have
3:42 four IPv4 /32s all pointing to the same next-hop NH1. They have the same forwarding behavior.
3:49 I represent them in binary: the first 30 bits of these four prefixes are identical and that's
3:55 my subnet mask. And the last two bits cover all possible combinations. That gives us 192.0.2.12/30
4:03 pointing to NH1. We will only install this entry in the FIB and it will provide the exact
4:09 same service as having four host routes. And of course it will occupy just a quarter of the
4:14 memory space. In these diagrams, the blue color represents the routes received from the RIB,
4:20 whether they are direct, static, or advertised by a dynamic protocol. And in green, we have the
4:26 computed aggregate routes. Dotted lines mean a route is not installed and solid lines represent
4:32 what is programmed in hardware. So you can see I aggregated and only programmed 192.0.2.12/30.
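To make the two principles concrete, here is a tiny Python illustration using the standard ipaddress module; it only mimics the idea on the example prefixes above and is not Juniper's implementation:

```python
import ipaddress

# Shadowing: 192.0.2.6/31 is covered by 192.0.2.4/30 and (we assume) shares
# the same next-hop NH1, so programming it would bring no extra information.
superset = ipaddress.ip_network("192.0.2.4/30")
more_specific = ipaddress.ip_network("192.0.2.6/31")
print(more_specific.subnet_of(superset))  # True -> shadowed, not programmed

# Compression: four host routes covering all combinations of the last two
# bits (and assumed to share next-hop NH1) collapse into a single /30.
hosts = [ipaddress.ip_network(f"192.0.2.{i}/32") for i in range(12, 16)]
print(list(ipaddress.collapse_addresses(hosts)))  # [IPv4Network('192.0.2.12/30')]
```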
4:40 It's super basic, as I told you. Of course, these two principles are applied at high scale
4:44 and you can easily understand that, depending on the organization, having a lot of prefixes pointing to
4:50 the same next-hop address could reduce the size of the table significantly. Even if they are actually
4:55 advertised by different Autonomous Systems.
Examples
5:01 Let's take a slightly larger example: here, I received 12 prefixes, /32s and /31s, all pointing to next-hop NH1. With this graphical representation,
5:09 you can see it's an iterative process. At each level, the system creates aggregates when it's possible;
5:14 at the next level, they can themselves be combined with received prefixes or other aggregates, and on,
5:20 and on... up to this /28 summary. In this other example, I receive a bunch of /32s,
5:28 /31s and one /30, but with two different next-hop addresses. Quickly, we reach a point where routes
5:35 pointing to NH1 cannot be grouped with routes pointing to NH2. And that's how
5:40 far the compression can go. I've been asked this question several times: if certain entries are
5:45 missing to build an aggregate, can we create some negative routes to fill the gaps? The answer is:
5:51 no. If there is no contributor to create the aggregate, we don't invent a route pointing
5:56 to null0. The algorithm keeps it simple. In this example, 192.0.2.12/32 is not present
6:04 and therefore the structure will be compressed into four entries, and not just one. Another
6:09 interesting subtlety to understand: having a more specific route pointing to a different next-hop
6:14 doesn't prevent the algorithm from creating the aggregate. In this example, I have five routes:
6:19 192.0.2.5/32 points to NH2 while all other /31s point to NH1. We can still create the
6:29 aggregate 192.0.2.0/29 and end up with these two prefixes installed in the hardware.
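Here is a rough Python sketch of this iterative, next-hop-aware aggregation, run on the five routes of the last example; it is only a toy model of the principle (pairwise sibling merging), not the actual algorithm shipping in Junos:

```python
import ipaddress

def compress(fib):
    """Toy FIB compression: repeatedly merge sibling prefixes that share a
    next-hop into their common supernet. fib maps IPv4Network -> next-hop."""
    fib = dict(fib)
    merged = True
    while merged:
        merged = False
        for net, nh in list(fib.items()):
            if net not in fib or net.prefixlen == 0:
                continue
            parent = net.supernet(prefixlen_diff=1)
            halves = list(parent.subnets(prefixlen_diff=1))
            sibling = halves[0] if halves[1] == net else halves[1]
            # Merge only if the other half really exists with the same
            # next-hop; we never invent a route (e.g. to null0) to fill a gap.
            if fib.get(sibling) == nh and parent not in fib:
                del fib[net], fib[sibling]
                fib[parent] = nh
                merged = True
    return fib

# Five routes: four /31s to NH1 plus a more specific /32 to NH2.
routes = {
    ipaddress.ip_network("192.0.2.0/31"): "NH1",
    ipaddress.ip_network("192.0.2.2/31"): "NH1",
    ipaddress.ip_network("192.0.2.4/31"): "NH1",
    ipaddress.ip_network("192.0.2.6/31"): "NH1",
    ipaddress.ip_network("192.0.2.5/32"): "NH2",
}
print(compress(routes))
# Two entries remain: 192.0.2.0/29 -> NH1 and 192.0.2.5/32 -> NH2; longest
# prefix match still steers 192.0.2.5 to NH2, so the /29 aggregate is safe.

# Missing contributor: remove one /31 and the /29 can no longer be formed,
# so several smaller aggregates stay installed instead of a single one.
partial = dict(routes)
del partial[ipaddress.ip_network("192.0.2.6/31")]
print(compress(partial))  # 192.0.2.0/30, 192.0.2.4/31 and 192.0.2.5/32
```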
6:37 The FIB compression algorithm is implemented in multiple Juniper products.
Products supporting FIB compression
6:44 Since 21.2R1, it's activated by default in PTX routers powered by Express 4 Packet Forwarding Engines. That's the PTX10001-36MR
6:53 and the LC1201 and LC1202 line cards in the PTX10K chassis. Also, in 2023, it will be implemented in the
7:01 ACX7k family. It works for IPv4 and IPv6 unicast routes regardless of the advertising protocol, even static or direct.
Support and limitations
7:08 And it works equally well in VRFs. It has no impact on uRPF (unicast reverse
7:16 path forwarding). We didn't implement compression for multicast routes, even if technically
7:21 speaking, it would be doable. The typical multicast table size doesn't really justify
7:26 the effort. Enabling some features could prevent the compression. That's the case for SCU/DCU
7:31 (source/destination class usage). Nothing to do in terms of configuration: the routes involved are
7:36 simply flagged internally as not being part of the compression.
Hardware implementation and processes involved
7:42 In terms of hardware implementation, the compression is done in the process just before the routes are pushed into the PFE. In the PTX
7:48 case, that is evo-aftmand-bt, and in the ACX case it is evo-pfemand. The compression is not
7:56 affecting the RIB (it doesn't even involve fibd), which is why it has no impact on operations
8:01 between protocols, like redistribution.
Best case scenario with a unique Next-Hop address for all public routes
8:06 Before checking how it works in production, there is one lab test that will give us some interesting information. You understand that the compression
8:11 ratio will depend on the route organization, but also on the next-hop distribution. The best-case
8:16 scenario is when we inject all routes with a single, unique next-hop address. This test will tell us how far
8:21 we can compress the public table. I'm advertising both the v4 and v6 full tables (that represents 930k v4
8:28 and 161k v6 entries) from a unique route injector. And we checked the compression ratio. That's 82
8:35 percent for IPv4: interesting, because we will see more and more IPv4 disaggregation in the future,
8:40 and therefore higher and higher compression opportunities. Also, it's surprising to see
8:46 that the current IPv6 table, despite its huge address space, can still be compressed at 72
8:53 percent. But again, this represents an absolute best-case scenario.
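As a sanity check on those numbers (assuming the ratio is the fraction of entries removed by compression, which is how it reads here), the best case would leave roughly this many programmed entries:

```python
# Back-of-the-envelope check, assuming "82%/72% compression" means that
# fraction of FIB entries is removed.
v4_routes, v6_routes = 930_000, 161_000
print(round(v4_routes * (1 - 0.82)))  # ~167,400 IPv4 entries left
print(round(v6_routes * (1 - 0.72)))  # ~45,080 IPv6 entries left
```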
Efficiency of the compression in live networks
8:58 Now let's take a look at the performance in production environments. We asked a couple of our customers to send us the output
9:04 of specific CLI commands to identify the number of routes, the compression ratio, and the number of next-hop
9:09 addresses they see in their live routers. And the results are very good, with a FIB space reduction
9:14 of 55 to 60 percent. So it's fair to say that FIB compression doubles the FIB space.
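To see why a 55 to 60 percent reduction roughly doubles the usable FIB, here is the quick arithmetic (an interpretation only, expressing the saving as an effective capacity multiplier):

```python
# A reduction r frees space for 1 / (1 - r) times as many routes.
for reduction in (0.55, 0.60):
    print(f"{reduction:.0%} reduction -> about x{1 / (1 - reduction):.1f} FIB capacity")
```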
9:21 For more details, I invite you to read the TechPost article linked below: it contains many more
9:26 details and also some additional tests on route churn performed in our labs to verify it has
9:32 no impact on traffic. OK, let's wrap it up: FIB compression is enabled by default in the latest
9:38 PTX generation and will come soon to the ACX7k routers. It's a very efficient mechanism that
9:46 doubles the FIB space in complete transparency for the operator. See you in the next one :)