Juniper Networks Demonstrates Path Diversity and Low Latency Routing with Paragon Automation
Catch a deep-dive session on the Network Optimization piece of the Paragon Automation suite. This includes a live demonstration.
You’ll learn
Path diversity
Low latency routing
Transcript
0:09 I'm Anton Elita, a technical solutions consultant at Juniper Networks. We have been talking about the Paragon Automation suite and its applications in the network, from planning, orchestration, and assurance up to optimization. Today we'd like to do a deeper dive into the last part, which is optimization in a live network.
0:36 Right, let's start with the first case, which is path diversity. Some customers have a business requirement to provide truly diverse label-switched paths to avoid a single point of failure in the network, without relying on fast reroute techniques. Those customers typically also request bidirectional co-routed LSPs, so that the forward and reverse paths stick to the same set of links and nodes. In such circumstances it is essentially required to have a central controller with a global network view.
1:20 If you look at this diagram, an ingress PE like PE1 has no idea of the LSPs which are instantiated from another ingress PE like PE2. Only a controller with a global view is able to provide truly diverse LSPs that start from different ingress PEs.
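To make "truly diverse" concrete, here is a minimal sketch of the disjointness check such a controller can run. The path representation (an ordered list of node names) is a hypothetical illustration, not Pathfinder's data model.

```python
# Minimal sketch: check that two LSP paths are link- and node-disjoint.
# The path representation (ordered list of node names) is hypothetical.

def links_of(path):
    """Return the set of undirected links a path traverses."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def are_diverse(path_a, path_b):
    """True if the paths share no link and no transit node."""
    shared_links = links_of(path_a) & links_of(path_b)
    shared_nodes = set(path_a[1:-1]) & set(path_b[1:-1])  # transit nodes only
    return not (shared_links or shared_nodes)

# The two demo tunnels both cross Hamburg, so they are not diverse:
print(are_diverse(["Amsterdam", "Hamburg", "Berlin"],
                  ["Brussels", "Hamburg", "Prague"]))  # False
```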
1:48 I will switch to the network view now. This is our base example network with a few nodes, and I will now show the link labels according to the IS-IS metrics. We have created two different tunnels: one going from Amsterdam to Berlin and another going from Brussels to Prague. They start and end on different nodes in the network, and due to the metrics in this example network they cross the same midpoint in Hamburg. So this is a single point of failure: in case something happens there, both LSPs will need to be rerouted, or will go down for a certain period of time.
2:46 So how do we avoid this? We could of course do the provisioning from the controller; we have specific tabs here to provision diverse tunnels. But I would also like to touch upon automation via APIs. Pathfinder has a northbound interface using REST, and if we were to program those LSPs automatically, we would use this interface to push the request to Pathfinder. I will now show such a REST client.
3:32 I will first need to authenticate myself with Pathfinder. I will get a token as a reply from Pathfinder, and I can use this token in my programmatic API calls. Now I am sending Pathfinder a REST call whose content is formatted in JSON. The content says which LSPs to create: their names, the configuration method, which is PCEP, the end nodes, and the properties for each LSP. Important here are of course the diversity level, the diversity group, and the fact that we want to create a co-routed pair. The same goes for the other pair of LSPs that we are going to signal.
4:26 So now I am actually hitting the Send button, and I have already received the reply from Pathfinder in the bottom part of the screen, basically mirroring my request with a few attributes added, like admin status and some others.
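For readers who want to script this flow, here is a minimal sketch of the two calls just described: authenticate for a token, then POST a JSON body requesting a diverse co-routed LSP pair. The endpoint paths, field names, and payload shape are assumptions for illustration only; consult the Paragon Pathfinder REST API documentation for the actual schema.

```python
# Hedged sketch of the REST flow shown in the demo; the endpoint paths
# and payload fields are illustrative assumptions, not the documented API.
import requests

BASE = "https://pathfinder.example.net"  # hypothetical controller address

# Step 1: authenticate and obtain a bearer token.
auth = requests.post(f"{BASE}/oauth2/token",
                     data={"grant_type": "password"},
                     auth=("user", "secret"))
token = auth.json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}

# Step 2: request two co-routed LSPs in the same diversity group.
lsp_request = [{
    "name": "LSP-Brussels-Prague",
    "provisioningMethod": "PCEP",       # configuration method from the demo
    "from": {"node": "Brussels"},
    "to": {"node": "Prague"},
    "diversityLevel": "link-and-node",  # illustrative value
    "diversityGroup": "demo-group-1",
    "bidirectional": "co-routed",
}]
reply = requests.post(f"{BASE}/api/v2/te-lsps",
                      json=lsp_request, headers=headers)
# Pathfinder mirrors the request back with attributes such as admin status.
print(reply.json())
```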
4:46 Now it is safe to switch back to the network view, which suggests that we have new network events. I will refresh the LSP table to show the recently added LSPs. They are selected now: Brussels to Prague and its reverse direction from Prague to Brussels take exactly the same links and nodes throughout the network, and the same happens with the second pair of bidirectional co-routed LSPs between Amsterdam and Berlin. But if I select all four of them, this shows that they are indeed truly diverse from each other and do not cross any single point of failure in this network.
5:39 So with this we have shown that using a controller with a global network view allows establishing and maintaining path diversity, even with a requirement for bidirectional co-routed LSPs.
5:56 I just want to confirm that this system is also able to take into account things like shared risk link groups, and coloring that can basically label the underlying shared physical infrastructure as opposed to the logical one?
6:17 Yes, really great comment. We indeed have the possibility to take diversity into consideration at every level, from a large site with multiple nodes all the way down to a single link, and shared risk link groups can include both nodes and links.
6:38 Let's say that your network is significantly larger than this, and you need an explicit ERO that transits the entire network and exceeds the maximum label depth of the hardware. Does this platform support things like a binding SID, to create a longer ERO than, let's say, 12 labels, or whatever the maximum segment depth or maximum label depth is?
7:15 Yes, this is indeed a question for many service providers, where the number of hops might exceed the hardware capabilities of the ingress nodes. For this we have foreseen a few solutions. One of them is label compression: Pathfinder is able to create LSPs consisting of as little as a single label, if you don't need to pin the path to specific nodes. But if your requirement is to go through certain segments in the network, then indeed we can leverage binding SIDs. This is supported in Pathfinder to create smaller label stacks, so that a transit node uncompresses the binding SID and sends the packet over the next list of segments.
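A minimal sketch of that idea, assuming a plain label-list model: when the segment list exceeds the hardware's maximum stack depth, the tail is replaced with a single binding SID that the anchor transit node expands into the remaining segments. The label values and the depth limit are invented for the example.

```python
# Illustrative binding-SID compression; label values and the depth
# limit are invented for the example.
MAX_STACK_DEPTH = 5  # hypothetical label depth limit at the ingress

def compress(segment_list, binding_sid):
    """Keep the head of a too-deep segment list and substitute one
    binding SID for the tail; the node that advertised the binding SID
    expands it into the remaining segments."""
    if len(segment_list) <= MAX_STACK_DEPTH:
        return segment_list, []
    head = segment_list[:MAX_STACK_DEPTH - 1]
    tail = segment_list[MAX_STACK_DEPTH - 1:]
    return head + [binding_sid], tail  # tail gets programmed at the anchor

full_path = [16001, 16002, 16003, 16004, 16005, 16006, 16007]
stack, expanded_at_anchor = compress(full_path, binding_sid=20001)
print(stack)               # [16001, 16002, 16003, 16004, 20001]
print(expanded_at_anchor)  # [16005, 16006, 16007]
```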
8:12 All right, so essentially stitching two LSPs together so that they look like one is the high-level explanation of what I asked about. The other question I have is about LSP failover: do you have support for running seamless BFD inside the LSPs, and does this support that signaling?
8:40 Yes, we support seamless BFD for every provisioned LSP, and this support was also proven at the latest EANTC interoperability event we had in February this year, together with other vendors.
8:59 This relates to recent events. One of the things we saw in the last couple of years, with the rise and saturation of bandwidth during the pandemic, and since you have Europe up here specifically, is that the backbone infrastructures of Europe and the United States are very different. The United States depends on a lot of caching points that are very close to the provider edges; Europe has a lot of PNIs and depends more on bandwidth across PNIs than on caching. When you get to a more complex topology and scenario, how does this scale in terms of being able to monitor and react? At the beginning of the pandemic we saw a lot of PNIs saturated as traffic crossed from Eastern Europe and Central Europe into Western Europe, with packet loss and things like that. If I'm managing this as a tier-one transit provider, how would I use this to scale, to manage reacting to those kinds of challenges on a large scale?
9:49 So, we have automated congestion avoidance in the network, which will be presented in a few minutes by my colleague Julian.
9:59 Okay, I'm switching to the next use case we are going to show today, which is low-latency routing. Here the business requirement is to provide the lowest latency, or maybe even guaranteed lowest latency, for critical services, possibly even including some service level agreements. Modern networks have a real mix of link speeds, varying probably from 10 to 400 Gbit/s with all possible variations in between. Many service providers base their metrics not on delay but use, for example, bandwidth as a reference for the metrics. That is not optimal for this business requirement, because the highest-bandwidth path is not always the lowest-delay one. So how do we solve this premium requirement without rebuilding the whole network metric system?
11:10 We have here a solution comprising multiple components. First we need to measure the latency on each network segment; this is obvious. Then we need to distribute this information to the controller and let the controller find the lowest-delay path, taking the sum of the delays on every participating network segment. But on top of this we want to make sure we understand how our customer is experiencing the network, and how to measure this user experience. That is the big question, and we have an answer to it: simulating customer traffic over the service provider's network, so that we really see the experience a normal user would have.
12:09 I'm switching back to the network view, and I will change the link labels to show the measured delay. We have dynamic delay measurement here on multiple links, like on the selected link from Amsterdam to Frankfurt; some other links might have a static value for the delay. Today I would like to focus on this cross link, because we have an impairment tool which will make its latency look much worse. But before I impair it, I would like to review a few tunnels, a few LSPs, which are crossing this link. For example, one of them, whose name starts with LL for low latency: I have selected it now. It goes from Amsterdam to Brussels, but it crosses a node in Frankfurt, because the direct link from Amsterdam to Brussels has a higher latency of 15 milliseconds, compared to a total latency of just a little beyond 3 milliseconds going over Frankfurt.
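To make the path choice concrete, here is a small sketch that sums per-link delays and picks the lowest-delay path, with made-up delay values roughly matching the demo (15 ms direct versus about 3 ms via Frankfurt). It illustrates the computation only; it is not Pathfinder's implementation.

```python
# Minimal lowest-delay path search over measured link delays.
# Delay values (one-way, in ms) are invented to match the demo's numbers.
import heapq

delays = {
    ("Amsterdam", "Brussels"): 15.0,   # direct link
    ("Amsterdam", "Frankfurt"): 1.6,
    ("Frankfurt", "Brussels"): 1.5,
}
graph = {}
for (a, b), d in delays.items():       # build an undirected adjacency map
    graph.setdefault(a, []).append((b, d))
    graph.setdefault(b, []).append((a, d))

def lowest_delay_path(src, dst):
    """Dijkstra over accumulated delay instead of the IGP metric."""
    queue, seen = [(0.0, src, [src])], set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == dst:
            return total, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, d in graph[node]:
            heapq.heappush(queue, (total + d, nbr, path + [nbr]))

print(lowest_delay_path("Amsterdam", "Brussels"))
# (3.1, ['Amsterdam', 'Frankfurt', 'Brussels']) -- beats the 15 ms direct link
```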
13:25 Before I explain how this data gets into Pathfinder, I will switch to an impairment tool and start an impairment on this link.
13:43 While the impairment tool is doing its job, let me explain how we get this data. First, the two adjacent nodes, like Amsterdam and Frankfurt here, send so-called Two-Way Active Measurement Protocol (TWAMP) Light probes across the link to each other, to measure the latency on the link very precisely; it is measured in microseconds. This information is then propagated into the IGP, like IS-IS or OSPF, and from there, for each network domain, we export this data along with other traffic information to a central controller like Paragon Pathfinder. Then we are able to figure out the lowest-delay path in Pathfinder, and all that remains is to use the PCEP protocol to signal an LSP or change the path of an existing LSP.
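As a rough illustration of how a two-way probe yields precise figures, the sketch below applies the standard TWAMP four-timestamp arithmetic: round-trip time minus the reflector's processing time, halved for an estimated one-way delay under the assumption of symmetric paths. The timestamp values are invented.

```python
# TWAMP-style delay computation from four timestamps (microseconds):
#   t1 sender transmits probe, t2 reflector receives it,
#   t3 reflector sends the reply, t4 sender receives the reply.
# The values below are invented for illustration.
t1, t2, t3, t4 = 0, 1_620, 1_650, 3_230

rtt = (t4 - t1) - (t3 - t2)  # round trip minus reflector processing time
one_way = rtt / 2            # assumes a roughly symmetric link

print(f"RTT = {rtt} us, estimated one-way delay = {one_way} us")
# RTT = 3200 us, estimated one-way delay = 1600.0 us (about 1.6 ms)
```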
14:53 If you looked at the screen while I was talking, you probably already noticed that this cross link's average delay has increased, from a value of sub-1 millisecond to 35 or so milliseconds on average over the last measurement period.
15:15 So how do we know whether this increase in the measurement actually introduces any problem for our customers? For this we use Paragon Active Assurance to inject synthetic probes which mimic customer traffic. Here I have a set of low-latency probes which are using the customer VPNs all around the network, and you probably already see that the green bar showing the quality of our service for the last 15 minutes, the selected interval, has turned from green first to red and then to black.
16:08 Let me explain what these colors mean. This is a drill-down view of the same active probe. Previously we had a delay value that was in line with the SLA with this customer; after introducing the impairment, the delay jumped up to 50 milliseconds, which breaches the contract. This value is then considered equal to an outage, because it is far higher than we promised.
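A toy version of the color logic described here, with assumed thresholds: delay within the SLA is green, a breach is red, and a breach large enough to count as an outage is black. The threshold values are illustrative, not Paragon Active Assurance defaults.

```python
# Toy SLA classification for a measured delay; thresholds are invented.
SLA_MS = 10.0      # contracted maximum delay
OUTAGE_FACTOR = 3  # a breach this many times over the SLA counts as outage

def sla_status(delay_ms):
    if delay_ms <= SLA_MS:
        return "green"                      # within contract
    if delay_ms <= SLA_MS * OUTAGE_FACTOR:
        return "red"                        # SLA breach
    return "black"                          # treated as an outage

for d in (3.1, 12.0, 50.0):
    print(d, "->", sla_status(d))           # green, red, black
```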
16:48 But you probably already noticed that after some time the delay went back down to a couple of milliseconds. Let us see what happened and why the delay returned to normal. I'm switching back to our network view.
17:10 We already saw that Pathfinder received the updated delay information from the network and even reflected it in the user interface. But what happened in the background? For this demo, Pathfinder has an aggressive LSP optimization timer. This timer reviews the delays in the network, looks for delay-sensitive LSPs, and can automatically, without human intervention, reroute them onto a new path. This is exactly what happened to the example LSP we saw a bit earlier: instead of going Amsterdam, Frankfurt, Brussels, it now takes the direct path from Amsterdam to Brussels, because the latency on that link is 15 milliseconds, which is now far lower than the sum of the latencies on the path via Frankfurt. To be sure that we are looking at the same LSP, we can check the events, what happened to this LSP over the last period of time. I will select some value in the past: this is exactly what we saw when we started our demo. Now I can visually compare it with the LSP path as of now: I have selected the latest LSP update, and we clearly see that the change was exactly as we noticed earlier.
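The background behavior just described can be pictured as a periodic pass: on each optimization-timer tick, recompute the best path for every delay-sensitive LSP and reroute when the current path is no longer lowest-delay. A hedged sketch follows; the link delays, LSP record, candidate paths, and the reroute step are illustrative stand-ins for what Pathfinder does via PCEP.

```python
# Hedged sketch of a delay-driven reoptimization pass; the link delays,
# LSP record, and candidate paths are invented for the example.
link_delay_ms = {
    frozenset({"Amsterdam", "Brussels"}): 15.0,
    frozenset({"Amsterdam", "Frankfurt"}): 35.0,  # the impaired cross link
    frozenset({"Frankfurt", "Brussels"}): 1.5,
}

def path_delay(path):
    return sum(link_delay_ms[frozenset(p)] for p in zip(path, path[1:]))

# Candidates would come from a full path computation (e.g. the Dijkstra
# sketch earlier); hardcoded here to keep the example short.
candidates = [["Amsterdam", "Brussels"],
              ["Amsterdam", "Frankfurt", "Brussels"]]

lsp = {"name": "LL-AMS-BRU", "path": ["Amsterdam", "Frankfurt", "Brussels"]}

def optimization_pass(lsp):
    """One tick of the optimization timer for a delay-sensitive LSP."""
    best = min(candidates, key=path_delay)
    if best != lsp["path"]:
        print(f"{lsp['name']}: reroute {lsp['path']} -> {best} "
              f"({path_delay(lsp['path']):.1f} ms -> {path_delay(best):.1f} ms)")
        lsp["path"] = best  # in reality, a PCEP update to the ingress node

optimization_pass(lsp)  # LL-AMS-BRU: reroute ... (36.5 ms -> 15.0 ms)
```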
18:41 With this we have reviewed a highly demanded use case: low-delay service placement, continuous measurement of customer experience, and automated LSP optimization in a changing network environment. This gives an operator a very powerful tool to provide best-in-class service to their customers.
19:07 How do you account for failures in the MPLS data plane? Something that's really common on any kind of equipment is that you'll have a route that gets pushed into the MPLS forwarding database, you've got an LSP, but the ASIC and the table are out of sync, and you don't actually forward. Looking at this, you obviously have an LSP, you think your LSP is good and that you're going to move traffic onto it, but it doesn't actually work because of a bug and the ASIC being out of sync with the forwarding table. How would you handle a condition like that with the controller? Is that something that is part of the monitoring?
19:43 Yes, this is a very tough use case to find the culprit for. We can address it with two approaches, or basically a combination of two things. One would be, as shown previously, the active assurance probes, which can see in a timely manner that traffic is being black-holed and can trigger additional automated checks in Paragon Insights. Paragon Insights might then start preparing a set of tests for the network operator who will troubleshoot afterwards, like traceroutes and collecting interface information right at the moment the check was triggered. That is what concerns active monitoring. But with Insights we also have passive monitoring, which would run continuously and react to, for example, increased counters of traffic drops on the forwarding plane. The equipment is usually able to account for packets dropped for no reason, for example with no route to the destination. If we have this mismatch between the programming of the data plane and the state of the routing tables, we would push more and more data towards a black-hole destination, and we would see a strong increase of such counters in our monitoring tools. There are really many counters we could monitor for this, and this is how we can tackle this situation.
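As a rough illustration of the passive side, the sketch below watches a forwarding-plane drop counter and alerts when its rate jumps above a threshold. The counter source is a hypothetical stub; in a real deployment such counters would be streamed via telemetry into Paragon Insights.

```python
# Illustrative black-hole detector over a forwarding-plane drop counter.
# read_drop_counter() is a hypothetical stub for a telemetry feed.
import random
import time

def read_drop_counter():
    """Stub: pretend to read a cumulative 'dropped, no route' counter."""
    return random.randint(0, 100_000)

def watch(threshold_pps=1_000, interval_s=10):
    last = read_drop_counter()
    while True:
        time.sleep(interval_s)
        now = read_drop_counter()
        rate = max(0, now - last) / interval_s  # drops per second
        if rate > threshold_pps:
            print(f"ALERT: {rate:.0f} no-route drops/s, possible black "
                  "hole; trigger traceroutes and interface collection")
        last = now

# watch()  # run the monitor loop
```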
21:28 If for any reason there is a change triggered by the controller, and, given the telemetry from Insights and all the mapping of the information, the controller realizes or notices that there is a detrimental effect of that change, is the controller going to roll it back? Do you need a user confirmation or administrator confirmation for that, for instance?
21:59 This is truly configurable, of course. We understand that closed-loop automation is the way to the future, but we cannot adopt it right from day one and set out to boil the ocean with everything fully automated; the trust will be gained step by step. Today most operators would probably trust Pathfinder to do the rerouting as shown in this demo. For a fully automated set of actions, like changing configurations, maybe rolling back, and applying artificial intelligence, we need time to gain this trust, but we are on a good way here. For this we already have some artificial intelligence bits included in Paragon Insights which will help us get there.
22:57 We hope that our showcase shed some light on what we can achieve today with a proven cloud-native automation stack. If you want to continue investigating these technologies, we have two options to suggest: one would be an ROI analysis that actually goes through the benefits of implementing such technology and quantifies those benefits, or you can simply ask for a pilot. All that was shown today is proven technology, technology that has been deployed by service providers around the world. We would also like to thank our delegates for all their most relevant questions, and we hope to hear from you soon. Thank you.