Juniper Networks Demonstrates Congestion Avoidance with Paragon Automation
Manual and semi-manual tasks consume a great deal of time and bandwidth in a network engineer’s workday.
In this Tech Field Day Showcase, presenter Julian Lucek discusses Juniper Networks’ Paragon Automation platform. The platform provides closed-loop automation leveraging AI, ensuring that services are delivered correctly and on time.
You’ll learn
The different software products included in the suite
How the pieces interact with each other synchronously within an API-driven framework
Transcript
0:09 hi everyone I'm Julian Lucek I'm a
0:12 distinguished SE at Juniper Networks and
0:15 I'll talk in more detail about the
0:17 components of the Paragon automation
0:19 Suite so here are the different
0:22 components of the Paragon automation
0:24 Suite so looking from right to left
0:26 across this diagram first of all we have
0:29 Paragon Pathfinder which will play a
0:31 starring role in the demos that we're
0:34 going to show today and Paragon
0:36 Pathfinder has the ability to create
0:38 traffic engineered LSPs those can be SR-TE
0:41 or RSVP-TE LSPs and it can modify the
0:46 paths of those LSPs during their
0:48 lifetime according to observations made
0:51 from the live Network
0:54 and then we have Paragon insights that's
0:56 our health monitoring system it's
0:59 capable of identifying faults in network
1:01 elements and taking
1:03 um corresponding actions Paragon active
1:06 Assurance is the solution for active
1:10 probing across the network it can send
1:13 probes between pops or from Pops to
1:18 customer sites or from pop to cloud in
1:21 order to ascertain the performance
1:23 between different end points and that
1:26 can be in the form of parameters such as
1:30 delay delay variation and packet loss
1:34 ratios and then finally we have Paragon
1:36 planner which is an offline planning
1:39 tool and that's capable of taking
1:42 snapshots of the live network from
1:45 Paragon Pathfinder and then with that
1:49 snapshot you can have a network model
1:52 on which you can do exhaustive failure
1:54 simulations or capacity planning all of
1:57 these components are Cloud native
1:59 kubernetes based you can deploy them on
2:03 Prem you could deploy them in the public
2:05 cloud and also we recently announced
2:08 that we're going to have a SaaS-based
2:09 solution as well
2:12 what I'd like to do next is to show how
2:15 the different components can interact
2:17 with each other so at the bottom of this
2:19 slide we have the network itself and
2:22 Pathfinder can see the topology of the
2:26 network through the BGP-LS protocol so
2:28 that allows it to see the layouts of the
2:31 links and nodes and attributes of links
2:33 such as bandwidth SRLGs and different
2:36 types of metric also Pathfinder can
2:40 create traffic engineered LSPs via the
2:43 PCEP protocol and modify them according
2:46 to observed conditions or user input now
2:50 Paragon insights is receiving streaming
2:52 Telemetry from the network in order that
2:54 it can ascertain the health of network
2:57 elements and for automatic remediation
3:02 it can send requests to Paragon
3:04 Pathfinder to create a maintenance event
3:08 on a faulty network element
3:10 Pathfinder can expose meshes of traffic
3:14 engineered LSPs and you can choose
3:16 to map a VPN to a particular flavor of
3:18 LSPs so for example if you have a VPN
3:21 that needs low latency service you can
3:23 map it onto the minimum latency mesh of
3:26 LSP Paragon active Assurance is sending
3:30 probes through the network in order to
3:32 ascertain the performance between
3:34 different endpoints as we heard and if
3:38 the performance
3:39 levels are violated then active
3:42 Assurance can send an alert to Paragon
3:44 insights so that it can take actions
3:46 accordingly
3:48 finally Paragon Pathfinder can
3:52 create
3:53 snapshots of the live Network and those
3:57 can be passed on to
3:59 um Paragon planner and so that you can
4:01 perform capacity planning and simulation
4:04 in a network model that has been derived
4:08 from the live Network
4:11 for the TE LSP creation mechanism
4:16 you've got PCEP down there is there
4:19 still a requirement for the NETCONF
4:22 configuration pieces for pushing LSPs at
4:27 one point when I looked at the previous
4:29 version of this for Juniper equipment it
4:33 didn't actually provision via PCEP it
4:35 still used NETCONF to do that
4:38 it's always
4:40 um it's always supported um both
4:42 actually so from the outset both have
4:43 been
4:44 um supported so NETCONF can be useful if
4:47 you've got legacy devices that don't
4:49 support
4:50 um PCEP so it's an
4:52 alternative method that can be used but
4:54 in the main um people tend to use
4:57 um PCEP you know I like that a lot that
5:00 it supported both because in the
5:02 transition mode when we were deploying
5:04 this in a in a network being able to use
5:07 NETCONF to really add PCEP as an
5:11 alternative control on a pre-configured
5:13 running LSP
5:16 um was a great option to transitioning
5:19 the network over to complete path
5:22 control on an existing large
5:27 network because we all know there's no
5:29 such thing as a greenfield service
5:30 provider
5:32 yeah that's very true yes certainly the
5:34 other thing you can do is um if you have
5:36 pre-existing
5:38 um LSPs that have been created on the um
5:42 Ingress routers you know via CLI
5:45 um config
5:46 um you can actually delegate those to
5:49 um Pathfinder via PCEP and so
5:52 um you know if you turn on PCEP then
5:54 there's an extra line of config on an
5:56 LSP to delegate it and what delegation
5:59 means is that a PCEP message gets sent by
6:02 the Ingress router to the controller
6:05 Pathfinder you know saying that
6:08 um this LSP has been
6:11 um delegated and from that point on
6:13 um Pathfinder can alter the path of the
6:16 LSP as needed in it during its lifetime
6:19 so that's an alternative
6:21 method of having PCEP you know running
6:25 with pre-existing LSPs we found that was
6:28 a great feature to do that because
6:31 when we first set this up you have a you
6:34 know you have thousands of
6:36 services and LSPs out there and
6:40 just to be able to add that one
6:42 delegation line to existing
6:44 configuration with no service
6:46 Interruption whatsoever and then all of
6:48 a sudden you have the you know the
6:50 control the central control and the
6:52 rerouting possible
6:54 oh another thing is actually you can do
6:56 the delegation from the Pathfinder side
6:59 so if you wish you can have
7:01 um Pathfinder you know at that extra
7:03 line of config you know for you in order
7:05 to trigger that delegation yeah that's
7:08 exactly what we did we just had the push
7:11 the line push the line
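The "extra line of config" delegation flow discussed above can be illustrated with a Junos-style sketch. This is an assumption-laden example rather than configuration from the demo: the PCE name, addresses, and LSP name are invented, and the exact statement names should be verified against the documentation for your release.

```
protocols {
    pcep {
        pce pathfinder {
            destination-ipv4-address 192.0.2.10;    /* PCE (Pathfinder) address -- example value */
        }
    }
    mpls {
        label-switched-path ams-to-berlin {
            to 198.51.100.1;                        /* pre-existing ingress-configured LSP */
            lsp-external-controller pccd;           /* the "one extra line" that delegates the LSP */
        }
    }
}
```

Once delegated, the ingress router reports the LSP to the controller over PCEP, and from that point on the controller can update its path for the rest of its lifetime, as described above.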
7:13 the talk of the LSPs there
7:16 was a whole lot of uh word soup
7:19 or acronym soup that happened there
7:23 um and for anyone that isn't totally
7:25 familiar with LSP provisioning in SR
7:30 networks the terminology and the
7:33 mechanism that we just described is
7:35 essentially
7:36 PCC-initiated, PCE-controlled right
7:40 and I think it really shouldn't be
7:43 understated like Steve said that the
7:45 fact you can if you have an existing
7:47 mechanism for provisioning LSPs command
7:51 line or whatever it is right you already
7:53 have an existing system you can still
7:56 use that while you Transit transition
7:58 over to using a
8:02 um you know segment routing controller
8:04 PCE
8:05 and then just slowly transition each of
8:08 the LSPs over it's not a boil the ocean
8:11 uh type scenario for a large Network
8:14 that has you know maybe hundreds or
8:16 thousands of LSPs already that they
8:19 don't need to do a
8:20 you know a Flag Day essentially you can
8:23 just delegate those existing LSPs and
8:25 continue to use your old mechanism until
8:27 you move over to the new one absolutely
8:30 yes I mean one way of doing that is you
8:33 could do it
8:34 um ingress router by ingress router
8:36 you could say on day one
8:38 um the LSPs of which this router is the
8:40 Ingress those are the ones I'm going to
8:42 delegate now and then subsequently you
8:45 know another edge router which is the
8:48 Ingress of some LSPs you know those can
8:50 be delegated and so on so that's one
8:52 method of doing it in a stage by stage
8:55 way
8:57 yeah that's that shouldn't be that
8:59 shouldn't be discounted that's a that's
9:01 a pretty
9:02 powerful feature set yeah well and you
9:06 even have the flexibility of using
9:08 Paragon as the
9:11 controller without actually provisioning
9:13 anything from it you know you can just
9:16 use your existing provisioning system in
9:19 in all its uh glory and well-knownness
9:23 uh to continue to push things out there
9:26 and and still have the LSPs controlled
9:29 by Paragon even though it created none
9:31 of them
9:32 yeah that's that's pretty much the
9:35 migration strategy that I've seen
9:37 every time
9:39 so next we're going to see a demo of
9:41 automated congestion avoidance so we're
9:44 looking at the same network as before
9:46 but this time I'm now showing the
9:48 percentage utilization on each link this
9:52 information is derived from the
9:54 streaming Telemetry that the routers are
9:56 sending to the Paragon automation system
10:00 and you'll note that the links have got
10:03 varying amounts of traffic on them and
10:06 in particular this one which I'd like to
10:08 highlight the link between Amsterdam and
10:10 Hamburg in the direction from Amsterdam
10:12 to Hamburg has
10:14 um about 76 percent
10:17 um traffic loading at um the moment so
10:21 far I haven't turned the automated
10:22 congestion avoidance on because what I
10:25 want to do is to turn it on in a few
10:28 minutes so that we can see the
10:29 difference before and after let's have a
10:32 look at the paths of some LSPs that are
10:35 passing through that busy link so the
10:38 LSP from Amsterdam to Prague is passing
10:41 through that link as you can see
10:43 as is the one from Amsterdam to Berlin
10:47 and so now what I'm going to do is to
10:49 actually turn on the congestion
10:51 avoidance so I'm going to go into one of
10:54 the menus
11:00 and so now I'm going into a settings
11:01 menu normally we'd have this turned
11:04 on permanently but we want to see the
11:07 difference before and after so I'm going
11:08 to set a threshold here and
11:13 we are going to then submit that
11:17 and so this is a threshold
11:19 um above which we wish
11:22 um the Pathfinder to move um some of the
11:25 LSPs in order to bring the link below
11:27 the threshold again here we have
11:31 applied it on a global basis but you can
11:33 also apply it on a more granular basis
11:35 with a different threshold on each link
11:37 if you wish so while we're waiting for
11:40 that to kick in we'll see with the aid
11:43 of some slides how the system
11:45 um actually deals with the congestion
11:48 so here's a diagram explaining how
11:52 the scheme works and of course if a link
11:56 gets congested to the extent that it's
11:58 actually dropping packets then
12:01 um clearly the customer's applications
12:03 are
12:04 um going to suffer so that's one of the
12:06 motivations for having this congestion
12:09 avoidance the other motivation is and
12:11 from the capex point of view if traffic
12:14 is spread efficiently around the network
12:16 in order to use the available nodes and
12:19 links then that delays somewhat the
12:23 point in time at which you need to
12:24 upgrade some of the links in the network
12:28 in the face of increasing traffic over
12:31 the course of the weeks and months
12:33 so let's see how it works and so
12:37 um Paragon
12:39 Pathfinder is receiving streaming
12:42 Telemetry relating to traffic and that
12:46 Telemetry is of um two different types
12:50 first of all the routers are reporting
12:52 how much traffic is traveling on each
12:55 physical link
12:58 and then the Ingress routers of traffic
13:00 engineered LSPs are reporting how much
13:04 traffic is entering each of the traffic
13:06 engineered LSPs of which it's the
13:09 Ingress because of course traffic can
13:11 only enter a traffic engineered LSP at
13:13 the Ingress router and those traffic
13:17 engineered LSPs could be RSVP LSPs or
13:20 they could be SR-TE LSPs and so in this
13:24 example Network R1 is reporting how much
13:27 traffic is entering the purple LSP and
13:30 the blue LSP and
13:32 um R4 is reporting how much traffic is
13:35 entering the green LSP
13:39 now let's suppose that um the link
13:42 between R2 and R3 has reached the
13:44 congestion threshold that we have um set
13:48 um Pathfinder can see that through the
13:50 streaming telemetry
13:52 also it knows which LSPs are passing
13:54 through that link that's reached the
13:57 congestion threshold that we set and
14:01 furthermore it knows through the
14:03 streaming Telemetry how much traffic is
14:04 traveling on each of those LSPs and so
14:07 Pathfinder has all of the information it
14:10 needs in order to work out which LSPs to
14:13 move away in order to ease the um
14:16 congestion
14:18 and of course in so doing it needs to
14:20 make sure not to cause congestion
14:22 elsewhere so it takes that into account
14:24 when making that um determination and
14:27 then having worked out which LSP to move
14:31 in this example it decides to move the
14:33 blue LSP it sends a PCEP message to R1
14:37 because R1 is the ingress router of that blue
14:40 LSP and that PCEP message contains the
14:43 new path of the LSP and so in this
14:45 example it's R1 R5 R6 R7 because of
14:50 course it's Pathfinder that's
14:52 determining the new path of the LSP
14:54 because it's best placed to know what
14:57 path to move it onto because it can see
14:59 the traffic levels around the network
15:02 and so then R1 responds by moving the
15:06 LSP
15:06 accordingly and so you can see that
15:09 without any human intervention the
15:11 system succeeded in avoiding excessive
15:14 congestion occurring on the link
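The selection step just described (Pathfinder knows the per-link load and the per-LSP traffic, so it can work out which LSPs to move away) can be sketched in Python. This is a simplified greedy illustration; the data shapes, field names, and smallest-LSP-first strategy are assumptions for exposition, not Pathfinder's actual algorithm.

```python
def link_load(link_id, lsps):
    """Per-link load reconstructed from the per-LSP telemetry reported
    by each ingress router (as in the R1/R4 example above)."""
    return sum(l["traffic_mbps"] for l in lsps if link_id in l["path"])

def pick_lsps_to_move(link, lsps, threshold):
    """Greedy sketch: choose LSPs to reroute until the link's
    utilization falls back below `threshold` (a fraction, e.g. 0.8)."""
    excess = link_load(link["id"], lsps) - threshold * link["capacity_mbps"]
    # Consider the LSPs crossing the congested link, smallest first,
    # so we disturb no more traffic than necessary.
    candidates = sorted(
        (l for l in lsps if link["id"] in l["path"]),
        key=lambda l: l["traffic_mbps"],
    )
    moved = []
    for lsp in candidates:
        if excess <= 0:
            break
        moved.append(lsp)
        excess -= lsp["traffic_mbps"]
    return moved
```

A real controller would additionally verify that each candidate's new path has enough headroom, so that moving an LSP does not create congestion elsewhere, before sending the PCEP update to the ingress router.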
15:18 so let's now go back to our demo setup
15:22 and we will look at the network
15:26 topology again
15:29 and so now you can see a change that
15:31 link that we looked at before that had
15:33 quite a lot of traffic has now gone down
15:35 to about 43 percent
15:37 um traffic loading and we can look at
15:40 the paths of some of the LSPs that we
15:42 looked at
15:43 um before you can see that this one
15:45 which previously had been using that
15:47 quite loaded link it's now moved on to a
15:51 different path now it follows the path
15:52 Amsterdam Frankfurt Prague and so what
15:55 happened is that the controller
15:58 Pathfinder moved that LSP in order to
16:01 bring that link below the congestion
16:04 threshold completely without any human
16:07 intervention
16:09 when this system is is kicking in is it
16:12 able to take in account
16:14 um whether or not there's a desire to
16:16 keep things symmetrical uh with LSP
16:19 pairs
16:21 um yes it does take that into account so
16:23 um the symmetric LSPs would
16:26 um stay
16:27 um intact that's um true yes that's
16:30 right
16:31 okay so it does the evaluation in both
16:33 directions to make sure that the the
16:35 move is not going to cause a problem
16:40 uh and the second question is is this
16:42 system also available for trending data
16:46 so that you can see when you need to do
16:48 upgrades for or you know bandwidth
16:51 upgrades for particular links and you
16:54 know when you would anticipate uh
16:56 Crossing thresholds
16:58 um yes you can look at um traffic as a
17:01 function of time on a given link in
17:04 fact we could look at that
17:07 um now or indeed on a given LSP so for
17:10 this LSP we can look at how much traffic
17:12 has been traveling as a function of time
17:14 along that LSP I mean this one is fairly
17:17 flat and as you can see as I hover
17:19 the cursor you can see what the traffic
17:22 was at that point in time but then you
17:24 can do similar things for actual um
17:28 physical links as well within the
17:31 um networks that's something else that
17:34 um you can do
17:37 um in fact so we could look at a link
17:39 here by way of example
17:42 so here you can see traffic on this link
17:44 as a function of time in each direction
17:47 as well you can see that it dropped um
17:50 in the last few minutes after the
17:52 congestion avoidance um kicked in
17:54 um in one of the directions that's quite
17:58 um handy now when it comes to
18:00 um capacity planning as I mentioned one
18:03 can take snapshots of the network and
18:05 import them into Paragon planner which
18:08 is the planning tool and that snapshot
18:10 includes um traffic levels and so um
18:14 then in planner if you want to
18:17 um look at anticipated traffic over the
18:20 next few months you could apply a
18:22 multiplier for example you could
18:23 multiply all of the traffic
18:25 um by a multiplication factor of say 1.5
18:28 to see you know what effect that has and
18:31 with that increased traffic one can
18:33 perform exhaustive failure simulation to
18:35 see that even in the face of the
18:37 increased um traffic
18:40 um Can the network for example survive
18:42 single link failure or double link
18:45 failures and node failures SRLG failures
18:49 and so on so the two work quite well
18:52 hand in hand when it comes to
18:55 um you know on the one hand live traffic
18:56 management within the live Network and
18:58 also capacity planning looking into
19:00 the future
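The grow-and-test workflow just described (scale the snapshot's traffic by a factor such as 1.5, then run exhaustive single-link failure simulation against the model) can be sketched as follows. This is a toy illustration, not Paragon Planner's algorithm: hop-count shortest-path rerouting on failure and the data shapes are assumptions made for this example.

```python
from collections import deque

def shortest_path(adj, src, dst, failed=None):
    """BFS shortest path (by hop count) from src to dst, avoiding any
    links in `failed` (a set of frozenset edges). Returns a list of
    (u, v) hops, or None if dst is unreachable."""
    failed = failed or set()
    prev, seen = {}, {src}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path, v = [], dst
            while v != src:
                path.append((prev[v], v))
                v = prev[v]
            return path[::-1]
        for v in adj[u]:
            if v not in seen and frozenset((u, v)) not in failed:
                seen.add(v)
                prev[v] = u
                q.append(v)
    return None

def check_growth(adj, capacity, demands, growth=1.5):
    """Scale every demand by `growth`, fail each link in turn, reroute,
    and report (failed_link, problem) pairs for overloads or lost
    reachability. Empty result means the grown network survives all
    single-link failures."""
    problems = []
    for failed in capacity:                       # exhaustive single-link failures
        load = {e: 0.0 for e in capacity}
        for (src, dst), mbps in demands.items():
            path = shortest_path(adj, src, dst, {failed})
            if path is None:
                problems.append((failed, (src, dst, "unreachable")))
                continue
            for u, v in path:
                load[frozenset((u, v))] += mbps * growth
        problems += [(failed, e) for e, l in load.items() if l > capacity[e]]
    return problems
```

Double-link, node, and SRLG failures extend the same idea: iterate over pairs of links, over nodes, or over the link sets sharing a risk group instead of over single links.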
19:02 if I have SLA traffic that I'm carrying
19:04 on those LSPs for you know various mpls
19:06 VPN services for Enterprise customers or
19:08 other service providers how do you
19:11 orchestrate that how do you kind of plan
19:12 for that and how do you
19:14 um more specifically figure out when
19:16 you've when you've over committed your
19:18 your slas and say hey I have more
19:20 traffic that I can move I need to meet
19:22 this latency Target I need to meet this
19:24 whatever this target is
19:26 um and I can't move these because they
19:28 I'll violate SLA how does that arise yes
19:31 well when it comes to latency that yeah
19:33 it's a very good question actually
19:34 because
19:35 um if
19:37 um you know before the congestion
19:38 occurred
19:39 um you know presumably the low latency
19:42 traffic is following the lowest latency
19:43 path because as you saw in the previous
19:46 um demos
19:47 um you know Pathfinder can keep
19:50 um you know low latency LSPs on
19:53 on the path that is currently the lowest
19:55 latency path and so you don't want the
19:57 congestion avoidance to divert such low
20:00 latency LSPs away
20:04 and because presumably the New Path will
20:06 be somewhat longer and
20:08 um therefore have higher latency and we
20:11 can ensure that um in fact the way the
20:13 congestion avoidance works is that
20:16 um first of all it considers as
20:18 candidates for moving the LSPs that have
20:21 the worst um priority level because an
20:24 LSP can have one of eight priority
20:26 levels and um if there's no suitable
20:29 candidate within that worst priority
20:31 level moves to the next one and so on
20:33 and so if you wish low latency LSPs not
20:36 to be moved by the congestion avoidance
20:38 algorithm then you can give them the
20:41 best priority level so they'll be
20:43 the last to be considered
20:45 um as candidates for moving and so they
20:47 tend to stay on the you know current
20:50 presumably lowest latency path so the
20:54 priority level is that is that an
20:55 arbitrary value that is assigned by the
20:57 controller is there a you know mapping
20:58 to experimental bits with that how does
21:00 that how does that work exactly
21:01 um that's the value that you can set
21:05 um on the LSP
21:05 um either at creation time or
21:07 um subsequently so it's
21:09 um literally a sort of value ranging
21:11 from zero to seven that expresses
21:13 the priority
21:16 um level
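The candidate ordering just described (worst-priority LSPs considered first, best-priority ones touched last) could be expressed as a simple sort. This sketch assumes the common RSVP-TE convention that 0 is the best of the eight levels and 7 the worst, and the field names are invented for illustration.

```python
def candidate_order(lsps):
    """Order LSPs for the congestion-avoidance algorithm: worst
    priority level first (assuming 0 = best, 7 = worst), and within a
    level, smallest LSPs first so the minimum traffic is disturbed."""
    return sorted(lsps, key=lambda l: (-l["priority"], l["traffic_mbps"]))
```

With this ordering, an LSP given the best priority level (0), such as a low-latency LSP, sorts to the very end of the candidate list and is the last to be considered for moving.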
21:17 um are those levels mapped directly into
21:20 queues
21:23 not um necessarily
21:26 um you know you don't have to actually
21:29 um have a you know mapping between
21:31 priority levels and queues it's more
21:34 um you know related to
21:37 um you know preemption and hold
21:39 um priorities
21:41 um you know if at the end of the day
21:43 there's insufficient
21:45 um bandwidth to carry LSPs across the
21:48 network then the priority level
21:50 determines which ones you know get
21:56 um access to the network capacity
21:56 and also you know has a bearing on the
21:59 congestion avoidance algorithm but um
22:01 it's not necessary to have a mapping
22:04 from that into queues necessarily
22:07 okay yeah I'd like to think of that more
22:09 as a controller level priority rather
22:12 than a class of service type of thing
22:13 exactly it's more to do with the
22:16 behavior of the LSP as a whole that's
22:18 right okay so that that is specific to
22:20 the controller and not necessarily
22:23 there's no prerequisite to have
22:26 um you know specific class service cues
22:29 that map directly into those
22:31 is that a correct distillation of what
22:33 you're saying correct thank you