Day 1: Managing your AI data center at scale with Juniper Networks

Kyle Baxter shown next to a presentation slide asking about load balancing


This presentation by Kyle Baxter focuses on how Juniper Networks' Apstra solution can manage AI data centers at scale. Apstra simplifies network configuration for AI/ML workloads by providing tools to assign virtual networks across numerous ports, an essential capability in environments with potentially millions of ports. The core of the presentation highlights the ability to provision virtual networks and configure load balancing with ease, using an intent-based approach that simplifies complex network tasks. This reduces the burden of manual configuration and allows users to quickly deploy and manage their AI data centers, regardless of the number of GPUs.
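
Apstra also exposes this intent model programmatically (the session later mentions REST, Terraform, and Ansible), so the same bulk assignment can be scripted instead of clicked. The sketch below is a minimal illustration in Python, assuming a reachable on-premises Apstra VM; the endpoint paths, payload fields, and names used here are assumptions rather than the exact Apstra schema, so verify them against the Apstra API reference before relying on them.

```python
import requests

APSTRA_URL = "https://apstra.example.net"   # hypothetical address of the on-prem Apstra VM
BLUEPRINT_ID = "ai-fabric"                  # hypothetical blueprint ID

session = requests.Session()
session.verify = False  # lab convenience only; use a trusted certificate in production

# Authenticate and keep the returned token for later calls
# (login path and token header are assumptions, not a documented contract).
login = session.post(f"{APSTRA_URL}/api/user/login",
                     json={"username": "admin", "password": "********"})
login.raise_for_status()
session.headers["AuthToken"] = login.json()["token"]

# Create one virtual network and bind it to every rail-facing leaf in a single
# request instead of assigning it port by port (payload shape is illustrative).
payload = {
    "label": "rail1-vn",
    "vn_type": "vlan",
    "bound_to": [
        {"system_id": leaf, "vlan_id": 100}
        for leaf in ["rail1-leaf1", "rail1-leaf2", "rail1-leaf3"]
    ],
}
resp = session.post(f"{APSTRA_URL}/api/blueprints/{BLUEPRINT_ID}/virtual-networks",
                    json=payload)
resp.raise_for_status()
print("virtual network created:", resp.json().get("id"))
```

Whether the bound_to list covers three leaf switches or three thousand, it is still one request, which is the same "same process at any scale" point Baxter returns to at the end of the talk.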

Baxter demonstrates how Apstra's continuous validation lets users catch configuration issues, such as missing VLAN assignments, before they impact operations. The system also supports bulk operations that streamline the assignment of virtual networks and subnets across the entire infrastructure, and it offers selectable load balancing policies whose parameters are explained through built-in help text. Juniper's focus is on simplifying these tasks through an intuitive interface, minimizing the need for command-line configuration and speeding up deployment of AI data centers.
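
To make the two ideas above concrete, the sketch below first polls the staging blueprint for outstanding build errors and warnings (the continuous-validation signal shown in the demo) and only then submits a load-balancing policy with the parameters discussed in the session: DLB versus GLB, flowlet versus per-packet, and the inactivity interval. It reuses the authenticated session from the previous sketch, and the endpoint paths and field names are again assumptions rather than the exact Apstra schema.

```python
# Reuses `session`, `APSTRA_URL`, and `BLUEPRINT_ID` from the previous sketch.

# 1. Continuous validation: refuse to go further while the staging blueprint
#    still reports build errors or warnings (field and path names are assumed).
status = session.get(f"{APSTRA_URL}/api/blueprints/{BLUEPRINT_ID}").json()
errors = status.get("build_errors_count", 0)
warnings = status.get("build_warnings_count", 0)
if errors or warnings:
    raise SystemExit(f"staging blueprint not clean: {errors} errors, {warnings} warnings")

# 2. Intent for a load-balancing policy, mirroring the options shown in the demo.
#    GLB is called out in the talk as needing QFX5240-class hardware, a check
#    Apstra performs before letting the policy deploy.
lb_policy = {
    "label": "my-policy",
    "type": "dlb",                # dynamic load balancing; "glb" only on supported hardware
    "mode": "flowlet",            # or "per_packet"
    "inactivity_interval": 16,    # illustrative value; the UI help text shows the real default
}
resp = session.post(
    f"{APSTRA_URL}/api/blueprints/{BLUEPRINT_ID}/load-balancing-policies",  # hypothetical path
    json=lb_policy,
)
resp.raise_for_status()
```

Assigning the policy to all leaf and spine switches in bulk, rather than switch by switch, follows the same pattern as the virtual-network example above.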

Finally, Baxter explains that Apstra is delivered as a virtual machine, available for download on the Juniper website. It is deployed on premises to manage the network and can be integrated with Mist AI for AI-driven network operations. The upcoming 6.0 release and support for advanced features such as RDMA-based load balancing were also discussed. The key message is that Apstra enables efficient deployment and management of AI infrastructure at any scale, simplifying complex tasks with a user-friendly interface and automated features.

Presented by Kyle Baxter, Head of Apstra Product Management, Juniper Networks. Recorded live in Santa Clara, California, on April 23, 2025, as part of AI Infrastructure Field Day.


You’ll learn

  • How to configure with confidence using Apstra Data Center Director

  • How to load balance and handle congestion, no matter your cluster’s size

Who is this for?

  • Network Professionals

  • Business Leaders

Transcript

0:00 welcome everybody my name is Kyle

0:01 Baxter i'm going to be talking about

0:03 how in in deploying and managing your AI

0:08 data center how you can do that at scale

0:11 with

0:12 Apstra so the the challenges is there's

0:16 potentially lots of ports how do you

0:19 assign those virtual networks that you

0:21 need to be able to to run the training jobs

0:23 across all those ports you know you

0:25 don't want to do that manually one by

0:26 one by one by one because there could be

0:28 thousands or millions of different ports

0:30 when you're talking about some of these

0:31 larger um deployments um and then

0:34 there's ever evolving requirements um

0:37 there's there's new techniques that are

0:38 coming out for things like load

0:40 balancing um we saw in a in a previous

0:42 session about some of the new um RDMA

0:45 based load balancing that's coming out

0:47 um but what also about how do I

0:49 configure load balancing without having

0:51 to go read through hundreds and hundreds

0:53 of pages of articles to become an expert

0:55 and that's what we can do within Apstra

0:58 is simplify that and give you just a few

1:01 click operations to to get those things

1:03 assigned and up and running so starting

1:07 with virtual network provisioning so how

1:09 do I assign virtual networks across my

1:13 entire network so the first thing that

1:16 that we talked about a little bit

1:17 earlier was you know how do we catch

1:19 things before we deploy so that is

1:21 exactly one thing that we can do here

1:23 that I'm going to show you um is the

1:26 first step is we've determined before

1:28 you've deployed this to production in that

1:30 staging version of Apstra that there is

1:33 warnings that there's missing VLAN

1:35 assignments to Rails and so we've caught

1:38 that before you've actually deployed and

1:40 what we can then do is give you an easy

1:43 button to say go ahead and provision a a

1:47 VLAN across those rails and those ports

1:50 and then afterwards we can see yep I'm

1:54 all set we can see the the virtual

1:56 network the connectivity template all

1:58 assigned to it so what this does is it

2:02 simplifies that burden of managing and

2:05 gives you the ability with just a few

2:07 clicks to assign virtual networks make

2:09 sure your connectivity templates are all

2:12 connected to the ports and the rails

2:15 that you want you need

2:18 so let's take a quick look at this i'm

2:21 going to move to this demo real quick

2:24 and we'll see this in action it look

2:27 very similar to those screenshots but

2:29 we're going to go here to that

2:30 uncommitted tab which is showing what's

2:33 the difference so what we played around

2:35 with in staging between what's in

2:36 staging and our production and we can

2:38 see that the warning tab is highlighted

2:41 and so when we click on that warning tab

2:44 we'll immediately see that there's

2:45 interfaces associated with a rail that

2:47 are expected to have a VLAN for untagged

2:50 traffic and it's not there but we have

2:52 resolutions where it says "Hey do you

2:54 want to assign that VLAN?" And yes I do

2:58 i could do it one by one or what I'm

3:00 going to do is I'm going to look at all

3:01 of my rails and say "Let's bulk to it

3:04 let's I don't want to do it one by one

3:05 by one by one by one i can do it but

3:08 that's tedious why do I want to do

3:10 multiple things at once when I can do it

3:11 in one click and so I'll select all of

3:15 my rails and say let's provision VLANs

3:17 across all of them and we'll see that we

3:20 can review it we can look at it they

3:22 were all missing before but we'll add

3:24 them in there and we'll we'll see that

3:26 uh column go from missing to now they're

3:29 all assigned now the other thing Apstra

3:32 does is it still catches that there's

3:35 still some red in there so we're going

3:36 to go over there and the one thing that

3:38 we didn't do is tell it what virtual

3:41 network what subnet to use so I'm going

3:43 to pick an IP pool i'm just going to

3:45 pick one of the default ones i can see

3:46 there and there how much of it is used

3:48 so I know there's some some space in

3:49 there and I'll pick that IP pool and so

3:53 then you can see the page all of a

3:54 sudden start everything turning green

3:56 showing me that it's validated this is

3:58 that continuous validation that we're

4:00 doing in Apstra that's continuously

4:01 looking to see did you do everything

4:02 that we that you need to do and so it

4:05 caught that you know we we didn't have

4:07 the the VLANs set and we didn't also

4:09 have a virtual network um set for a

4:12 subnet of IP addresses but once we fixed

4:14 all that look the warning tab now went

4:17 green and so there in just a few minutes

4:19 in a few clicks I was able to assign a

4:22 virtual network and an IP subnet for

4:24 that virtual network across all of my

4:27 ports and and rails so really cool to be

4:31 able to see how it catches that how it

4:32 does that continuous validation and

4:34 helps make that

4:37 smooth so I'm going to switch back over

4:40 to the

4:41 slides and we'll look at load balancing

4:44 so a question came up earlier about how

4:46 do I do load balancing um and in Apstra

4:50 we have the same kind of simple

4:52 intent-based driven models where we're

4:53 looking at what is the intended outcome

4:56 that you want what do you want it to do

4:58 well how do you want to work not let's

5:00 go switch by switch and configure DLB or

5:02 GLB it's do you want DLB or GLB do you

5:05 want flowlet or per packet you know those

5:06 kinds of things and we'll see how that

5:08 how that works and we can go through

5:10 that so there's a simple walkthrough

5:13 configuration where you can go and you

5:14 can pick do I want DLB do I want it to

5:17 be flow or packet do I want to set some

5:19 of the activity in the in the intervals

5:22 but in there and I'll show in a quick

5:24 demo you can hover all those those

5:26 little question mark helps at the end

5:27 it's a little small but um you can hover

5:29 over all those and you'll be able to see

5:30 what all those parameters mean so you

5:33 don't have to go research and be like um

5:35 what is the inactivity interval i don't

5:37 really remember what it should be it'll

5:39 tell you right there when you hover over

5:40 it that hey this is what it is this is

5:42 the default value we set if you want to

5:44 change it you can if you want to keep

5:45 the default value great and we can also

5:48 do validation on there's specific things

5:50 that maybe only work on certain

5:52 hardwares so like GLB only works on um

5:56 the 5240 um because of the specific

5:58 hardware it needs specific hardware we

6:00 can do that validation so you don't just

6:02 you know say yeah I want GLB and you

6:04 don't have hardware that can do it we

6:05 don't let you deploy it if it's not

6:07 going to work so we can do that

6:08 validation is RLB handled separately or

6:11 is that coming soon coming soon okay so

6:14 it's some of the the latest innovations

6:15 coming out um so we have DLB and GLB in

6:18 the product some of the great things

6:19 about Apstra is we have flexibility to

6:22 do things like that um separately from

6:24 our intent-based models we can do what we

6:26 call configlets so you can do custom

6:28 configurations when there is you know

6:30 like brand new innovations that come out

6:31 on a switch hardware um so if you want

6:33 to keep up with the latest and greatest

6:35 but we're actively working as we speak

6:37 to get things like the RDMA load

6:39 balancing um into the the configuration

6:41 that we're going to see right here got

6:42 it thank you mhm

6:46 so how this looks like is is a few steps

6:50 so starting with we assign a default um

6:54 policy that's that's DLB across because

6:56 that is that is on all the hardware so

6:58 we assign that default policy but if you

7:00 want to build your own it's only two

7:03 steps first you create a load balancing

7:05 policy and you go through the selections

7:07 on what options do you want again we

7:09 have the help text that tells you what

7:11 all they are what all the default values

7:13 and then you then assign it and you can

7:16 do it just like we saw before one by one

7:18 or you can do it in bulk so if you want

7:20 to do them all at once you can do that

7:22 if you want different ones maybe

7:23 different ones on the spine versus the

7:24 leaf you could have that option where

7:26 you can be able to do them across

7:28 different ones so let's see this in

7:32 action real quick um so actually go here

7:37 and then pull up this

7:39 one and we'll walk through it so it's in

7:43 our staged again we'll go to fabric

7:45 settings where we can look at the load

7:47 balancing and as as I mentioned there's

7:50 a default policy that comes already

7:52 assigned that's running DLB um but I

7:55 want to build my own so let's do it so

7:58 the first thing we go to is over here

8:00 the load balancing policies we will

8:02 create our own load balancing policy we

8:06 give it a name call it my policy or

8:08 whatever you want to call it and then we

8:09 can go through the options and here's

8:11 where I was talking about the help text

8:12 that you just hover over it and it tells

8:14 you exactly what those mean what are the

8:17 default values so you can see do I like

8:19 that default value do I want to change

8:20 it what do I want to do i can pick do I

8:23 want GLB and helper load balancing you

8:25 know you can pick all those settings and

8:28 then you can simply go over to the

8:30 assignment and as I talked about you can

8:32 do it one by one so we could pick you

8:34 know just one by one here or if I wanted

8:37 to and this is what I'll do because I

8:38 don't like doing things one at a time i

8:40 want to do it in bulk i want the quick

8:42 um I want to get it done fast um I can

8:44 do that in bulk assign it and now all of

8:47 a sudden here in just a few clicks we've

8:49 created a new load balancing policy i

8:51 didn't have to be the expert in knowing

8:53 what are what are all the settings to

8:55 know do I you know what is an inactivity

8:57 timer it was already there for me I just

9:00 made a couple selections and I'm now off

9:03 and running

9:05 I have a question uh it looks like on

9:08 every leaf and every spine switch you

9:10 can set different load balancing

9:11 policies yes is there a reason why you

9:14 want to do that in the same network you

9:16 have different load balancing policies

9:18 probably

9:19 not wouldn't advise that um you know

9:23 some of them like GLB maybe you want um

9:25 different at the spine versus the leaf

9:27 um there's there's a little difference

9:29 there that you might uh want there but

9:31 in in general yes you wouldn't probably

9:33 want to um so that's where the bulk

9:35 comes in handy is you you just want

9:36 everything to have the same load

9:37 balancing policy um but if you really

9:39 wanted to get crazy you you could but I

9:42 wouldn't advise it okay thanks uh Jack

9:48 Poller Paradigm Technica speaking of

9:48 getting crazy yes is everything exposed

9:51 through this interface or is there still

9:52 stuff that you have to drop down into

9:54 command line and tweak and munch and

9:56 stuff like that for for things like DLB

9:59 and GLB no that's it's all configured

10:01 there we saw there was a big list of

10:02 items you could manually configure so So

10:05 for those no you don't um if you wanted

10:07 to use like the we talked about just a

10:09 second ago like the RDMA load balancing

10:11 we don't have that yet modeled yet

10:13 that's coming soon so if you wanted to

10:14 use that today you would then have to

10:16 drop into what we call configlets to to

10:18 help set that up but the the the the

10:21 point is you're going to capture

10:23 everything that you possibly can in this

10:25 tool yes and stray stay away from Yes

10:30 old style stuff yeah the goal is to stay

10:32 away from the CLI cli okay we can still

10:34 show you as I talked about in an earlier

10:36 part the the rendered config if you

10:38 still love to see it and you want to see

10:39 you know did it match exactly what I

10:40 thought um but but the idea is is yes

10:43 you will drive everything through the UI

10:45 or like we talked about um earlier APIs

10:48 via like REST Terraform Ansible things

10:50 like that you can drive it all through

10:51 there if you wanted to to help automate

10:53 that right okay thank you
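
The exchange above notes that everything driven through the UI can also be driven through Apstra's APIs, and that the rendered device configuration remains available for inspection. Below is a minimal sketch of pulling that rendered config back out for one switch, reusing the authenticated session from the sketches earlier on this page; the endpoint path, field name, and node ID are assumptions, not the documented API.

```python
# Fetch the configuration Apstra rendered for one managed switch so you can
# confirm it matches the intent you expressed in the UI or via the API.
# Reuses `session`, `APSTRA_URL`, and `BLUEPRINT_ID`; the path and node ID
# below are hypothetical.
node_id = "rail1-leaf1"

resp = session.get(
    f"{APSTRA_URL}/api/blueprints/{BLUEPRINT_ID}/nodes/{node_id}/config-rendering"
)
resp.raise_for_status()
print(resp.json().get("config", ""))
```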

10:58 yeah so let me go back to the to go back

11:02 to the slides so to sum up we saw how we

11:05 can manage this network at scale from

11:09 from deploying it and it's the same

11:11 process no matter if you have one GPU 10

11:13 GPUs 100 GPUs thousand GPUs a million

11:16 GPUs no matter what it is it's the same

11:18 process um it wouldn't take me any

11:20 longer if it was you know million GPUs

11:21 because I could do that bulk edit thing

11:22 for everything and and have it all set

11:25 up so we can take that complexity remove

11:28 that that burden of complexity out there

11:30 give you the expert to help you deploy

11:33 your AI data centers faster so any last

11:37 minute questions before we go to the Yes

11:39 oh wait before we go to I wonder if the

11:41 next section's better to ask it or

11:43 you're done after this i'm I'm finishing

11:45 now we're going to be moving on to um

11:47 kind of the day two the day-to-day

11:49 operations and we'll see a lot about the

11:51 the visibility we get and and monitoring

11:53 the networks heat maps all that fun

11:55 stuff okay so my question is how is this

11:58 delivered to customers how do they get

12:00 what

12:01 is how is it consumed so how much of it

12:04 is through SIs how much do they get it from um

12:08 MSPs how do they get it directly from

12:11 you and how much of it is on the

12:13 complete solution I'm just curious how

12:15 it how it gets to them and how they

12:17 consume it once it gets to them yes yes

12:20 great question so thank you for asking

12:22 um so Apstra is delivered as a virtual

12:26 machine so on juniper.net in our

12:29 downloads page we have OVAs KVMs um

12:32 Microsoft Hyper-V um we're looking at

12:34 adding Nutanix versions as well um because

12:37 that's getting popular now um so it is

12:39 deployed as a virtual machine so almost

12:42 all of our our users and customers they

12:45 get that OVA they deploy it inside their

12:47 their network inside their data center

12:49 in the management plane so it has that

12:51 management connectivity to all of the

12:53 managed switches um so that's typically

12:55 that's deployed there's there's some

12:57 cases where um you talked about like SIs

12:59 or MSPs as part of the bigger solution

13:02 will come into your data center yeah and

13:03 they'll they'll help deploy it for you

13:06 um you know same thing like professional

13:07 services could do to come in and help

13:09 you know deploy it for you um but it is

13:11 traditionally um delivered that way as

13:13 an on-prem application um we do have as

13:17 I talked about the very beginning in the

13:19 previous session was the integration

13:21 with with Mist AI um so we do have a

13:24 component that we can take a lot of this

13:26 data and a lot of data we'll see in in

13:28 the upcoming session and be able to

13:30 build AI ops on top of it so um AI for

13:33 networking um where we can use AI and ML

13:36 to help improve network operations um so

13:39 we'll be talking about that probably in

13:41 in other sessions because today was

13:42 focused on the the building of the infra

13:44 but we can absolutely do that to be able

13:46 to bring more AI and insights and be

13:49 able to help you troubleshoot faster so

13:51 more direct sales versus going through

13:52 partners well we still have you know

13:54 partner channels um but even through a

13:56 partner channel they'll still get the

13:58 software and deploy it you know on

13:59 premises okay thanks

14:03 when do we expect 6.0 to be released

14:07 thank you for asking should be in the

14:09 next couple weeks excellent so we are in

14:11 actively in early trials with several

14:13 customers that are actually using this

14:15 and deploying it in their own AI and ML

14:17 trading jobs as we speak got it but

14:20 it'll go GA here in in about next couple

14:22 weeks
