AI Data Center Networking Using Open Ethernet
The many advantages of open Ethernet networking for AI data centers
What makes AI networking so challenging, and how can we improve our approach? Juniper’s Raj Yavatkar shares his fascinating ideas about the advantages of open Ethernet networking for AI data centers.
Learn more about AI data center networking.
You’ll learn
Why Ethernet is a good fit for AI data center networking
How Juniper Apstra® software can simplify AI data center operations
Transcript
0:04 In a recent blog, our CEO Rami Rahim outlined why AI/ML workloads and their needs are transforming high-performance networking. The reason is that machine learning workflows fall into two categories: training and inferencing. Training workloads are the most demanding, because you have thousands of GPUs communicating constantly with each other, exchanging data so that they can train a model. For that they need very high performance, high throughput, and almost lossless operation, and to achieve that you have to really crank up the networking. The second type of workload is inferencing, which requires very low latency, so you also need to focus on low-latency networking.
0:56 Ethernet, just like the Internet Protocol, has been one of the most open and successful standards in networking. Ethernet, and I worked on it for many years, has evolved to suit new workloads and new use cases again and again, and it has a very large ecosystem, both of vendors supplying products and of expertise within the networking industry. Now, if you look at the needs of the AI/ML training and inferencing workloads that I mentioned, those can be satisfied by Ethernet. Ethernet comes with lots of bells and whistles, or features, that can be used and tuned to meet the needs of AI/ML workloads.
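As a rough illustration of the kind of feature tuning being described here (a general sketch, not a Juniper-specific configuration): near-lossless Ethernet fabrics commonly pair Priority Flow Control (PFC) with ECN-based congestion control such as DCQCN, where the probability of marking a packet rises with queue depth. The ecn_mark_probability function and the Kmin/Kmax/Pmax values below are illustrative assumptions, not values from the transcript.

```python
# Sketch of a DCQCN/RED-style ECN marking curve as a function of queue depth.
# kmin_kb, kmax_kb, and p_max are operator-tuned knobs; the defaults here are illustrative only.
def ecn_mark_probability(queue_kb: float, kmin_kb: float = 150.0,
                         kmax_kb: float = 1500.0, p_max: float = 0.1) -> float:
    """Probability of ECN-marking a packet given the current queue depth."""
    if queue_kb <= kmin_kb:
        return 0.0                      # queue is short: never mark
    if queue_kb >= kmax_kb:
        return 1.0                      # queue is saturated: mark everything
    # Between kmin and kmax the marking probability rises linearly up to p_max.
    return p_max * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)

for depth in (100, 500, 1000, 2000):
    print(f"{depth:>5} KB -> mark probability {ecn_mark_probability(depth):.3f}")
```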
1:42 Where Juniper comes in is that we have been shipping products based on the open Ethernet standard for quite some time. Today we are shipping our PTX spine switches and QFX switches for these workloads, along with what we call our fabric automation software, Apstra, which allows you to orchestrate the end-to-end path from one GPU to another to achieve high-performance networking.
2:09 Yeah, so as I said, Ethernet has been an open standard for a long time and it has constantly evolved, and that is its biggest strength: the open ecosystem. If you look at the standards bodies, such as the IEEE and the Metro Ethernet Forum, and now a new standards body being created called the Ultra Ethernet Consortium just to address the needs of high-performance AI/ML networking, these bodies have made sure that Ethernet continues to stay open and has a robust ecosystem that can meet the use cases and their needs. So I have every confidence that we will continue to use Ethernet as the only open standard to meet the needs of AI/ML networking.
2:54 I think it's not enough to have high-performance networking based on Ethernet; you also need the orchestration of different network paths, congestion management, and flow control. Juniper Networks has invested in a completely new fabric automation software called Apstra that allows you to orchestrate and automate fabric operations. What we do is not only configure and set up the necessary paths, with the necessary hooks to take advantage of the Ethernet features, but we also continue to monitor the fabric during operation with telemetry, so that we can build closed-loop automation to make adjustments dynamically as the network or traffic conditions evolve. That's a big plus and a very important part of high-performance networking.
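To make the closed-loop idea concrete, here is a minimal sketch of a telemetry-driven adjustment loop: poll per-port queue-depth telemetry and, when sustained congestion is detected, tighten the ECN marking threshold so senders back off earlier. This is not Apstra's actual API; the TELEMETRY_URL and CONFIG_URL endpoints, the thresholds, and the tighten_ecn_threshold helper are all hypothetical.

```python
# Minimal closed-loop telemetry sketch (hypothetical endpoints and thresholds,
# not Apstra's real API): poll per-port queue occupancy and react to congestion.
import time
import requests

TELEMETRY_URL = "https://fabric-controller.example.com/api/telemetry/queues"  # hypothetical
CONFIG_URL = "https://fabric-controller.example.com/api/ports/{port}/ecn"     # hypothetical
QUEUE_DEPTH_ALARM = 0.80   # fraction of buffer occupancy treated as congestion
POLL_INTERVAL_S = 5

def fetch_queue_depths() -> dict[str, float]:
    """Return {port_name: buffer_occupancy_fraction} from the telemetry collector."""
    resp = requests.get(TELEMETRY_URL, timeout=10)
    resp.raise_for_status()
    return resp.json()

def tighten_ecn_threshold(port: str) -> None:
    """Lower the ECN marking threshold so senders back off earlier on this port."""
    requests.put(CONFIG_URL.format(port=port), json={"ecn_min_threshold_kb": 150}, timeout=10)

def control_loop() -> None:
    while True:
        try:
            for port, occupancy in fetch_queue_depths().items():
                if occupancy >= QUEUE_DEPTH_ALARM:
                    print(f"congestion on {port} ({occupancy:.0%}), tightening ECN threshold")
                    tighten_ecn_threshold(port)
        except requests.RequestException as err:
            print(f"telemetry poll failed: {err}")
        time.sleep(POLL_INTERVAL_S)

if __name__ == "__main__":
    control_loop()
```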
3:46 So we are already shipping 400-gig QFX and PTX switches, we are on the cusp of releasing our 800-gig generation, and we are going to continue to evolve to the next generation, going from 800 gig to 1.2 and 1.6 terabit kinds of speeds. So the speeds and the radix of the switches will continue to evolve, so that we can pack together in a small cluster, even a rack full of GPUs, lots of high-performance networking, all based on Ethernet, so it can be sourced from multiple vendors in a robust ecosystem. That's where we are going. Couple that with robust automation software, and you can really reduce the operational cost and total cost of operations for supporting such demanding workloads.