Bob Friday Talks: From protein folding to teaching Marvis about Zoom
In this episode of Bob Friday Talks, Bob sits down with Navraj Pannu, Data Sciences Director at Juniper Networks, for an engaging chat covering a range of topics related to AI’s role in solving problems. From his early career in structural biology to his current role in networking, Navraj’s love of applied mathematics and algorithms is clear. Pull up a chair and enjoy this fascinating conversation on combating human disease, reversing the aging process, synchrotrons, DeepMind, and, of course, solving networking-related issues.
You’ll learn
Why it is important to determine the three-dimensional atomic coordinates of protein and DNA molecules
How AI was used to successfully predict the fold of a protein molecule
How solving problems in structural biology relates to troubleshooting Zoom and Teams calls
Host: Bob Friday
Guest speaker: Navraj Pannu, Data Sciences Director, Juniper Networks
Transcript
Bob (0:01): Hello, I'm Bob Friday, and thanks for joining us on another episode of Bob Friday Talks. I'm joined today by a former professor of structural biology at Leiden University and now director of data science here at Juniper Mist, Navraj Pannu. Today we're going to be discussing his journey from professorship to network data scientist. Welcome, Navraj. Maybe we can start our conversation today with a little bit about what got you into being a professor.

Navraj (0:30): What got me into being a professor, but also the transition from structural biology to networking, is my love of mathematics. That's the common link. I love applied mathematics. I love solving problems with mathematics, testing out the problems by writing software, and then validating it.

Bob (0:47): Okay. Now, structural biology: what exactly does that mean? What were you working on?

Navraj (0:54): What we were trying to do was develop algorithms to solve the three-dimensional atomic coordinates of protein or DNA molecules. Why is that important? Essentially, they are the building blocks of life. With knowledge of the three-dimensional structure, we can understand how a protein functions and the basis for diseases. Many diseases are caused by mutations in a protein; with the structure, we can correlate it to the function and then develop drugs to combat disease.
Bob (1:28): Okay, I've read a couple of your papers, and it sounds like before joining us you were working on solving the problem of immortality and aging. I almost regret hiring you. So tell me, exactly how do proteins and aging relate to each other?

Navraj (1:47): At the fundamental level, aging or cancer is caused by, or can be caused through, damage in DNA. Our body has a whole system of DNA repair proteins that repair damaged DNA all the time, so that these mutations aren't propagated into malfunctioning proteins. As we get older, or if we're exposed to radiation, for example from the sun, the DNA is damaged more, and hence we have these problems. One of the aspects I was working on in the Netherlands, in collaboration with experts, was trying to determine these DNA repair proteins and understand their mechanism of action.
Bob (2:38): Okay, so I've got to ask you: now you're working on networking, trying to solve networking problems and build solutions on par with domain experts. How close were you to solving this aging problem?

Navraj (2:49): I was more into solving the particular DNA repair pathways associated with certain diseases. The aging problem is still ongoing; trying to understand how we can reverse aging is an active area of research at the moment. But the more knowledge we have of all the different pathways and how our DNA can become damaged, the closer we'll get to solving this problem.
Bob (3:17): Okay, I don't want to feel too bad about hiring you, because solving aging is a pretty critical problem to solve. Now, I understand that when you were working on this problem, AI really wasn't around at the time. Maybe describe to the audience what techniques you used. I know you were visiting linear accelerators and so on; what on earth did you need to go to a linear accelerator for?

Navraj (3:42): We employed statistical methods in order to solve the structures of proteins. What we had to do first was collect the experimental data, so we first grew crystals of the protein of interest and then took them to, for example, a synchrotron. At the synchrotron we had access to highly intense X-ray beams, and with those we could collect a diffraction pattern. That diffraction pattern gave us the information we needed to solve the three-dimensional structure of the protein, and through statistical methods we were then able to determine the atomic coordinates and analyze the molecules.
Bob (4:26): Okay, so you were building a big simulator? You were trying to build a simulation of how all these proteins and amino acids fold? Was that the basic technique?

Navraj (4:36): Indeed, we developed a model in order to determine the atomic coordinates, but it was based on the experimental data. Then, as I moved to the Bay Area, a transition occurred, and that was actually using AI methods. Based on the hundreds of thousands of protein structures that had been solved, DeepMind and Google got involved to try to solve structures without the need to go to the synchrotron to collect experimental data. They trained a model to predict the fold of a protein molecule, and they were very successful at doing that.
Bob (5:17): Okay, so this is the transition, when you decided to move from academia back into industry and data science. Tell us a little bit more about that transition. Is this a case where AI actually put you out of business? How many years were you working on this protein problem?

Navraj (5:32): I had started as an undergraduate in 1996, developing algorithms and applying them to protein structures. I was very interested in AI, yet I realized that in order to do it and really understand AI methods, coming to the Bay Area would be the best solution for me. So indeed, it was a huge transition going from an academic to an industrial environment, but in the end it was all about the mathematics. Another advantage of coming to the Bay Area is that you have access to a tremendous amount of data. With the data we have at Juniper Mist, for example, we can tackle big problems and model the situation.
Bob (6:20): So the team at Google DeepMind was actually able to solve this problem with some sort of deep learning AI technique, eliminating the need for the simulators and the models, yes? Can you give the audience a little bit about how much you know about DeepMind and what type of models they were using to solve this protein problem? Is it similar to what we see with ChatGPT, or are these different types of models?

Navraj (6:44): The models are not transformer based, but they are deep learning. In the case of GPT, we have billions of data points and parameters; in the case of DeepMind and their deep learning model, they had access to maybe 100,000 structures, and they probably truncated their training data to a bit less. But there's a great deal of information within each of those structures, and they were able to develop a target function that understood the protein folding problem and then validate the model, since they had access to all this data. This was a revolution, because previously, as you mentioned, most people considered it a big optimization problem. They weren't looking at the vast amount of data available; they were just thinking, if I optimize this problem and I search all through parameter space, maybe I can get to the answer. They ignored all the data that my colleagues, my fellow structural biologists, had, and that was the difference that DeepMind made.
Bob (7:49): Okay, so now you're at Juniper Mist, working on trying to build a solution that can manage and operate networks on par with human domain experts. One minute you're trying to solve aging proteins; now we're trying to build something on par with human IT domain experts. What are the similarities between proteins and networking?

Navraj (8:10): I think the key similarity is just the scientific approach. You have to define a problem, and you have to know the data that you have and its limitations. In the case of Juniper Mist, we had access to both Zoom and Teams data, which gave us an indication of the quality of the calls. Thanks in part to your efforts, we had all of our network data in the cloud, which we could combine with that call data to determine which network parameters or client parameters are affecting Zoom calls. So we have developed a model to do this, an AI-based model, and because we have access to millions of data points, we're able to train a very accurate model and then, more importantly, validate that model.
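To make the workflow Navraj describes a little more concrete, here is a minimal sketch of the "train an accurate model, then validate it" step: a generic gradient-boosted regressor fit on synthetic per-client-minute data and scored on a held-out set. The feature names (client RSSI, clients on the AP, round-trip time) are borrowed from later in the conversation; everything else, including the synthetic data and the choice of model, is an illustrative assumption rather than Juniper Mist's actual pipeline.

```python
# Illustrative sketch only: synthetic data and a generic gradient-boosted model,
# not Juniper Mist's production pipeline.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 10_000  # stand-in for the millions of labeled client minutes mentioned above

# Hypothetical per-client-minute network features (names assumed for illustration).
X = pd.DataFrame({
    "client_rssi_dbm": rng.uniform(-85, -45, n),  # Wi-Fi signal strength at the client
    "clients_on_ap": rng.integers(1, 60, n),      # load on the access point
    "rtt_ms": rng.uniform(5, 250, n),             # round-trip time toward the service
})

# Synthetic "label": Zoom/Teams-reported latency, driven by the features plus noise.
y = (
    120
    + 0.8 * X["rtt_ms"]
    + 1.5 * X["clients_on_ap"]
    + 1.2 * (-45 - X["client_rssi_dbm"])  # weaker signal -> higher latency
    + rng.normal(0, 15, n)
)

# Hold out data so the model can be validated, not just trained.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
print(f"held-out MAE: {mean_absolute_error(y_test, pred):.1f} ms")
```

Once validated, a model like this could be applied to new client minutes to predict the latency or packet loss a client is experiencing, which is the per-client-minute prediction Navraj describes a bit later.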
Bob (9:03): You know, I make a barrel of wine, and what I've learned is that organic chemistry is a hell of a lot harder than wireless. Trying to make a good bottle of wine? Wireless is much easier. What do you think: is solving the protein problem and solving the networking problem the same level of difficulty, easier, or harder?

Navraj (9:20): That's a great question, by the way. The best approach is just to iteratively improve the methods. At the start of the protein folding problem, people, as you mentioned, were just trying to simulate the model; they weren't using the vast amount of data, but then they learned to use all the data. There are still networking problems that we will tackle in the future. We started off by getting this labeled data from Zoom and Teams; we had a well-defined problem, we had a huge data set, and then we could try to explain the issues. Now, the powerful thing is that we have all of this expertise at Juniper Mist, and they could validate that problem, as we validated whether we came up with the correct solution or not. That, I think, is the key: we have the end-to-end process, all the way up to validation. I realize I might not have answered your question completely, like which is more difficult, but I think this is a big first step to having AI methods in networking, including the post-connection problem, and from there we can take on even more problems, more general problems.
Bob (10:40): Maybe tell the audience a little bit about the journey. In terms of building a model that can accurately predict Zoom and Teams experience, where are you on that journey? Can you actually accurately predict a Zoom or Teams user experience now, or are we still working on that?

Navraj (10:57): We are able to accurately predict a Zoom client minute. What do I mean by that? If a client were to have a Zoom call, we can predict, for example, the latency or the packet loss that client is experiencing or will experience. How do I know that? Based on the metrics from our validation and training data sets.
11:26 what so why what's the big deal about
11:28 predic latency and packet loss the nice
11:32 thing that comes along with AI is this
11:35 whole idea of explainability once we
11:37 have an accurate model which we know
11:39 from our statistics we can try to
11:41 explain that and that is what our
11:43 customers really like they like looking
11:46 at the feature ranking so what are the
11:48 important Network parameters so for
11:50 example is it a capacity issue is it a
11:53 coverage issue or is it a client issue
11:56 that's what our customers want and
11:58 that's uh what we're providing them so
12:01 and we validated it not only internally
12:04 but externally as well now now I've
12:06 heard you talk about shapley and you
12:08 know maybe for the data scientists in
12:10 our audience here maybe give a little
12:12 bit explanation you know Mutual
12:13 information shapley how does this
12:15 shapley help you explain the network
12:18 features responsible for the prediction
Navraj (12:20): At the core of Shapley is a concept that I've mentioned a few times, so let me define it: a feature, which is the term used in AI. A feature is essentially a network parameter we are using to try to predict the model's behavior. Some examples of features we're using are the client RSSI, the number of clients on the AP, or the round-trip time from a particular AP. Shapley considers every single combination of all of those features in order to explain which one is dominant, and that's what we want to know: which feature is dominant in leading to this poor network performance. The mathematically proven fact is that Shapley values are objective and unbiased, so we know they can provide this information. In contrast, mutual information only looks at pairwise comparisons, so you can only compare, for example, the latency from Zoom with one particular feature; you can't make comparisons at a multivariate level.
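For the data scientists in the audience, here is a small, self-contained sketch of what "every single combination of features" means in practice: exact Shapley values computed by enumerating feature coalitions for one hypothetical poor client minute against a baseline. The toy latency model, the baseline values, and the feature names are assumptions for illustration; a production system would typically use a library such as SHAP rather than this brute-force enumeration.

```python
# Minimal illustration of the idea behind Shapley-value feature attribution:
# enumerate every coalition of features and average each feature's marginal
# contribution. Toy model and values are assumptions, not Juniper's code.
from itertools import combinations
from math import factorial

FEATURES = ["client_rssi_dbm", "clients_on_ap", "rtt_ms"]

def predict(x):
    """Toy latency model (ms) standing in for the trained AI model."""
    return (120
            + 0.8 * x["rtt_ms"]
            + 1.5 * x["clients_on_ap"]
            + 1.2 * (-45 - x["client_rssi_dbm"]))

baseline = {"client_rssi_dbm": -55.0, "clients_on_ap": 10, "rtt_ms": 30.0}   # a "typical" client minute
observed = {"client_rssi_dbm": -80.0, "clients_on_ap": 45, "rtt_ms": 180.0}  # a poor client minute

def value(coalition):
    """Model output when only the coalition's features take their observed values."""
    x = {f: (observed[f] if f in coalition else baseline[f]) for f in FEATURES}
    return predict(x)

n = len(FEATURES)
shapley = {}
for f in FEATURES:
    others = [g for g in FEATURES if g != f]
    phi = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += weight * (value(set(subset) | {f}) - value(set(subset)))
    shapley[f] = phi

# Rank the attributions to see which network parameter dominates the poor call.
for f, phi in sorted(shapley.items(), key=lambda kv: -abs(kv[1])):
    print(f"{f:>16}: {phi:+.1f} ms")
print("sum of attributions:", round(sum(shapley.values()), 1),
      "= prediction - baseline:", round(value(set(FEATURES)) - value(set()), 1))
```

Because the enumeration grows exponentially with the number of features, real deployments approximate these values, but the property that makes Shapley attractive still holds: the attributions sum exactly to the gap between the prediction and the baseline, which is what makes the resulting feature ranking consistent.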
Bob (13:36): Okay, so it sounds like we have a model and we have Shapley. Maybe give the audience a sense of how much data it took to train this model. Are we talking terabytes? Is this like ChatGPT, where you took the whole internet and trained the model on it?

Navraj (13:50): At the moment, what goes into training a model is usually a million data points. What I mean by a data point is that we have the labeled latency that we're getting from either Zoom or Teams, and then we combine that data with all of our Juniper Mist network data. Combining all of that, we train the model, and we use that model to then predict what our clients are experiencing.
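As a rough picture of the data point Navraj defines here, the labeled latency reported by Zoom or Teams has to be lined up with the network telemetry for the same client and the same minute before anything can be trained. The sketch below shows that join in pandas; the column names and keys are assumptions for illustration, not the actual Zoom, Teams, or Mist schema.

```python
# Illustrative only: hypothetical column names, not the actual Zoom/Teams or Mist schema.
import pandas as pd

# Labels: per-client-minute call quality reported by the collaboration platform.
labels = pd.DataFrame({
    "client_mac": ["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"],
    "minute": pd.to_datetime(["2024-05-01 10:00", "2024-05-01 10:01", "2024-05-01 10:00"]),
    "latency_ms": [45.0, 180.0, 60.0],  # the label we want to predict
})

# Features: per-client-minute network telemetry from the cloud.
telemetry = pd.DataFrame({
    "client_mac": ["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"],
    "minute": pd.to_datetime(["2024-05-01 10:00", "2024-05-01 10:01", "2024-05-01 10:00"]),
    "client_rssi_dbm": [-55, -79, -60],
    "clients_on_ap": [12, 41, 9],
    "rtt_ms": [28.0, 170.0, 35.0],
})

# One row per labeled client minute: network features plus the latency label.
training_set = labels.merge(telemetry, on=["client_mac", "minute"], how="inner")
X = training_set[["client_rssi_dbm", "clients_on_ap", "rtt_ms"]]
y = training_set["latency_ms"]
print(training_set)
```

At the scale mentioned here, on the order of a million labeled client minutes, the same join would normally run in a data warehouse or a distributed engine rather than in pandas, but the resulting training set has the same shape: network features as inputs and the Zoom or Teams latency as the label.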
Bob (14:21): Maybe give the audience an example of how this is really helping customers, or the needle-in-the-haystack problem that IT teams face.

Navraj (14:29): We've uncovered a few issues, but one thing we've seen that has been quite prevalent is the case of a customer connected to a VPN server. Both Zoom and Microsoft Teams have identified before that a VPN server can be deleterious to a call, but something we have discovered is that it's actually not just the VPN server that is an issue; it's the distance from the client to the VPN server. If there is a large distance, say someone in India connecting to Australia, that leads to very poor video or audio performance.
Bob (15:15): Okay, well, Navraj, I have to say I'm starting to feel a little bad: solving the aging problem, solving the network problem, I'm still not sure I made the right decision. But maybe for other young professors and data scientists coming out, any words of wisdom or advice on their venture into the data science and AI world?

Navraj (15:33): For me, at least from my experience, as I started off with, I loved mathematics. I found the area that I really enjoyed doing, and I've always been lucky that I found a place where I can do this and apply it to problems that I think are very critical. That's the number one thing: if you have that passion, you will be able to find something that you enjoy doing, whether it's structural biology, where there's still a lot of room for improvement, whether it's networking, or even natural language processing, developing something like GPT. There are so many areas of AI. I love, for example, thinking about ethical AI, thinking about the hallucination problems of GPT and how we can get around them. These are all open areas, and there will always be room. Just keep in mind, I think, the guiding principles: the scientific method of experimentation, testing and validation, and then iterating.

Bob (16:45): Well, Navraj, I want to thank you. It's been a pleasure working with you, I can say. And I want to thank everyone for joining us today on Bob Friday Talks. We look forward to seeing you on the next episode.