Network as Code Advanced Topic, Part 3 of 3
Advanced concepts of network as code
Are you looking for a master class on network as code? Watch the third video in this three-part series, which covers advanced concepts in network automation and is packed with information you can use to effectively scale your operations and redirect staff time to solve harder problems.
You’ll learn
Best practices for network testing and validation
How to use advanced automated monitoring
Key security policy and compliance considerations
Transcript
Introduction
0:08 You folks are troopers. You've stayed through the first two videos in a series where we introduced the general topic of network as code; in video two we talked through implementing network as code and some of the considerations for when you're starting a project. Now we're going to talk about some of the advanced stuff, some of the gotchas, some of the things you really need to consider.
0:36 We spent quite a bit of time on this in the last video, Ned, and by the way, thank you Juniper for sponsoring the series. It's a very important topic: network testing and validation. What is that, and what are we looking to achieve?
0:57 Right. Well, I think there are multiple levels of testing you can do when it comes to deploying your network as code, and it all starts with the initial check-in of that code to whatever your source control process is. Typically that's going to kick off some sort of integration pipeline, and it's the job of that pipeline to do some very basic checking of what you've submitted: is it formatted properly, is the code syntactically valid, does it make sense? Maybe you're even testing for some best practices by using static code analysis tools.
1:30 We'll touch on specifics later, but the general idea behind static code analysis is that it doesn't try to run the code; it simply looks at the code itself. It's basically a rules engine that says, "Oh, you're opening up port 22 to the entire world through this firewall rule, and we don't do that," so it might flag that as a bad configuration. So you're going to have some sort of static analysis tool in there, and that's all about vetting the code, making sure it looks good before it even runs.
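To make that concrete, here is a minimal sketch in Python of the kind of rule a static analysis step might apply. The JSON rule structure and field names are assumptions for illustration, not any particular vendor's schema; in practice you would more likely reach for an off-the-shelf linter or policy engine, but the shape is the same: read the code, apply rules, fail the pipeline on a violation.

    # static_check.py -- flag firewall rules that expose SSH (port 22) to the world.
    # The rule format below is hypothetical; adapt it to whatever your
    # network-as-code tooling actually emits.
    import json
    import sys

    def find_open_ssh(rules):
        """Return rules that permit port 22 from any source address."""
        findings = []
        for rule in rules:
            allows_ssh = rule.get("port") in (22, "22", "any")
            open_source = rule.get("source") in ("0.0.0.0/0", "any")
            if rule.get("action") == "permit" and allows_ssh and open_source:
                findings.append(rule)
        return findings

    if __name__ == "__main__":
        rules = json.load(open(sys.argv[1]))["firewall_rules"]
        problems = find_open_ssh(rules)
        for rule in problems:
            print(f"POLICY VIOLATION: rule {rule.get('name', '?')} opens SSH to the world")
        # A non-zero exit code fails the pipeline stage.
        sys.exit(1 if problems else 0)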
2:04 So Ned, as you're talking through this, I can't think of a single product that does all of this, and as I'm stringing together the test process I can't help but think: am I introducing another point of failure? Because now there's another human, with their meat hands, stringing together these tests. Isn't that another potential breaking point?
2:39 Of course you're adding more complexity; that's what we do as engineers, right? We add complexity. I would say the benefit here, and the good news, is that when you're building your CI/CD pipelines you can do that declaratively with code, so it's not necessarily somebody sitting there stitching all these pieces together by dragging boxes around the screen. You can develop workflows, standardize those workflows, and publish them for other folks to consume and use.
3:13 Actually, a lot of the vendor platforms have that process baked into them: they already have predefined templates or predefined workflows that you can take advantage of, and then snap in your own tooling and tool sets. So you are still going to have humans involved, because you're always going to have humans involved, at least I hope so, otherwise we're out of a job. But those humans are going to be doing things that further your automation goal, and they'll be doing it in code, not by manually clicking around in a UI.
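As a purely conceptual sketch of what "the workflow itself is code" means, here is a tiny Python pipeline runner with pluggable stages. The stage names and context are invented for illustration; real platforms typically express this as a pipeline definition checked into the same repository as the network code, with vendor-provided templates playing the role of the standard workflow.

    # pipeline.py -- conceptual sketch of a workflow defined as code.
    # Stage names and the context dict are invented for illustration.

    def lint(ctx):
        print(f"linting {ctx['repo']} ...")

    def static_checks(ctx):
        print("running static analysis and policy checks ...")

    def deploy_to_lab(ctx):
        print("applying changes to the lab / virtual environment ...")

    def validate(ctx):
        print("running post-deploy validation ...")

    # A standardized workflow that a platform team can publish; other teams
    # reuse it and snap their own stages or tooling into it.
    STANDARD_WORKFLOW = [lint, static_checks, deploy_to_lab, validate]

    def run(workflow, ctx):
        for stage in workflow:
            stage(ctx)   # any exception stops the pipeline at that stage

    if __name__ == "__main__":
        run(STANDARD_WORKFLOW, {"repo": "network-configs"})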
3:48 So what is this called, this testing of the code? What's the name for it? Is this all just part of the subset of infrastructure as code?
3:58 I think so. We're borrowing a little bit from the concepts behind testing software. Software has a whole bunch of different test types: you have things like unit testing, which is just testing that, say, a function returns the values you expect and errors in the way you want it to. That's the very essential "I've got a unit and I'm testing it." Then you get into integration testing: how does that function work with the rest of my program as a whole?
4:27 Not all of these concepts map one to one to infrastructure and to networking, because they are different in the way they work, so we can try to apply some of these concepts while recognizing that not everything is a perfect mapping from software development to managing infrastructure.
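For example, a "unit" in network as code might be a small function that renders part of a configuration, and a unit test simply checks its output. A minimal sketch, where the render function and its expected output are hypothetical; run it with pytest, and leave exercising the rendered output against a device or virtual lab to the integration tests:

    # test_vlan.py -- unit-test sketch for a configuration-rendering function.
    import pytest

    def render_vlan(vlan_id, name):
        """Render a VLAN configuration stanza (hypothetical format)."""
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"invalid VLAN id: {vlan_id}")
        return f"vlan {vlan_id}\n  name {name}"

    def test_render_vlan_returns_expected_text():
        assert render_vlan(100, "users") == "vlan 100\n  name users"

    def test_render_vlan_rejects_bad_id():
        with pytest.raises(ValueError):
            render_vlan(5000, "oops")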
4:47 So talk to me about integration testing, like a continuous motion. When I was building, it wasn't quite a greenfield, but it was greenish: we had a chance to reset a massive network for a Fortune 100, and that opportunity doesn't come along often. We were bringing in a mission-critical application; it was my job to manage that application, and they had built all of this network redundancy. I wanted to say, hey, let's take the opportunity to turn off that switch. In theory we don't get to do this in the real world, in production, but with testing I can say, hey, let's actually test the redundancy, turn off the switch, and see if we can still do a transaction. That's nirvana, but how do we gradually get there? How do we integrate and merge these pipelines?
5:51 Sure, yeah. The testing begins when code is checked in; that's the start, and we're going to do some basic tests to make sure your code is good. The next step is how that integrates with the existing system, and that can be really difficult to test because you don't necessarily want to apply changes live to your network. So you could potentially have a development instance of some of your network, maybe a virtualized version of your network, where you can deploy those changes and then have a series of checks that validate: hey, the configuration loaded properly on the switch, it didn't barf on any of the commands or instructions in the configuration, it's a valid config that will actually load on that switch, even though it's a virtualized version of that switch.
6:36 Then there's another aspect of integration, which is how the network functions as a whole once you've deployed your updates, and that can be very, very difficult to test in a non-production environment. There are certainly tools out there that will attempt to make a digital twin of sorts of your existing environment, apply the changes there, and review the results, but ultimately you're always testing in production: eventually it has to hit your production network, and you're essentially testing there. So what's really critical in this whole process is having a complete feedback loop to capture what's happening in the production environment and have that inform your development process for the next iteration of your code.
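A hedged sketch of what a post-deploy validation step might look like, assuming a hypothetical get_running_config() helper in front of whatever device access you actually use (SSH, NETCONF, a controller API) and a crude end-to-end reachability probe standing in for "can we still do a transaction":

    # validate_deploy.py -- sketch of post-deployment checks in a pipeline.
    # get_running_config() is a placeholder for your actual device access
    # method; the shape of the checks is the point here.
    import socket

    def get_running_config(device):
        raise NotImplementedError("wire this to your device access method")

    def config_loaded(device, expected_lines):
        """Did the intended configuration actually land on the device?"""
        running = get_running_config(device)
        return all(line in running for line in expected_lines)

    def service_reachable(host, port, timeout=3):
        """Crude end-to-end probe: can we still open a TCP session?"""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def validate(devices, expected_lines, probe_host, probe_port):
        results = {d: config_loaded(d, expected_lines) for d in devices}
        results["end_to_end"] = service_reachable(probe_host, probe_port)
        return results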
7:29 So let's move on to another topic: monitoring. How does network as code impact monitoring and analytics? It seems like there are opportunities there, but what are some of the advanced things we can start thinking about once we move to network as code?
7:47 Well, certainly, assuming your monitoring and analytics software packages and devices support it, you can deploy them with network as code too. There's a theme developing here; a friend of mine likes to call it "everything as code." If it has an API endpoint and you can program against it, you should be defining its configuration using code as much as possible, and if there are any SREs watching, you know exactly what I'm talking about. The nirvana of the SRE is to automate all the things that can be automated so you can move on and do something else.
8:25 So part of it is just setting up that initial monitoring and analytics. It's something some people forget to do when they set up a switch or a router: they forget to turn on proper monitoring or get it integrated with the monitoring package they have somewhere else. "Oh, I set up a new switch but I forgot to send the ticket to the monitoring team to add it to their list of network devices," that sort of thing. When you have the monitoring portion defined using network as code, you don't have to remember, because you've created an integration where, when a new network device is added, it automatically gets integrated into your existing analytics and monitoring packages. It's that dynamic discovery and integration, and that's certainly one part of it.
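As a sketch of that "new device automatically lands in monitoring" idea, assuming a monitoring platform with an HTTP API; the URL, token handling, and payload fields below are invented placeholders, not any specific product's API:

    # register_monitoring.py -- sketch: register every device in the
    # code-defined inventory with the monitoring platform.
    # MONITORING_API and the payload shape are placeholders for your tool.
    import requests

    MONITORING_API = "https://monitoring.example.com/api/v1/devices"

    def register(device, token):
        payload = {"hostname": device["hostname"],
                   "mgmt_ip": device["mgmt_ip"],
                   "site": device.get("site", "unknown")}
        resp = requests.post(MONITORING_API, json=payload,
                             headers={"Authorization": f"Bearer {token}"},
                             timeout=10)
        resp.raise_for_status()

    def register_inventory(inventory, token):
        # The inventory comes from the same source of truth as the network
        # code, so nothing gets forgotten when a new switch is added.
        for device in inventory:
            register(device, token)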
9:08 But I think another important part is the ability to capture the impact of your changes, and that gets back to what we were just talking about: I deployed my code, did I break anything? That's certainly important, but another, possibly equally important, part is: what was the actual impact of my changes, and did I achieve the goal of those changes to begin with? Because we don't just change the network for funsies, right? It's not "it's Friday, we're going to deploy some new network as code and then go out and have happy hour." You have a business reason or a technology reason to deploy changes to the network.
9:50 So defining what the point of the change is, and then figuring out how to measure the impact of that change to make sure the change you made is actually reflected in the performance, that's the job of monitoring and analytics. "Oh hey, you made this update to the network and now customer requests are coming in 50% faster than they were before, because you streamlined something in the network." That's awesome; you get to report back to your boss, "I improved the network performance so we're getting more customer orders per second." Fantastic.
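A minimal sketch of closing that loop: compare a business-relevant metric before and after the change window. The fetch_metric() helper is a placeholder for a query against whatever analytics backend you actually use.

    # change_impact.py -- sketch: did the change move the metric we cared about?
    # fetch_metric() is a placeholder for your metrics/analytics backend.

    def fetch_metric(name, start, end):
        raise NotImplementedError("query your analytics backend here")

    def impact_report(metric, change_time, window):
        before = fetch_metric(metric, change_time - window, change_time)
        after = fetch_metric(metric, change_time, change_time + window)
        delta_pct = (after - before) / before * 100 if before else float("nan")
        return {"metric": metric, "before": before, "after": after,
                "change_pct": round(delta_pct, 1)}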
10:27 So as I think of the CI/CD process and the things we wished we could do but didn't have the people or processes to do, or that were too expensive or too burdensome to do every time: we could create CI/CD processes or pipelines that kick off specific monitoring for a specific amount of time on a set of ports. Port mirroring, generally speaking, is expensive from a resource perspective, but after a certain change we might want to always mirror a port for, say, two hours so we collect that data. And if there's another trigger from the monitoring tool that says, hey, if we reach this threshold, take this action, that action may not be disruptive like making configuration changes; it could be "monitor this other thing," putting a limited resource to work to collect more data and make better informed decisions.
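A sketch of that idea as a pipeline step, assuming hypothetical enable_port_mirror()/disable_port_mirror() helpers in front of your device or controller API; the two-hour window is just the scenario described above, and a real pipeline would schedule the teardown rather than sleep.

    # post_change_capture.py -- sketch: after a change, mirror the affected
    # port for a fixed window to collect extra data.
    # enable_port_mirror()/disable_port_mirror() are placeholders for your
    # device or controller API.
    import time

    CAPTURE_SECONDS = 2 * 60 * 60  # two hours, per the scenario above

    def enable_port_mirror(device, port, destination):
        raise NotImplementedError

    def disable_port_mirror(device, port):
        raise NotImplementedError

    def capture_after_change(device, port, collector):
        enable_port_mirror(device, port, collector)
        try:
            time.sleep(CAPTURE_SECONDS)   # placeholder for a scheduled teardown
        finally:
            disable_port_mirror(device, port)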
11:38 Yeah, absolutely. At this point I won't say CPU time is cheap, but it's a lot cheaper than it used to be; storage isn't cheap, but it's a lot cheaper than it used to be. So the ability to capture all of this information is certainly there. The other big challenge is: okay, I've got all this additional info, how do I analyze it, how do I munge useful information out of it? That's not really a network as code challenge, but it is something that feeds back into the loop of your network as code development: having some sort of data analysis tool that can give you useful insights into the information you're gathering.
12:20 So let's talk about our last topic in this series. If you're a networking person, you've dealt with both sides of this: implementing your security policy via the network, and then proving that to some internal or external audience. So let's talk about implementing security policies through code. I've talked to a bunch of folks about security as code; that's a thing. Where do we start with our security policies through code?
12:59 Sure. There's a whole bunch of different policy engines out there that will analyze code, compare it to some set of policies, and then give you the results. One of the most popular ones I've been working with for a little while is called Open Policy Agent, or OPA, and it has the capability to analyze anything that is expressed in JSON, compare it to rule sets you've defined, and give you results based on those rule sets. And what can you express with JSON? Well, just about anything.
13:34 So whether that's static code analysis, just what does the code look like, or "I have a planned set of changes I want to apply to my network," where I can look through the planned set of changes and make determinations about whether I find it secure, all the way up to analyzing the actual running configuration on network devices or servers: as long as it can be expressed in JSON, OPA can take a look at it and make policy decisions. Say, "oh, someone went into this switch after the fact and altered something, and it's no longer in compliance." That compliance is usually defined by the security and compliance teams in your organization; they set the policies, and then they allow you to test whether or not you're in compliance with those policies.
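OPA policies are normally written in its Rego language; purely to illustrate the idea of rules evaluated against anything expressible as JSON, here is a small Python stand-in that checks a device configuration document against a couple of invented compliance rules. It is a conceptual sketch, not OPA itself.

    # policy_check.py -- illustration only: evaluate simple compliance rules
    # against a JSON-style device configuration (real deployments would
    # typically express these rules in OPA's Rego language instead).
    import json
    import sys

    def check(config):
        violations = []
        if config.get("telnet_enabled", False):
            violations.append("telnet is enabled")
        ssh = config.get("ssh", {})
        if ssh.get("password_auth", True):
            violations.append("SSH allows password authentication")
        if not config.get("aaa", {}).get("tacacs_servers"):
            violations.append("no TACACS+ servers configured")
        return violations

    if __name__ == "__main__":
        config = json.load(open(sys.argv[1]))
        violations = check(config)
        for v in violations:
            print(f"NON-COMPLIANT: {v}")
        sys.exit(1 if violations else 0)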
14:27 One of the things that frustrated me to no end when I did network administration and operations day to day in large organizations is when the dreaded auditor comes in, and I want to talk about two topics within this. One: how do I answer requests from auditors when I'm living in an infrastructure as code and network as code environment? Because in my mind I'm not going back to individual devices, pulling configs, going to backups, et cetera, to answer the auditors' requests. And the second one is: how do I help the auditors trust the artifacts I'm giving them as proof? So let's do the first one first: how am I answering the request? A sample request is: show me that authentication is configured on every network device.
15:38 Sure, yeah, and that's a pretty common request. Now, let's assume you've defined authentication policies in your network as code for every single network device. All you need to do is run a drift detection, essentially, against all your existing network devices, and that gets back to the get, set, and test we talked about in the previous video. You're basically just running the get and test portions: get the configuration from every network switch, test it against my defined configuration, is there a difference? Hopefully the answer is "no, there is not."
16:18 So you can go to the auditor and say, here's the run I did against all my network devices, it found no differences, and here's the configuration I've defined in code, which clearly has the authentication policy enabled. There you go, I'm done. I don't have to tap every single switch myself, pull the config, and dump it out into a giant document that I deliver to them. It's: here's the runner that I went through that tested against all the switches, and here's the actual configuration it was testing against. You're good to go.
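A sketch of that "get and test" run packaged as an audit artifact, again assuming a hypothetical get_running_config() helper and an example authentication stanza standing in for the policy defined in code:

    # auth_audit.py -- sketch: prove the defined authentication policy is
    # present on every device, and produce a report for the auditor.
    # get_running_config() is a placeholder for your device access method;
    # the required line below is only an example policy.
    import datetime
    import json

    REQUIRED_AUTH_LINES = [
        "aaa authentication login default group tacacs+ local",
    ]

    def get_running_config(device):
        raise NotImplementedError

    def audit(devices):
        report = {"generated": datetime.datetime.utcnow().isoformat() + "Z",
                  "policy": REQUIRED_AUTH_LINES, "results": {}}
        for device in devices:
            running = get_running_config(device)
            missing = [line for line in REQUIRED_AUTH_LINES if line not in running]
            report["results"][device] = "compliant" if not missing else missing
        return report

    if __name__ == "__main__":
        print(json.dumps(audit(["switch01", "switch02"]), indent=2))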
16:53 So the smart auditor will come back and say, well, there's a whole other control plane now: how do we authenticate the folks making the changes? Because we're no longer making switch-level changes, we're not going into the switch to configure changes; this whole other team, a platform team, is doing it. How do we ensure who has rights to make changes if this quote-unquote system is making the changes?
17:22 Right. Well, it depends on how you've secured the workflow. A fairly typical process is that everything goes through code; you're following a GitOps sort of process. The way I make changes to a system is that I submit my changes via code to the repository, and that kicks off, via a webhook, some CI/CD pipeline, and in that pipeline there will be an approvals process for the changes. So someone, whether it's an automated process or a manual one, needs to vet those changes, determine whether they should be allowed, and then approve them.
18:01 What you have in the repository is a record of exactly who committed the code and when they committed it, and in your pipeline you have a record of exactly who approved that code and when they approved it, so you can trace the full change of your environment through that entire process. Now, that doesn't mean you never need to break glass in the case of a hard-down situation where you need to make immediate changes, but that is hopefully an infrequent event, and you have a well-defined process for getting approval to break glass and make changes.
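The commit half of that audit trail already lives in the repository. As a small sketch, the who/what/when of committed changes can be pulled straight out of git history; the approval record would come from your pipeline or code review platform's own records. The repository path and directory below are placeholders.

    # change_record.py -- sketch: extract the who/what/when of committed
    # changes from git history for a given path.
    import subprocess

    def change_record(repo_path, target="."):
        fmt = "%h|%an|%aI|%s"   # short hash | author | ISO date | subject
        out = subprocess.run(
            ["git", "-C", repo_path, "log", f"--pretty=format:{fmt}", "--", target],
            capture_output=True, text=True, check=True).stdout
        entries = []
        for line in out.splitlines():
            commit, author, date, subject = line.split("|", 3)
            entries.append({"commit": commit, "author": author,
                            "date": date, "subject": subject})
        return entries

    if __name__ == "__main__":
        for e in change_record(".", "configs/"):
            print(f'{e["date"]}  {e["author"]}  {e["subject"]}')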
18:37 So this all starts with: you can't automate, and you can't code, processes that don't exist.
18:45 Well, at the end of the day, all of these systems we've talked about in this series are there to automate or codify the things we've already written down on paper: the processes we've already talked about, the operational issues we've controlled for.
19:12 I've seen CI/CD processes break entire systems because people didn't sit down and write down their existing processes and then build a CI/CD pipeline that supported those existing pipelines; they tried to reinvent the wheel and broke literally 30 years of integration and test processes without really thinking it through. And I think that summarizes the whole series. What we're trying to do is scale our operations in a way that fits the bill today. The CTO Advisor's premise is that hybrid infrastructure is here to stay; we cannot afford a bespoke approach to any infrastructure, whether that's network, storage, compute, or public cloud. We have to have processes that scale and take humans out of the loop so we can put our people on smarter and harder problems, such as network-to-cloud networking, cloud-to-cloud networking, and cloud-to-cloud security. These are problems we need to rededicate our staffs to solving. Ned, any last comments for our audience?
20:41 I think you hit the nail on the head there. It really is a matter of automating existing processes. But most importantly, you don't have to twist yourself out of shape to fit the tool. All these different tools that exist are extensible and customizable, so you should customize the workflow and select the tool that fits the existing shape and workflow of your organization.
21:08 All right. With that said, if you want to find out more about the CTO Advisor, you can follow us on the web at thectoadvisor.com, and visit our friends at Juniper Networks. Ned, where can folks find you?
21:16 If you're looking for me, the easiest way is to go to my website, nedinthecloud.com. All of my links and other content are hosted there.
21:26 All right. Until then, we'll talk to you in the next video series.