Network automation is a very hard problem. It is a distributed, parallel, real-time, highly available, performance-sensitive, security-sensitive control problem at the heart of society. This requires a system architecture.
Design:
Requirements:
- Service assurance: When a service is degraded, an event automatically triggers the network to reconfigure itself.
- Reduced Deployment Time
- Commodity hardware and software (NOS):
- Use white boxes with an open source network operating system: you purchase the hardware and ASIC and run an open NOS on top
- Advantages:
- Cheap
- Easy to automate
- Decoupling the software and hardware gives two independent lifecycle managements
- Disadvantage:
- If there is a bug, it impacts all the services
- Data Model–Driven Management:
- Good scripts are based on good APIs, which fundamentally should provide the following benefits:
- Abstraction: A programmable API should abstract away the complexities of the underlying implementation. The DevOps engineers should not need to know unnecessary details, such as a specific order of configurations in a network element or specific steps to take if something fails. If this isn’t intuitive for humans, the sequencing of commands becomes even more complex for configuration engines. Configurations should function more like filling in a high-level checklist (these are the settings you need; now the system can go figure out how to properly group and order them).
- Data specification: The key thing an API does, whether it is a software or network API, is provide a specification for the data. First, it answers the question of what the data is: an integer, a string, or some other type of value. Next, it specifies how that data is organized. In traditional programming, this is called the data structure, though in the world of network programmability and databases, the more common term is schema, also known as a data model. Since the network is basically being treated as a (distributed) database, the term (database) schema is sometimes used.
- Means of accessing the data: Finally, the API provides a standardized framework for how to read and manipulate the device’s data.
- Data model–driven management was initially built on NETCONF and XML, but other protocols and encodings have since emerged: RESTCONF with JavaScript Object Notation (JSON) and gRPC Network Management Interface (gNMI) with protobuf.
- When applying an API to a complex environment, the key is that vendors implement it in a standards-based way. There should be a common way to define and access data across different devices and vendors—not separate, proprietary interfaces that operators must learn for every different device and function in their network.
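A minimal Python sketch of the "data specification" and "checklist" ideas above. The schema, the leaf names (`name`, `mtu`, `enabled`), and the `validate` helper are hypothetical and only for illustration; a real data model would be written in YANG and enforced by the server.

```python
# Hypothetical schema for one interface: leaf name -> expected Python type.
INTERFACE_SCHEMA = {"name": str, "mtu": int, "enabled": bool}


def validate(data, schema):
    """Return a list of schema violations (empty when the data is valid)."""
    errors = []
    # Check every supplied leaf against the schema (what the data is).
    for leaf, value in data.items():
        if leaf not in schema:
            errors.append(f"unknown leaf: {leaf}")
        elif type(value) is not schema[leaf]:
            errors.append(
                f"{leaf}: expected {schema[leaf].__name__}, "
                f"got {type(value).__name__}"
            )
    # Check that nothing required by the schema is missing (how it is organized).
    for leaf in schema:
        if leaf not in data:
            errors.append(f"missing leaf: {leaf}")
    return errors


# The client fills in settings in any order, like a checklist.
good = {"enabled": True, "mtu": 1500, "name": "eth0"}
bad = {"name": "eth0", "mtu": "1500"}

good_errors = validate(good, INTERFACE_SCHEMA)
bad_errors = validate(bad, INTERFACE_SCHEMA)
```

The client only states the settings it wants; grouping and ordering them is left to the system behind the API.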
- Telemetry:
- Telemetry uses a subscription model to identify information sources and destinations. Model-driven telemetry replaces periodic polling of network elements; instead, a standing request for information to be delivered to a subscriber is established on the network element. Then, either periodically or as objects change, the subscribed set of YANG objects is streamed to that subscriber.
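The on-change variant of this subscription model can be sketched in a few lines of Python. The `Device` class and the paths are hypothetical stand-ins, not a real telemetry API; the point is only that the element pushes updates for subscribed objects instead of being polled.

```python
class Device:
    """Toy network element that pushes updates for subscribed paths."""

    def __init__(self):
        self._state = {}
        self._subscribers = {}  # path -> list of callbacks

    def subscribe(self, path, callback):
        """Register a standing request for updates on one path."""
        self._subscribers.setdefault(path, []).append(callback)

    def set_state(self, path, value):
        """Update a value and notify subscribers only if it changed."""
        if self._state.get(path) == value:
            return
        self._state[path] = value
        for callback in self._subscribers.get(path, []):
            callback(path, value)


received = []
device = Device()
status_path = "/interfaces/eth0/oper-status"  # hypothetical path
device.subscribe(status_path, lambda path, value: received.append((path, value)))

device.set_state(status_path, "up")
device.set_state(status_path, "up")    # unchanged, so nothing is streamed
device.set_state(status_path, "down")
device.set_state("/interfaces/eth0/in-errors", 3)  # no subscriber for this path
```

A periodic subscription would look the same from the subscriber's side; only the trigger on the device (a timer instead of a change) differs.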
- Software defined networking:
- The first SDN discussions introduced the concept of separating the control plane and data plane entities.
- An OpenFlow controller configures the data plane, via the API, the CLI, or a GUI, in an OpenFlow switch.
- The Open vSwitch Database Management Protocol (OVSDB, RFC 7047) is an open source effort to configure virtual switches in virtualized server environments. The OpenDaylight project, an open source controller project, did a great job of adding multiple configuration protocols.
The term SDN evolved to mean a variety of things: network virtualization in the cloud, dynamic service chains for service provider subscribers, dynamic traffic engineering, dynamic network configuration, network function virtualization, open and programmable interfaces, and so on. What is certain is that SDN is much more than OpenFlow and simply splitting the control and data planes.
SDN as a control plane separation paradigm (configuring the Routing Information Base [RIB] or the FIB) and DevOps (which some call SDN) are complementary: They use the same concepts and the same tools. As a practical example, a network operator might configure an IGP for distributed forwarding with tools based on NETCONF and YANG and then inject specific policies on top of the IGP, directly in the RIB.
- Intent Based Networking:
- Data model–driven management simplifies the automation and specifies that the telemetry must be data model driven.
- Now, does the network behave as expected? Are the new services operational? Are the SLAs respected? You could check that the network device, the virtual machine, or the container is reachable and check that the services or the VNFs are correctly configured. However, validating the configuration and reachability of individual components does not imply that the services are running optimally or meet the SLAs.
- The intent-based approach focuses on the higher-level business policies and on what is expected from the network. In other words, the prescriptive approach focuses on how, while the intent-based approach focuses on what. The prescriptive way of configuring an L3VPN service involves following a series of tasks, expressing the how: you must configure a VRF called “customer1” on provider edge router1 under the interface “eth0,” a default gateway pointing to router1 on the customer edge router, the MPLS-VPN connectivity between the provider edge router1 and router2, and so on.
- Conversely, the intent-based way focuses on what is required from the network (for example, a VPN service between the London and Paris sites for your customer).
Intent-based networking creates the most value through constant learning, adapting, and optimizing, based on a feedback loop mechanism, as shown in the following steps:
STEP 1. Decomposition of the business intent (the what) to network configuration (the how). This is where the magic happens. For a single task such as “a VPN service between the London and Paris sites for customer C,” you need to understand the corresponding devices in Paris and London, the mapping of the operator topology, the current configuration of the customer devices, the operator core network configuration, the type of topology (such as hub and spoke or fully meshed), the required Quality of Service (QoS), the type of IP traffic (IPv4 and/or IPv6), the IGP configuration between the customer and the operator, and so on. Examine all the possible parameters for an L3VPN service in the specifications of the YANG data model for L3VPN service delivery (RFC 8299).
STEP 2. The automation. This is the easy part, once the what is identified. Based on the data model–driven management and a good set of YANG models, a controller or orchestrator translates the YANG service model (RFC 8299) into a series of network device configurations. Thanks to NETCONF and two-phase commit (more on this later), you are now sure that all devices are correctly configured.
STEP 3. The monitoring with data model–driven telemetry provides a real-time view of the network state. Any fault, configuration change, or even behavior change is directly reported to the controller and orchestrator (refer to the previous section).
STEP 4. Data analytics correlate and analyze the impact of the new network state for service assurance purposes, isolating the root cause issue—sometimes, even before the degradation happens. From there, the next network optimization is deduced, almost in real time, before going back to step 1 to apply the new optimizations.
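The four steps can be sketched as a toy closed loop in Python. Everything here is an assumption made for illustration: the site-to-router inventory, the `qos` and `max_latency_ms` fields, and the rule "upgrade QoS when latency is too high" are invented, and `apply_all` is only a crude all-or-nothing stand-in for a real network-wide transaction.

```python
# Hypothetical inventory: which provider edge router serves each site.
SITE_TO_PE = {"London": "pe-london", "Paris": "pe-paris"}


def decompose(intent):
    """Step 1: turn the 'what' into per-device settings (the 'how')."""
    vrf = "vrf-" + intent["customer"]
    return {
        SITE_TO_PE[site]: {"vrf": vrf, "qos": intent["qos"]}
        for site in intent["sites"]
    }


def apply_all(network, configs):
    """Step 2: configure every device, or none if any device is unknown."""
    if any(device not in network for device in configs):
        return False
    for device, config in configs.items():
        network[device].update(config)
    return True


def analyze(intent, latency_ms):
    """Step 4: return an adjusted intent if the SLA is violated, else None."""
    if latency_ms <= intent["max_latency_ms"]:
        return None
    return {**intent, "qos": "premium"}


intent = {
    "customer": "C",
    "sites": ["London", "Paris"],
    "qos": "standard",
    "max_latency_ms": 30,
}
network = {"pe-london": {}, "pe-paris": {}}

applied = apply_all(network, decompose(intent))

# Step 3 stand-in: a value that telemetry would have streamed to the controller.
observed_latency_ms = 45

new_intent = analyze(intent, observed_latency_ms)
if new_intent is not None:
    # Back to step 1 with the optimized intent.
    apply_all(network, decompose(new_intent))
```

The operator only ever edits `intent`; the device-level settings are derived from it on every pass through the loop.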
YANG
YANG is an API contract language. This means that you can use YANG to write a specification for what the interface between a client and server should be on a particular topic.
A YANG-based server publishes a set of YANG modules, which taken together form the system’s YANG model. These YANG modules declare what a client can do.
- Configure: For example, decide where the log files are stored, state which speed a network interface uses, and declare whether a particular routing protocol is disabled or enabled, and if so, which peers it will have.
- Monitor status: For example, read how many lost packets there are on each network interface, check what the fan speeds are, and list which peers are actually alive in the network.
- Receive notifications: For example, hear that a virtual machine is now ready for work, be warned of the temperature crossing a configured threshold, or be alerted of repeated login failures.
- Invoke actions: For example, reset the lost packet counters, run a traceroute from the system to some address, or execute a system reboot.
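One way to picture the contract idea is a table of what the server has published, which every client request is checked against. The sketch below is plain Python, not YANG syntax, and all the names in `CONTRACT` are hypothetical. It does rely on one real YANG property: operational state is read-only, while configuration can be both read and written.

```python
# Hypothetical contract, loosely mirroring the four capabilities above.
CONTRACT = {
    "config": {"log-directory", "interface-speed"},
    "state": {"lost-packets", "fan-speed"},
    "actions": {"reset-counters", "reboot"},
    "notifications": {"temperature-threshold-crossed", "login-failure"},
}


def allowed(operation, name):
    """Check a client request against the published contract."""
    if operation == "write":
        # Only configuration is writable; state is read-only.
        return name in CONTRACT["config"]
    if operation == "read":
        return name in CONTRACT["config"] | CONTRACT["state"]
    if operation == "invoke":
        return name in CONTRACT["actions"]
    if operation == "subscribe":
        return name in CONTRACT["notifications"]
    return False


can_read_counter = allowed("read", "lost-packets")
can_write_counter = allowed("write", "lost-packets")
can_reboot = allowed("invoke", "reboot")
```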
The YANG model of a device is often called its “schema,” as in database schema or blueprint. A schema is basically the structure and content of messages exchanged between the application and the device.
The industry has started to centralize all important YANG modules on GitHub [https://github.com/YangModels/yang], with the YANG Catalog [https://yangcatalog.org/] as the graphical interface. Those are two excellent starting points if you are not sure where to begin.
The Management Architecture
Data Model–Driven Management Components
Once the YANG models are specified and implemented, a network management system (NMS) can select a particular encoding (XML, JSON, protobuf, thrift, you name it) and a particular protocol (NETCONF, RESTCONF, or gNMI/gRPC) for transport.
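To illustrate that the encoding is a choice layered on top of the model, the Python sketch below serializes the same hypothetical interface data as JSON and as XML using only the standard library. It is deliberately simplified: real RESTCONF JSON uses module-qualified member names and real NETCONF XML uses namespaces, both of which are omitted here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical instance data for one interface.
interface = {"name": "eth0", "mtu": 1500, "enabled": True}


def to_json(data):
    """Encode the data as a JSON document."""
    return json.dumps({"interface": data}, sort_keys=True)


def to_xml(data):
    """Encode the same data as an XML document."""
    root = ET.Element("interface")
    for leaf, value in data.items():
        child = ET.SubElement(root, leaf)
        # XML carries text only; booleans become lowercase "true"/"false".
        if isinstance(value, bool):
            child.text = str(value).lower()
        else:
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = to_json(interface)
xml_text = to_xml(interface)
```

Both strings describe the same data; which one goes on the wire depends on the protocol the NMS and the device agree on.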
The Server Architecture: Datastore