July 23, 2020

Network Automation with ONIE, ZTP and Ansible

Network engineering and system engineering often seem worlds apart, an impression reinforced by the completely different ways in which their respective devices are operated. Our new switching infrastructure has shown us that this does not have to be the case. Thanks in no small part to the open source approach of Cumulus Linux, the two worlds are converging and creating synergies with existing tools and processes. In this article, we will take a look at selected aspects and show that the migration of our network has resulted in more than just faster switch ports.

Nothing new (at first sight)

Cumulus Linux, the network operating system based on Debian, makes it easy for experienced network engineers to get started. The important settings are accessible through a CLI, and this interface has been aligned with CLIs from major manufacturers, which enables network specialists to find their way around the new environment quickly and to build on knowledge they have acquired elsewhere. Even useful features that are only gradually being adopted in the industry are available by default on network devices running Cumulus Linux; it is, for example, possible to apply – and undo – a block of commands in one go.

However, the fact that Cumulus Linux is based on Debian opens up additional, powerful possibilities. Logging in takes you directly into a regular Linux shell rather than into the familiar but limited CLI. The CLI is just one command away, but above all, thanks to full root access, you can also use all the other tools that prove indispensable in a system engineer's everyday life: from utilities such as htop and watch to config management (e.g. Ansible) and monitoring via the Zabbix agent.

Efficiency through config management

The ability to administer network devices using a config management system is a key feature that many Cumulus users will not want to do without. We have relied on Ansible to manage the cloudscale.ch servers for quite some time, and Cumulus Linux now allows us to manage network devices through Ansible as well. In its simplest form, Ansible acts as a client of the CLI named "NCLU" (Network Command Line Utility): using familiar commands, numerous switches and routers can be consistently configured without manual interaction.
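
As an illustration of this simplest form, the following is a minimal sketch of a playbook that drives NCLU through Ansible's nclu module (available as community.network.nclu in newer Ansible releases). The host group "switches" and the interface name and MTU value are assumptions made for this example and are not taken from our actual setup.

```yaml
# Minimal sketch: push NCLU commands to all switches.
# "switches", "swp1" and the MTU value are hypothetical.
- hosts: switches
  become: true
  tasks:
    - name: Configure the management VRF and an interface MTU via NCLU
      nclu:
        commands:
          - add vrf mgmt
          - add interface swp1 mtu 9216
        # atomic: discard pending changes, apply the commands, then commit
        atomic: true
        description: "Ansible - mgmt VRF and MTU"
```

With atomic set, the module first aborts any pending NCLU changes, then applies the listed commands and commits them as one block, which mirrors the "apply in one go" behaviour of the CLI described earlier.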

Tapping the full potential, however, requires the use of Jinja templates. Instead of long sequences of individual commands, in which the critical variations are easily overlooked, templates are maintained in relatively short and clear files. Thanks to the use of loops and conditionals, extensive and complex configurations can be represented in a better structured way. This greatly reduces the risk of making careless mistakes such as inconsistent MTU or VLAN configurations.

The following excerpts illustrate how we populate /etc/network/interfaces on our switches using the Ansible template module.

Ansible Inventory Variables:

vrfs:
  mgmt:
    description: VRF Mgmt
    ipv4_address: 127.0.0.1/8
  quarantine:
    description: VRF Quarantine (for test purposes)
    ipv4_address: '{{ "10.0.0.0/24" | ipaddr(device_id) | ipaddr("address") }}/32'
  private:
    description: VRF Private (networks without a default gateway)
  public:
    description: VRF Public (networks with a default gateway)
    ipv4_address: '{{ "203.0.113.0/24" | ipaddr(device_id) | ipaddr("address") }}/32'
    ipv6_address: '{{ "2001:db8:bb::/64" | ipaddr(device_id) | ipaddr("address") }}/128'
  dci:
    description: VRF DCI (networks on data center interconnect)
    ipv4_address: '{{ "172.16.16.0/24" | ipaddr(device_id) | ipaddr("address") }}/32'
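
To make the filter expressions above more concrete: ipaddr(device_id) selects the address at index device_id within the given network, ipaddr("address") strips the prefix length, and the literal suffix turns the result into a host route. The sketch below assumes that device_id is a small integer defined per device; the file name and the value 11 are hypothetical.

```yaml
# host_vars/switch11.yml (hypothetical file name and value)
device_id: 11

# With device_id 11, the templated values above should resolve to:
#
# vrfs:
#   quarantine:
#     ipv4_address: 10.0.0.11/32
#   public:
#     ipv4_address: 203.0.113.11/32
#     ipv6_address: 2001:db8:bb::b/128
#   dci:
#     ipv4_address: 172.16.16.11/32
```

Deriving all loopback addresses from a single number per device keeps the addressing consistent across VRFs without having to maintain each address by hand.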

Jinja2 Template:

{% for name, vrf in vrfs.items() if name != "default" -%}
# {{ vrf.description }}
auto {{ name }}
iface {{ name }}
    {% if vrf.ipv6_address is defined -%}
    address {{ vrf.ipv6_address }}
    {% endif -%}
    {% if vrf.ipv4_address is defined -%}
    address {{ vrf.ipv4_address }}
    {% endif -%}
    vrf-table auto

{% endfor -%}
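
A template like this is typically rolled out with a task along the following lines. This is a sketch under assumptions rather than our exact playbook: the host group, the template file name and the handler name are hypothetical, while ifreload -a is the ifupdown2 command that applies changes in /etc/network/interfaces.

```yaml
# Minimal sketch: render /etc/network/interfaces and reload on change.
# "switches", "interfaces.j2" and the handler name are hypothetical.
- hosts: switches
  become: true
  tasks:
    - name: Render /etc/network/interfaces from the Jinja2 template
      template:
        src: interfaces.j2
        dest: /etc/network/interfaces
        owner: root
        group: root
        mode: "0644"
        backup: true          # keep a copy of the previous file
      notify: Reload interfaces

  handlers:
    - name: Reload interfaces
      command: ifreload -a    # only runs if the rendered file changed
```

Because the handler is triggered only when the rendered file differs from the one on the device, repeated runs of the playbook leave an already correct switch untouched.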

Fully automated provisioning

That said, ongoing maintenance of the configuration is only half the battle. We have decided to carry out all major upgrades of our switches in the form of a complete reinstall. This ensures a reproducible state, and we can test the upgrade process and other changes as often as we like in the lab beforehand. Cumulus Networks has developed "ONIE" (Open Network Install Environment) for this purpose. In a similar manner to the PXE environment known from servers, this open system allows booting and subsequent installation of the operating system via the network. Thanks to "ZTP" (Zero-Touch Provisioning), any desired settings can be defined in advance, so that the provisioning of the newly installed system can then be finalized by Ansible without the need for a manual intermediate step.

The following excerpt from our ZTP configuration automates the steps that are typically needed after reinstallation of a switch.

Excerpt from our ztp.sh:

[...]

# In order to start switchd, you need to install a valid license
echo 'user@example.com|3DSpMBACDihILepwdy4/5Ecd34jlAg4h+FiE/9zZawtujnk3Fw' > /home/cumulus/license.txt
/usr/cumulus/bin/cl-license -i /home/cumulus/license.txt
systemctl restart switchd.service

# Move the eth0 (management) interface to a separate management VRF
/usr/bin/net add vrf mgmt && /usr/bin/net commit

# Drop SSH keys in order to log in without using a password
{% for key in ssh_keys %}
echo "{{ key }}" >> /home/cumulus/.ssh/authorized_keys
{% endfor %}

# The following line is required somewhere in the script file for execution to occur
# CUMULUS-AUTOPROVISIONING

[...]
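
The loop over ssh_keys in the excerpt implies that the script itself is rendered from a Jinja2 template before it is served to the switches. One way to do this with Ansible is sketched below; the host group, the file paths and the key value are placeholders chosen for this example.

```yaml
# Minimal sketch: render the ZTP script on the server that delivers it.
# Group name, paths and the key are hypothetical placeholders.
- hosts: provisioning_server
  become: true
  vars:
    ssh_keys:
      - "ssh-ed25519 AAAA...example alice@example.com"   # placeholder, not a real key
  tasks:
    - name: Render the ZTP script from its template
      template:
        src: ztp.sh.j2
        dest: /var/www/html/ztp.sh
        mode: "0644"
```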

For us, reinstalling a switch with Cumulus Linux takes less than 10 minutes. This allows us to run through configuration changes, new versions, or upgrade paths as often as needed. Once we are ready to upgrade the production devices, we simply apply the same process that has been tested many times, virtually eliminating the risk of typos and inconsistencies. Incidentally, ONIE needs to be supported by the hardware in question, which is not produced by Cumulus Networks itself. The fact that virtually all major network manufacturers have integrated ONIE in a very short time demonstrates just how much this elegant solution was missing before.


Having its roots in Debian and open source, Cumulus Linux fits in well with our philosophy. At the same time, the "open source DNA" also applies in the opposite direction. Cumulus Networks has, for example, contributed the implementation of VRF (Virtual Routing and Forwarding) to the Linux kernel. This has moved the "server" and "networking" fields even closer together.

Open and efficient,
Your cloudscale.ch team
