Ansible Network Automation, In Depth: Cisco IOS/NX-OS, Juniper Junos & Arista EOS with ansible.netcommon

A switch is not a server. It typically does not run Python, it does not let you write to /etc/, and many devices will hang up on you if you try to scp a file. And yet network teams have the same demand server teams do: declarative, idempotent, auditable change. Ansible network automation answers that demand because Ansible’s agentless push model already worked over SSH, and because the heavy lifting can move off the device: the connection plugin runs the CLI/API session from the control node, and structured modules render the right config and push only what is needed. The result is that a Cisco Catalyst, a Nexus 9000, an MX204, a Juniper SRX, and an Arista 7050 all become first-class targets for the same playbook → role → collection machinery you already know.

This lesson is the exhaustive tour. We start with why network is different: no Python on the box, no facts gathering by setup, become that means enable mode instead of sudo, and connection plugins that proxy a CLI/NETCONF/REST session instead of executing modules on the target. We then walk the four mainstream vendor collections (cisco.ios, cisco.nxos, junipernetworks.junos, arista.eos) and their resource module pattern, which is the modern, declarative way to model interfaces, VLANs, L3, OSPF, BGP and ACLs. We cover the seven canonical states (merged, replaced, overridden, deleted, parsed, gathered, rendered) so you understand exactly what a single state: keyword does to the running config. We cover the difference between network_cli (screen-scraping CLI), netconf (XML/YANG over SSH) and httpapi (REST/eAPI); ansible.netcommon.cli_command and ansible.netcommon.cli_config for the dirty cases that don’t have a resource module yet; intent-based config drift detection with --diff and gathered+rendered; and configuration backup with the network resource modules plus ansible.netcommon.cli_backup. Finally we cover the production patterns that make multi-vendor automation tractable: one inventory group per vendor, role-per-platform, group-vars-per-vendor for ansible_network_os, and Execution Environments built with the right collections so AAP can run network jobs at scale. Everything targets current Ansible (ansible-core 2.17+ with the ansible.netcommon 6+, cisco.ios 6+, cisco.nxos 7+, junipernetworks.junos 8+ and arista.eos 7+ collections, as of 2026), uses FQCN throughout, and ends with a free hands-on lab that uses the Containerlab project to stand up an Arista cEOS topology you can drive without owning hardware.

Learning objectives

After this lesson you can:

Explain why network targets are different (no Python on box, no setup, become means enable rather than sudo) and what that means for connection plugins and module dispatch.
Pick the right connection plugin — ansible.netcommon.network_cli, ansible.netcommon.netconf, ansible.netcommon.httpapi — for a given platform.
Set the per-host ansible_network_os (cisco.ios.ios, cisco.nxos.nxos, junipernetworks.junos.junos, arista.eos.eos, cisco.iosxr.iosxr, vyos.vyos.vyos) and the matching ansible_connection so the right platform plugins are loaded.
Use vendor resource modules (cisco.ios.ios_interfaces, cisco.ios.ios_l3_interfaces, cisco.ios.ios_vlans, cisco.nxos.nxos_bgp_global, arista.eos.eos_interfaces, junipernetworks.junos.junos_interfaces) and the seven canonical states.
Drive non-resource-modelled features with ansible.netcommon.cli_command and ansible.netcommon.cli_config safely.
Implement config backup, drift detection, and rollback patterns end to end.
Structure a multi-vendor inventory with group_vars/<vendor>/all.yml and run a single playbook against a fleet of mixed hardware.

Prerequisites & where this fits

You should already be comfortable with playbooks, plays and tasks, variables and the precedence rules, Jinja templating, and roles and collections. The companion advanced lessons that compound here are delegation, run_once and serial — because most network change is “drain leaf-1, change it, validate, move on” — performance tuning (network connections are slow per device, so forks and gather_facts: false matter more), and Execution Environments (the supported way to ship a network EE with all four vendor collections plus the right Python deps). In the Ansible Zero-to-Hero programme this is the Network expert lesson and a textbook EX374 / RHCEoSA-grade topic. Real teams running this in production almost always do so from AAP Controller plus a network EE, with mesh execution nodes pinned per network segment.

Core concepts

Five mental models carry the whole lesson.

1. Modules don’t run on the device; they run for the device. A normal Ansible module is shipped to the target, executed under Python, and returns JSON to the control node. A network module runs on the control node; the connection plugin proxies a CLI/NETCONF/HTTP session to the box. That is why network plays don’t need Python on the device, use become only to enter enable mode (never sudo), and gather facts with a vendor *_facts module instead of setup. Internalise this and the rest of the rules stop feeling arbitrary.

2. Three connection plugins, three universes. network_cli is screen-scraping over SSH: it enters enable mode, sends commands, and parses the output. It works everywhere because every box has a CLI, but it is the slowest and most brittle. netconf speaks XML/YANG over SSH; it is Juniper’s native protocol, and the way to get structured data without parsing CLI text. httpapi speaks JSON over HTTPS: Arista eAPI, Cisco NX-API, Cisco IOS-XE RESTCONF, F5 iControl. Pick the one the box does best.

3. Resource modules are declarative; raw cli_config is imperative. A resource module (ios_interfaces, eos_l3_interfaces, junos_vlans, nxos_bgp_global) takes a desired state in YAML and produces exactly the diff needed. A raw CLI module (cli_command, cli_config) lets you push lines directly. Resource modules win every time you can use them — they are idempotent, support state: replaced/overridden/deleted, and produce structured gathered data. Use raw CLI only when the resource module doesn’t cover the feature yet.

4. The seven states are the API. Every modern resource module accepts the same seven state: values: merged (add/update — default), replaced (replace just this resource), overridden (replace all resources of this type — destructive!), deleted (remove just this resource), parsed (read a CLI text file and return structured data — offline, no device), gathered (query the device and return structured data — read-only), rendered (turn structured data into the CLI lines it would push, without touching the device). Once you know these seven you know every resource module.

5. Multi-vendor is a grouping problem, not a coding problem. Network fleets are mixed by definition. The shape that scales is: one inventory group per vendor (cisco_ios, cisco_nxos, juniper_junos, arista_eos), group_vars/<group>/all.yml setting ansible_network_os and ansible_connection, and a single playbook that includes a vendor-specific role per group via import_playbook or include_role. Trying to write one role that detects vendor at runtime is a trap — the resource modules are vendor-namespaced by design.

Keep these terms straight: connection plugin (the SSH/NETCONF/HTTP transport), ansible_network_os (the platform identifier the connection plugin uses to load the matching platform-specific terminal, cliconf, netconf or httpapi plugin), resource module (declarative, structured, per-feature), cli_command / cli_config (raw CLI escape hatch), state machine (the seven canonical state: values), fact gathering (vendor *_facts modules, NOT setup), become semantics on network (means enable mode, not sudo).

Why network is different

Internalise this table; everything else falls out of it.

Server target	Network target
Has Python; modules ship and execute on the box.	Almost never has Python; modules execute on the control node.
`become: true` runs `sudo`.	`become: true` with `become_method: enable` enters `enable` mode.
Facts are gathered by `ansible.builtin.setup`.	Facts are gathered by `cisco.ios.ios_facts`, `arista.eos.eos_facts`, etc.
`gather_facts: true` is fine.	Set `gather_facts: false` at play level and call the vendor `*_facts` module explicitly; `setup` cannot run on a switch.
`ansible_connection: ssh` (default).	`ansible_connection: ansible.netcommon.network_cli` / `netconf` / `httpapi`.
One module = one `apt`/`yum`/`copy`.	One module = one feature (interfaces, VLANs, BGP), with seven states.
Idempotency from module logic on the box.	Idempotency from `gathered → diff → render` on the control node.
Rolling updates use `serial` over hosts.	Rolling updates use `serial` over devices, often `run_once` for spine-leaf coordination.

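The two facts to never forget: gather_facts: false at the play level (setup needs Python on the target, which a switch cannot provide; recent Ansible releases redirect implicit fact gathering to the platform facts module when ansible_network_os is set, but that collects everything on every run, so calling the *_facts module explicitly for the subsets you need is the predictable option), and ansible_become_method: enable when you set become: true (because privilege escalation on a switch is enable, not sudo).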
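
As a minimal sketch of the two rules above, the play below disables implicit fact gathering and then asks the IOS facts module for only what it needs. The group name cisco_ios matches the inventory used later in this lesson; the debug task and the choice of subsets are illustrative assumptions, not a required pattern.

```yaml
# Sketch only: subsets and debug output are illustrative.
- name: Explicit fact gathering on IOS
  hosts: cisco_ios
  gather_facts: false              # skip implicit fact gathering
  tasks:
    - name: Gather a minimal fact subset plus the interfaces resource
      cisco.ios.ios_facts:
        gather_subset:
          - min
        gather_network_resources:
          - interfaces

    - name: Show what came back
      ansible.builtin.debug:
        msg:
          - "version: {{ ansible_net_version }}"
          - "interfaces: {{ ansible_network_resources.interfaces | length }}"
```

The become settings are expected to come from group_vars, as shown in the connection plugin section, so the play itself stays free of credentials.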

Connection plugins, in full

ansible.netcommon ships the three connection plugins network plays depend on. Each takes a slightly different set of vars.

`ansible.netcommon.network_cli`

Screen-scraping CLI over SSH. The lowest common denominator. Every IOS, NX-OS, Junos, EOS, IOS-XR and VyOS box supports it.

# group_vars/cisco_ios/all.yml
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: cisco.ios.ios
ansible_user: netadmin
ansible_password: "{{ vault_cisco_password }}"
ansible_become: true
ansible_become_method: enable
ansible_become_password: "{{ vault_cisco_enable }}"
ansible_command_timeout: 60    # sets persistent_command_timeout: max seconds for one command to return
ansible_connect_timeout: 30    # sets persistent_connect_timeout for the persistent connection

The persistent timeouts matter: network_cli keeps the SSH session alive across tasks (that is the whole point; re-handshaking SSH per task against a switch’s modest control-plane CPU is brutal). Tune ansible_connect_timeout up if your devices are slow to negotiate; ansible_command_timeout covers the time a single CLI command may take to return.
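
When one command is known to be slow, the command timeout can be raised for that task alone instead of for the whole group. The sketch below assumes an IOS-style device; the command and the 300-second value are illustrative.

```yaml
- name: Collect a large show command with a longer per-command timeout
  ansible.netcommon.cli_command:
    command: show tech-support
  register: tech
  vars:
    ansible_command_timeout: 300   # overrides the group_vars value for this task only
```

Keeping the group-wide value modest and raising it per task means a hung device still fails quickly on every other command.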

`ansible.netcommon.netconf`

XML/YANG over SSH on port 830. The right answer for Juniper (NETCONF is native) and increasingly for IOS-XR. Returns structured data and supports candidate config + commit.

# group_vars/juniper_junos/all.yml
ansible_connection: ansible.netcommon.netconf
ansible_network_os: junipernetworks.junos.junos
ansible_user: netadmin
ansible_private_key_file: ~/.ssh/junos
ansible_port: 830

NETCONF gives you commit/rollback semantics for free: on Junos, changes are staged in the candidate configuration and only take effect on commit.
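
A hedged sketch of those commit semantics with junipernetworks.junos.junos_config follows. The interface name, description and the rollback_requested variable are hypothetical; the pattern is a commit that rolls itself back unless it is confirmed within the stated number of minutes.

```yaml
- name: Stage a change and commit it with automatic rollback after 5 minutes
  junipernetworks.junos.junos_config:
    lines:
      - set interfaces ge-0/0/1 description "uplink-to-spine-1"
    comment: "ansible: set uplink description"
    confirm: 5                      # minutes before Junos reverts an unconfirmed commit

- name: Confirm the commit once validation has passed
  junipernetworks.junos.junos_config:
    confirm_commit: true

- name: Roll back to the previous committed configuration (only on request)
  junipernetworks.junos.junos_config:
    rollback: 1
  when: rollback_requested | default(false)   # hypothetical extra-var
```

Put your validation tasks between the first two tasks; if they fail and the play stops, the device reverts on its own when the timer expires.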

`ansible.netcommon.httpapi`

HTTPS + JSON. The right answer for Arista eAPI, Cisco IOS-XE RESTCONF, Cisco NX-API, F5 BIG-IP iControl, and a long tail of platforms that expose an HTTP API on the device itself.

# group_vars/arista_eos/all.yml
ansible_connection: ansible.netcommon.httpapi
ansible_network_os: arista.eos.eos
ansible_httpapi_use_ssl: true
ansible_httpapi_validate_certs: false
ansible_httpapi_port: 443
ansible_user: admin
ansible_password: "{{ vault_eos_password }}"
ansible_become: true
ansible_become_method: enable

httpapi is by far the fastest of the three because every “task” is a single REST roundtrip — no PTY allocation, no enable prompt, no banner scraping.

Plugin	Transport	Best for	Speed	Structured data
`network_cli`	SSH + PTY	Universal fallback	Slowest	No (CLI text → parsers)
`netconf`	SSH + NETCONF (830)	Junos, IOS-XR	Medium	Yes (XML/YANG)
`httpapi`	HTTPS + JSON	Arista, IOS-XE REST, NX-API	Fastest	Yes (JSON)

The vendor collections

Four collections cover the large majority of enterprise network fleets:

Collection	Platform	Lead module families
`cisco.ios`	Catalyst / IOS, IOS-XE	`ios_interfaces`, `ios_l2_interfaces`, `ios_l3_interfaces`, `ios_vlans`, `ios_ospfv2`, `ios_bgp_global`, `ios_acls`, `ios_facts`, `ios_command`, `ios_config`
`cisco.nxos`	Nexus / NX-OS	`nxos_interfaces`, `nxos_vlans`, `nxos_bgp_global`, `nxos_bgp_neighbor_address_family`, `nxos_ospfv2`, `nxos_facts`, `nxos_command`
`junipernetworks.junos`	Junos (MX, SRX, EX, QFX)	`junos_interfaces`, `junos_l2_interfaces`, `junos_l3_interfaces`, `junos_vlans`, `junos_ospfv2`, `junos_bgp_global`, `junos_facts`, `junos_command`, `junos_config`
`arista.eos`	EOS (7000-series)	`eos_interfaces`, `eos_l2_interfaces`, `eos_l3_interfaces`, `eos_vlans`, `eos_bgp_global`, `eos_ospfv2`, `eos_facts`, `eos_command`, `eos_config`

Plus the cross-vendor toolkit:

Module	Purpose
`ansible.netcommon.cli_command`	Run a raw CLI command on any platform that has `network_cli`.
`ansible.netcommon.cli_config`	Push raw config lines with diff controls (`diff_match`, `diff_replace`, `diff_ignore_lines`), plus `backup`, `commit` and `rollback` where the platform supports them.
`ansible.netcommon.cli_backup`	Save the running config to a local file.
`ansible.utils.cli_parse`	Parse CLI text into structured data via TextFSM/NTC-Templates/PyATS/native parsers (the module lives in `ansible.utils`; the parser plugins ship in `ansible.netcommon`).
`ansible.netcommon.netconf_get` / `netconf_config`	Direct NETCONF read/write for non-Junos NETCONF boxes.

Install everything you need at once:

ansible-galaxy collection install -r requirements.yml

# requirements.yml
collections:
  - name: ansible.netcommon
  - name: cisco.ios
  - name: cisco.nxos
  - name: junipernetworks.junos
  - name: arista.eos
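
For anything beyond a lab, a pinned variant of the same file is usually safer. The sketch below uses version ranges whose lower bounds are simply the major versions this lesson targets; the upper bounds are assumptions, so adjust both to the releases you have tested.

```yaml
# requirements.yml with version ranges (bounds are illustrative)
collections:
  - name: ansible.netcommon
    version: ">=6.0.0,<7.0.0"
  - name: cisco.ios
    version: ">=6.0.0,<7.0.0"
  - name: cisco.nxos
    version: ">=7.0.0,<8.0.0"
  - name: junipernetworks.junos
    version: ">=8.0.0,<9.0.0"
  - name: arista.eos
    version: ">=7.0.0,<8.0.0"
```

A range that stays inside one major version picks up bug fixes while keeping breaking schema changes out until you choose to take them.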

The resource module pattern

Resource modules are the headline feature. They work the same way across all four vendors. Here is the full mental model on cisco.ios.ios_interfaces:

- name: Declare interface intent on Cat9300
  cisco.ios.ios_interfaces:
    config:
      - name: GigabitEthernet1/0/1
        description: "uplink-to-spine-1"
        enabled: true
        mtu: 9216
      - name: GigabitEthernet1/0/2
        description: "server rack A"
        enabled: true
        mtu: 1500
    state: merged

What happens depends entirely on state:.

The seven states, in detail

merged (default): combine the desired config with whatever is already there. Listed interfaces are created or updated to include what you wrote; attributes you did not mention and interfaces you did not list are left alone. The least destructive state. Use for adding configuration.

replaced: for each listed interface, replace its config with what you wrote. Interfaces not listed are untouched. Use for “I want THIS interface to look exactly like this, and I don’t care what was there.”

overridden: replace all interfaces of this type with the listed set. Anything not listed is reset to defaults or removed. Destructive, so use it with eyes open. Always preview with --check --diff first.

deleted: remove the configuration for the listed interfaces. Use for cleanup.

parsed: you pass CLI text in the running_config: parameter and the module returns the structured equivalent. Runs offline. Use for migrating from a Tcl/Expect script: parse the old box’s CLI dump and feed it forward.

gathered: query the device and return structured data. Read-only. Use for backups, drift, and reports.

rendered: you pass config: and the module returns the CLI lines it would push. Does not touch the device. Use to preview, to generate config for golden images, or to feed a build-then-push pipeline.

State	Touches device?	Touches config?	Destructive?	Typical use
`merged`	yes	yes	no	day-2 add
`replaced`	yes	yes	per-resource	day-2 update
`overridden`	yes	yes	category-wide	rebuild from intent
`deleted`	yes	yes	yes	decommission
`parsed`	no	no	no	migrate from text
`gathered`	yes (read)	no	no	drift / backup
`rendered`	no	no	no	preview / golden

The seven states are the same across cisco.ios.*, cisco.nxos.*, junipernetworks.junos.*, and arista.eos.*. Once you have learned them on ios_interfaces, you have learned them on eos_bgp_global and junos_l3_interfaces.
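
The two offline states chain together naturally. The sketch below parses a saved IOS configuration into structured data and then renders that data back into CLI lines, without changing any device. The file path dumps/old-rtr-1.cfg is a hypothetical location for a config dump you already have.

```yaml
- name: Parse a saved IOS config dump into structured data (offline)
  cisco.ios.ios_interfaces:
    running_config: "{{ lookup('ansible.builtin.file', 'dumps/old-rtr-1.cfg') }}"
    state: parsed
  register: parsed_ifs

- name: Render the CLI lines that the parsed data would generate
  cisco.ios.ios_interfaces:
    config: "{{ parsed_ifs.parsed }}"
    state: rendered
  register: preview

- name: Print the rendered commands
  ansible.builtin.debug:
    var: preview.rendered
```

Saving parsed_ifs.parsed as YAML gives you a first draft of per-host intent that you can review and then apply with state: merged or state: replaced.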

Drift detection in practice

The standard drift loop is gathered → diff → alert:

- name: Capture current interface state
  cisco.ios.ios_interfaces:
    state: gathered
  register: live

- name: Load intended state from inventory
  ansible.builtin.set_fact:
    intent: "{{ interfaces_intent }}"

- name: Render what we WOULD push
  cisco.ios.ios_interfaces:
    config: "{{ intent }}"
    state: rendered
  register: preview

- name: Show live state, intent, and rendered commands
  ansible.builtin.debug:
    msg: |
      LIVE  : {{ live.gathered }}
      INTENT: {{ intent }}
      WOULD PUSH: {{ preview.rendered }}

Or run the same module with state: replaced under --check --diff to see what would change without modifying the device (the module still connects to read the current config).
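
To turn the comparison into an actual diff and an alert, one option is ansible.utils.fact_diff, sketched below. It reuses the live and intent variables from the tasks above. Treat the result as a starting point: gathered returns every interface and may include attributes your intent does not mention, so some filtering is likely needed before the diff is quiet.

```yaml
- name: Diff live state against intent
  ansible.utils.fact_diff:
    before: "{{ live.gathered }}"
    after: "{{ intent }}"
  register: drift

- name: Fail the job when drift is found
  ansible.builtin.fail:
    msg: "{{ drift.diff_lines }}"
  when: drift.diff_lines | length > 0
```

Failing the job makes drift visible in whatever already watches your job results, without a separate reporting pipeline.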

Multi-vendor inventory shape

The shape that scales is one group per vendor with vendor-pinned vars:

# inventory/hosts.ini
[cisco_ios]
edge-rtr-1.lab
edge-rtr-2.lab

[cisco_nxos]
spine-1.lab
spine-2.lab
leaf-1.lab
leaf-2.lab

[arista_eos]
border-leaf-1.lab
border-leaf-2.lab

[juniper_junos]
mx-edge-1.lab
mx-edge-2.lab

[network:children]
cisco_ios
cisco_nxos
arista_eos
juniper_junos

# group_vars/cisco_ios/all.yml
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: cisco.ios.ios
ansible_become: true
ansible_become_method: enable

# group_vars/cisco_nxos/all.yml
ansible_connection: ansible.netcommon.httpapi
ansible_network_os: cisco.nxos.nxos
ansible_httpapi_use_ssl: true
ansible_httpapi_validate_certs: false

# group_vars/arista_eos/all.yml
ansible_connection: ansible.netcommon.httpapi
ansible_network_os: arista.eos.eos
ansible_httpapi_use_ssl: true

# group_vars/juniper_junos/all.yml
ansible_connection: ansible.netcommon.netconf
ansible_network_os: junipernetworks.junos.junos
ansible_port: 830

# group_vars/network/all.yml — applies to ALL vendors
ansible_user: netadmin
ansible_password: "{{ vault_net_password }}"
# gather_facts is a play keyword, not a variable: set `gather_facts: false` in each play, not here

Then a top-level playbook composes per-vendor plays:

# site.yml
- import_playbook: plays/cisco_ios.yml
- import_playbook: plays/cisco_nxos.yml
- import_playbook: plays/arista_eos.yml
- import_playbook: plays/juniper_junos.yml
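
Each imported file is then a small play scoped to one vendor group. A minimal sketch of one of them is shown below; the role name ios_baseline is hypothetical and stands for whatever vendor-specific role you maintain.

```yaml
# plays/cisco_ios.yml (sketch; the role name is hypothetical)
- name: Apply intent to Cisco IOS devices
  hosts: cisco_ios
  gather_facts: false
  roles:
    - role: ios_baseline
```

Connection settings stay in group_vars/cisco_ios/all.yml, so the play contains nothing vendor-specific except the group and the role.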

Configuration backup

Two patterns. The simple one uses cli_backup:

- name: Back up running config to backups/
  ansible.netcommon.cli_backup:
    # now() avoids ansible_date_time, which needs setup facts that network plays do not gather
    filename: "{{ inventory_hostname }}-{{ now(utc=true, fmt='%Y%m%dT%H%M%S') }}.cfg"
    dir_path: "./backups/"

The structured one uses state: gathered per resource and dumps YAML — useful for git-tracked declarative backups:

- name: Capture interfaces intent
  cisco.ios.ios_interfaces:
    state: gathered
  register: ifs

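- name: Ensure the per-host intent directory exists
  ansible.builtin.file:
    path: "./intent/{{ inventory_hostname }}"
    state: directory
    mode: "0755"
  delegate_to: localhost

- name: Save as YAML backup
  ansible.builtin.copy:
    content: "{{ ifs.gathered | to_nice_yaml }}"
    dest: "./intent/{{ inventory_hostname }}/interfaces.yml"
  delegate_to: localhost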

Commit intent/ to git after each nightly run, and you have a per-feature, structured, diff-friendly backup of every device.

The escape hatch: `cli_command` and `cli_config`

When the resource module doesn’t cover the feature (yet), drop to raw CLI. Use cli_command for show and exec commands:

- name: Show CDP neighbors and capture
  ansible.netcommon.cli_command:
    command: "show cdp neighbors detail"
  register: cdp

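Use cli_config for configuration changes. Its safety levers are diff_match, diff_replace and diff_ignore_lines (where the platform’s cliconf plugin supports them), plus backup and rollback. The before, after and parents options belong to the platform *_config modules such as cisco.ios.ios_config, not to cli_config:

- name: Push raw config lines (diffed against the running config via `diff_match`)
  ansible.netcommon.cli_config:
    config: |
      ip access-list standard MGMT-IN
       permit 10.0.0.0 0.0.0.255
       deny any log
    diff_match: line
    diff_replace: block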
Avoid cli_config when a resource module exists — you lose the seven-state semantics, the --diff is blind to it (it shows the lines, not the intent), and idempotence is your problem.
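
Raw command output becomes useful once you assert on it. The sketch below repeats the CDP capture and fails when an expected neighbor is missing. The neighbor name spine-1 is an assumption borrowed from the sample inventory, and the substring match is deliberately naive.

```yaml
- name: Show CDP neighbors and capture
  ansible.netcommon.cli_command:
    command: show cdp neighbors detail
  register: cdp

- name: Fail if the expected spine is not a CDP neighbor
  ansible.builtin.assert:
    that:
      - "'spine-1' in cdp.stdout"
    fail_msg: "spine-1 not seen via CDP on {{ inventory_hostname }}"
```

For anything more structured than a substring check, feed the output through a parser instead of growing the string matching.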

Hands-on free lab — Containerlab + cEOS

You will not buy a switch. The lab uses Containerlab, an open-source CLI that runs vendor virtual images (cEOS for Arista, vMX for Juniper, etc.) in Docker. Arista cEOS is free for personal use after registration. The whole exercise is done from your laptop.

# 1. install containerlab (needs a Linux host with Docker; on macOS run it inside a Linux VM)
bash -c "$(curl -sL https://get.containerlab.dev)"

# 2. import the Arista cEOS image (download the tarball from arista.com after free registration)
docker import cEOS-lab.tar.xz ceos:4.32.0F

# 3. topology
mkdir -p netlab && cd netlab
cat > topo.clab.yml <<'EOF'
name: ansible-network
topology:
  nodes:
    leaf-1:
      kind: ceos
      image: ceos:4.32.0F
      mgmt-ipv4: 172.20.20.11
    leaf-2:
      kind: ceos
      image: ceos:4.32.0F
      mgmt-ipv4: 172.20.20.12
  links:
    - endpoints: ["leaf-1:eth1", "leaf-2:eth1"]
EOF

sudo containerlab deploy -t topo.clab.yml

You now have two cEOS containers running Arista EOS, reachable on 172.20.20.11 and 172.20.20.12.

Set up an Ansible workspace:

mkdir -p ansible-net && cd ansible-net
ansible-galaxy collection install ansible.netcommon arista.eos

cat > inventory.ini <<'EOF'
[arista_eos]
leaf-1 ansible_host=172.20.20.11
leaf-2 ansible_host=172.20.20.12
EOF

mkdir -p group_vars/arista_eos
cat > group_vars/arista_eos/all.yml <<'EOF'
ansible_connection: ansible.netcommon.httpapi
ansible_network_os: arista.eos.eos
ansible_user: admin
ansible_password: admin
ansible_httpapi_use_ssl: true
ansible_httpapi_validate_certs: false
ansible_become: true
ansible_become_method: enable
EOF

Now the playbook — declare interface intent and assert it:

# site.yml
- name: Configure leaf interfaces
  hosts: arista_eos
  gather_facts: false
  tasks:
    - name: Interfaces intent
      arista.eos.eos_interfaces:
        config:
          - name: Ethernet1
            description: "to-leaf-peer"
            enabled: true
            mtu: 9214
        state: replaced
      register: change

    - name: Verify with gathered
      arista.eos.eos_interfaces:
        state: gathered
      register: live

    - name: Assert MTU landed
      ansible.builtin.assert:
        that:
          - "live.gathered | selectattr('name','equalto','Ethernet1') | map(attribute='mtu') | first | int == 9214"

Run it:

ansible-playbook -i inventory.ini site.yml --diff

You’ll see --diff show the CLI lines pushed (description to-leaf-peer and mtu 9214), and the play recap reports changed=1 for each leaf. Run it again and the recap shows changed=0. That is idempotent network change.

Tear down:

sudo containerlab destroy -t topo.clab.yml

Common mistakes & troubleshooting

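gather_facts left on for a network play. setup cannot run on a switch, so implicit fact gathering either fails or, on recent releases with ansible_network_os set, runs the full platform facts module on every play and slows everything down. Set gather_facts: false at the play level, then call *_facts modules explicitly.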

Forgot become_method: enable. Tasks fail with “permission denied” or commands silently run from user mode. Set ansible_become_method: enable in group_vars/<vendor>/all.yml.

Wrong ansible_network_os FQCN. Old playbooks use ios / nxos; modern plays should use cisco.ios.ios / cisco.nxos.nxos. The short form may still resolve on some installations, but do not rely on it.

Connection plugin mismatch. Setting ansible_connection: ssh on a network host gets you setup-style failures. The connection plugin must be ansible.netcommon.network_cli / netconf / httpapi.

Slow per-task latency. Check ansible_connect_timeout and ansible_command_timeout: if the persistent connection is being torn down and rebuilt between tasks, every task pays for the SSH handshake again. With network_cli, the persistent socket is what makes a 50-task play finish in seconds, not minutes.

overridden accidentally wipes interfaces you didn’t list. Read the state table again. Use replaced for per-resource updates; overridden is only for “rebuild the entire feature from intent.”

No --check --diff before destructive change. Always preview. The --diff output of a network resource module shows the exact CLI lines that would be pushed.

Trying to delegate_to: localhost for everything. Some cloud-managed APIs (Meraki, Cisco DNA Center, F5) really do want delegate_to: localhost — but most network plays should not delegate; the connection plugin already runs on the control node.

Mixing cli_config and resource modules for the same feature. You’ll fight idempotence forever. Pick one. Resource modules win.

Best practices

One inventory group per vendor, with group_vars/<vendor>/all.yml setting ansible_network_os and ansible_connection. Never mix vendors in the same group.
Always gather_facts: false at play level on network plays.
Prefer httpapi > netconf > network_cli when the platform supports the higher option. eAPI is dramatically faster than CLI scraping.
Use resource modules; reach for cli_config only when the resource module doesn’t cover the feature.
Drive change with state: replaced per resource, not overridden. Reserve overridden for “rebuild from intent” days.
Run --check --diff first on any production change. Network blast radius is enormous.
Backup before change — a cli_backup task at the top of every change role is non-negotiable.
Pin collection versions in requirements.yml. Network collection schemas evolve; you do not want a Galaxy update changing your interface model overnight.
Build a network EE with all four collections + their Python deps (paramiko, ncclient, requests) baked in, push it to Private Automation Hub, and pin AAP job templates to it.
Run network jobs from a mesh execution node inside the management network; network_cli over a 200ms WAN link is painfully slow and prone to timeouts.
Use serial: 1 and a delegated drain step for fabric upgrades: change a leaf only after you’ve drained traffic away from it.
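
The rolling-change practice at the end of the list above can be sketched as a single play. The three role names (fabric_drain, leaf_change, fabric_undrain) are hypothetical placeholders for your own drain, change and restore logic, and the host pattern assumes the leaf naming used in the sample inventory.

```yaml
- name: Rolling change across leaves, one device at a time
  hosts: "leaf-*.lab"
  gather_facts: false
  serial: 1                 # finish one leaf before starting the next
  any_errors_fatal: true    # a failed leaf stops the whole rollout
  tasks:
    - name: Drain traffic away from this leaf (hypothetical role)
      ansible.builtin.include_role:
        name: fabric_drain

    - name: Apply the change to this leaf (hypothetical role)
      ansible.builtin.include_role:
        name: leaf_change

    - name: Restore traffic to this leaf (hypothetical role)
      ansible.builtin.include_role:
        name: fabric_undrain
```

If the drain has to run on an upstream spine, the tasks inside fabric_drain would use delegate_to for that spine; the play structure stays the same.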

Security notes

Network credentials are crown-jewel secrets. Use Ansible Vault, or better, AAP credential plugins that pull from CyberArk/HashiCorp Vault at job time.
Set ansible_httpapi_validate_certs: true once your fabric has a trusted internal CA. The “false” you see in lab snippets is only for lab.
Every network change should produce a backup artefact. Persist backups in a separate, append-only object store (S3 with object lock) — not in the same git repo your playbooks live in.
Use TACACS+/RADIUS-issued, per-engineer logins on the device, then have AAP use a service account with the same TACACS path. The audit trail then lives in TACACS, not in the playbook log.
Treat cli_config with the same care as shell — it bypasses the resource-module idempotence guarantees. Code-review every PR that introduces cli_config.
Never check in running-config dumps that contain SNMP communities, RADIUS keys, or BGP MD5 passwords. Sanitise on backup.
For air-gapped sites, build the network EE locally and pull Galaxy collections from your Private Automation Hub mirror — not from galaxy.ansible.com.

Interview & exam Q&A

Q1. Why do we set gather_facts: false on network plays? Because implicit fact gathering is built around ansible.builtin.setup, which runs Python on the target. Switches don’t offer that, so setup cannot run there. Use cisco.ios.ios_facts, arista.eos.eos_facts, etc., explicitly when you need facts, and request only the subsets you need.

Q2. Difference between network_cli, netconf and httpapi? network_cli screen-scrapes CLI over SSH (universal but slowest). netconf is XML/YANG over SSH on port 830 (Juniper-native, structured, has commit/rollback). httpapi is HTTPS+JSON (fastest; Arista eAPI, IOS-XE REST, NX-API, F5 iControl). Pick the highest one the platform supports.

Q3. What does become: true mean on a switch? It enters enable mode. You must also set ansible_become_method: enable (not sudo), and ansible_become_password if your devices use an enable secret.

Q4. Walk me through the seven canonical resource-module states. merged adds/updates only what you list (default). replaced rewrites each listed resource. overridden rewrites every resource of that type (anything not listed is wiped). deleted removes listed resources. parsed reads CLI text offline and returns structured data. gathered queries the device and returns structured data (read-only). rendered turns intent into the CLI lines it would push, without touching the device.

Q5. When would you use cli_config instead of a resource module? Only when the feature you need isn’t yet covered by a resource module — for example a vendor-proprietary feature in a brand-new release. You give up idempotence guarantees and structured diffs in exchange for raw access. Code-review every such usage.

Q6. How do you detect drift on a fleet of 200 Cisco IOS routers? Run ios_interfaces, ios_l3_interfaces, ios_ospfv2, etc. with state: gathered against the fleet, compare against a per-host intent stored in YAML (group/host vars), and report diffs. Or run the same modules with state: replaced and --check --diff to see exactly which lines would change.

Q7. Why is httpapi so much faster than network_cli? network_cli allocates a PTY, deals with banners and enable prompts, and parses CLI text per command. httpapi is a stateless HTTPS request that returns JSON. No PTY, no parser, no terminal handshake.

Q8. What is the right way to store device credentials for a 500-device fleet? Don’t put them in group_vars plaintext. Either: (a) Vault-encrypt group_vars/<vendor>/vault.yml and decrypt with --vault-id; or (b) use AAP credential plugins that pull from CyberArk/Conjur/HashiCorp Vault at job runtime — credentials never sit on disk.

Q9. How does parsed state differ from gathered? parsed is offline — you give the module a CLI text dump (running_config: parameter) and it returns the structured equivalent. The device is not touched. gathered queries the live device and returns the same structured shape. Use parsed for migrations from a text dump; use gathered for live drift.

Q10. How do you do a rolling upgrade across a leaf-spine fabric? serial: 1 on the leaves, with a pre_tasks delegate that drains traffic (config nve withdraw, bgp graceful-shutdown, etc.) on the upstream spine, then the change runs on the leaf, then post_tasks re-advertises. Combine with any_errors_fatal: true so a failed leaf stops the rollout.

Q11. How do replaced and overridden differ in practice? On ios_interfaces with three interfaces in config: replaced rewrites those three and leaves all other interfaces alone. overridden rewrites those three and removes the configuration of every other interface on the box. One is per-resource; the other is feature-wide.

Q12. What’s the production EE story for network automation? Build an Execution Environment containing ansible.netcommon, all the vendor collections you use, plus Python deps (paramiko, ncclient, requests, pyats if you parse CLI). Push to Private Automation Hub, sign it, and pin AAP job templates by digest. Run from a mesh execution node inside the management network so latency is sane.

Q13. Why must mesh execution nodes be inside the management network? A network_cli task is many round-trips per command. Across a 100-200ms WAN link, a 30-task play takes minutes per device. Inside the mgmt VLAN, the same play finishes in seconds.

Q14. Where do delegate_to: localhost patterns belong in network automation? For cloud-managed APIs that don’t accept incoming SSH/HTTPS — Meraki Dashboard, DNA Center, Cisco vManage, F5 BIG-IQ, Cloudflare. The “device” is a tenant on a cloud API, so the play runs locally and authenticates outbound.
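
The Execution Environment described in Q12 could be defined roughly as follows for ansible-builder, using its version 3 schema. The base image, the ansible-core constraint and the exact Python package list are assumptions for illustration; a production EE would normally start from your organisation's supported base image and pin versions.

```yaml
# execution-environment.yml (ansible-builder schema version 3; values are illustrative)
version: 3
images:
  base_image:
    name: quay.io/fedora/fedora:latest
dependencies:
  ansible_core:
    package_pip: ansible-core>=2.17
  ansible_runner:
    package_pip: ansible-runner
  system:
    - openssh-clients
  galaxy:
    collections:
      - name: ansible.netcommon
      - name: cisco.ios
      - name: cisco.nxos
      - name: junipernetworks.junos
      - name: arista.eos
  python:
    - paramiko
    - ncclient
    - requests
```

Building it is typically a single command such as ansible-builder build -t network-ee:latest, where the tag is a placeholder; the resulting image is what you push to Private Automation Hub.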

Quick check

What FQCN do you set as ansible_network_os for an Arista EOS box driven by eAPI?
Which state: would you use to read interface config off a device into structured YAML, without changing anything?
Why does become_method: enable matter on a Cisco IOS play?
What is the difference between replaced and overridden on cisco.ios.ios_vlans?
Which connection plugin gives you commit/rollback semantics natively?

(Answers: arista.eos.eos; gathered; because enable is the privilege-escalation mode on IOS, sudo doesn’t exist; replaced rewrites only the listed VLANs, overridden rewrites all VLANs on the device; netconf, where Junos stages changes in the candidate config and applies them on commit.)

Exercise

Stand up the Containerlab cEOS topology from the lab. Then:

Write a site.yml that declares interface intent (Ethernet1 mtu=9214, Ethernet2 mtu=1500) on both leaves using arista.eos.eos_interfaces with state: replaced.
Add a state: gathered task that registers the live state and a debug that prints the diff.
Add a state: rendered task that prints the CLI lines that would be pushed if you applied the intent today (so you can paste them into a change ticket).
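Add a backup step at the top using ansible.netcommon.cli_backup. The cli_* modules need a network_cli connection and the lab inventory uses httpapi, so set ansible_connection: ansible.netcommon.network_cli under vars: on that task (the same applies to any cli_config step).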
Wrap the change steps in a block with rescue that runs cli_config to roll back to the backup file if anything fails.
Run with --check --diff. Then run for real. Then run again — assert changed=0.
(Stretch) Add serial: 1 to the play so the leaves change one at a time.

Certification mapping

Cert	Coverage
EX374 (Red Hat Certified Specialist in Developing Automation with Ansible Automation Platform)	Direct: network automation, resource modules, `state: replaced/gathered`, multi-vendor inventory, EE for network.
RHCE EX294	Indirect: connection plugins, FQCN, `become_method`, role layout.
Cisco DevNet Professional / Specialist (Network Automation)	Direct: `cisco.ios`/`cisco.nxos` collections, NX-API and IOS-XE REST.
JNCIA-DevOps	Direct: NETCONF, `junipernetworks.junos.*`, commit/rollback semantics.
Arista ACE-A / ACE-AS	Direct: eAPI, `arista.eos.*` resource modules.

Glossary

ansible_network_os: the platform identifier the connection plugin uses to load the matching platform-specific plugins (cisco.ios.ios, arista.eos.eos, …).
Connection plugin: network_cli / netconf / httpapi from ansible.netcommon.
Resource module: a declarative, structured, per-feature module (e.g. ios_interfaces) supporting the seven state: values.
network_cli: screen-scraping CLI over SSH (slowest, universal).
netconf: XML/YANG over SSH on port 830 (Junos-native).
httpapi: JSON over HTTPS (Arista eAPI, IOS-XE RESTCONF, NX-API).
Seven states: merged, replaced, overridden, deleted, parsed, gathered, rendered.
cli_config: raw CLI escape hatch with diff_match/diff_replace/diff_ignore_lines, backup and rollback.
Network EE: an Execution Environment containing the vendor collections and their Python deps, pinned by AAP job templates.

Next steps

Now that you can drive switches and routers from a playbook, the cloud-target lessons follow. Move on to Ansible for AWS, Ansible for Azure, and Ansible for GCP for the cloud control-plane equivalents — same agentless model, very different module shapes. After that, Ansible for Windows covers the third major OS family, and Ansible for Kubernetes rounds out the platform tour. The Specialist-tier lessons (CI/CD, compliance, scale) all assume you can drive any of these targets.

Written by Vinod
