Ansible Lesson 1 of 42

Ansible Fundamentals: Architecture, the Agentless Push Model & Idempotency

If you have ever configured a fleet of servers by SSH-ing into each one and running the same commands by hand, you already know the three problems Ansible exists to solve. The first is drift: server number seven got a slightly different package version because you were interrupted, and now “all the web servers are identical” is a hopeful fiction. The second is scale: what works for three machines is unbearable for three hundred, and impossible for three thousand. The third is memory: six months later nobody can say why a setting is the way it is, because the configuration lives only in the muscle memory of whoever last touched the box. Ansible answers all three by letting you describe the desired state of your machines in plain-text YAML files, keep those files in Git like any other code, and push that state out to as many machines as you like, repeatably and safely. That is configuration management, and Ansible is the tool that made it approachable for everyone — not just specialists.

This lesson is the on-ramp for the whole Ansible track. By the end you will understand what Ansible is and why it won the configuration-management category. You will know its architecture cold: the difference between the control node and the managed nodes, what “agentless” really means, and how the connection works over SSH for Linux and WinRM for Windows. You will understand the push model and how it differs from the pull model used by Puppet and Chef. You will internalise idempotency, the single most important property in the tool, and the changed versus ok result model that flows from it. You will meet the six building blocks (inventory, modules, plugins, playbooks, roles, collections) and the FQCN naming scheme that ties them together. Finally, you will be able to place Ansible precisely against Terraform, Puppet, Chef and Salt so you know when to reach for which. We are working with current Ansible throughout, meaning ansible-core 2.17+ (the 2026 line) and the broader Ansible 10+ package, using the real ansible and ansible-playbook commands and real YAML.

Learning objectives

After working through this lesson you will be able to:

Prerequisites

You need almost nothing to start. A working knowledge of the Linux command line, comfort with SSH (you should know what an SSH key pair is), and a text editor are enough. A basic familiarity with YAML helps but is not assumed — we keep the YAML in this lesson minimal and explain it as it appears. No prior Ansible, no programming background, and no configuration-management experience are required; every term is defined as it shows up. This is the first stop in the Ansible Zero-to-Hero ladder. If you have read the course’s Infrastructure as Code: Core Concepts lesson you will recognise idempotency and drift at a higher level; here we ground them in actual Ansible behaviour. Everything that follows in the track — installation and ansible.cfg, inventory, ad-hoc commands, playbooks, roles and Vault — builds directly on the mental models below. This lesson, and the whole track, maps to the Red Hat Certified Engineer (RHCE) EX294 exam.

What Ansible is, and why it won

Ansible is an open-source automation engine. Its most common job is configuration management — bringing servers to a known, declared state (packages installed, services running, files in place, users created) and keeping them there — but the same engine also does application deployment, orchestration (coordinating multi-tier rollouts and rolling restarts), provisioning (creating cloud resources via cloud modules), and ad-hoc administration (one-off commands across many hosts). You describe what you want in YAML, and Ansible makes the machines match.

Configuration management as a discipline is about eliminating snowflake servers — machines that have drifted into being subtly, undocumentedly unique. The cure is to treat the definition of a server the way you treat application source code: written in files, version-controlled, reviewed in pull requests, and applied by a tool rather than by hand. The payoff is reproducibility (stamp out identical machines on demand), auditability (every change is a reviewable diff with an author and timestamp), and recoverability (the files are the rebuild plan).

Ansible was created by Michael DeHaan in 2012, acquired by Red Hat in 2015, and has been the most widely adopted tool in its category for years. The reasons compounded:

| Reason | What it means in practice |
|---|---|
| Agentless | Nothing to install, run, or patch on the machines you manage. If a host has SSH and Python, Ansible can manage it today: no daemon, no certificate dance, no bootstrap problem. |
| Low barrier to entry | Playbooks are YAML, not a bespoke programming language. A new engineer can read a playbook on day one and roughly understand it; Puppet’s and Chef’s DSLs have a steeper climb. |
| Push model | You run Ansible from one place and it reaches out to the targets. There is no central server the nodes must check in with, and no “is the agent healthy?” failure mode. |
| Huge module & collection ecosystem | Thousands of modules for Linux, Windows, network gear, every major cloud, databases and SaaS, most distributed as versioned collections on Ansible Galaxy and Automation Hub. |
| Idempotent by design | Modules converge to a desired state, so running a playbook repeatedly is safe. This is the core property that makes automation trustworthy. |
| Backed by Red Hat, with an enterprise tier | A clear path from the free CLI to Ansible Automation Platform for teams that need RBAC, a web UI, scheduling and auditing. Strong jobs market and certification (RHCE). |

A note on names and versions you must know in 2026: the engine itself is ansible-core (the binaries and the small set of built-in modules), currently on the 2.17+ line. The thing most people pip install as ansible is a larger “batteries-included” package — ansible-core plus a curated bundle of ~70 popular collections — at version 10+ (the version numbers of the two diverged years ago; Ansible 10 ships ansible-core 2.17). At the top sits Ansible Automation Platform (AAP), Red Hat’s commercial product. We will pin these down precisely later in the lesson.

The architecture: control node and managed nodes

Ansible’s architecture is refreshingly small. There are exactly two kinds of machine, and only one of them has Ansible installed.

The control node is the machine where Ansible itself lives and runs — your laptop, a jump host, a CI runner, or an AAP server. It holds the inventory, the playbooks, the roles and collections, and the ansible/ansible-playbook binaries. This is the only machine that needs Ansible installed.

The managed nodes (also called “hosts” or “targets”) are the machines Ansible configures. They run nothing belonging to Ansible — no agent, no daemon, no persistent process. A managed node needs only two things: a way for the control node to log in (SSH for Linux/Unix, WinRM or SSH for Windows) and, for Linux targets, a Python interpreter so Ansible can execute its modules there. That is the whole footprint.

Here is the sequence of what actually happens when you run a task against a Linux host — internalise this, because almost every behaviour and failure mode in Ansible follows from it:

  1. On the control node, Ansible reads your play, works out the target hosts from the inventory, and gathers the variables that apply.
  2. For each task, Ansible takes the named module (a small program, usually Python) and, with your arguments baked in, transfers it to the managed node over the connection (SSH by default).
  3. The module runs on the managed node using that node’s Python interpreter. It does the work — or, crucially, checks whether the work is already done and does nothing if so — and prints a JSON result to standard output.
  4. Ansible on the control node reads that JSON back over the connection, decides whether the task reported ok, changed, or failed, removes the temporary module file from the target, and moves to the next task.

Two consequences are worth stating now. First, the real work happens on the target, not on the control node — the control node is an orchestrator that ships code and reads results. Second, because each module is copied, run, and cleaned up per task, there is no long-lived state on the target between runs; the desired state lives entirely in your files on the control node.
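To make that round trip concrete, the sketch below registers one task's result and prints it, so you can see the JSON a module hands back. It assumes an inventory group called web that is reachable over SSH; the group name and the variable name stat_result are illustrative choices, not anything Ansible requires.

```yaml
---
# Minimal sketch: look at the JSON a module returns to the control node.
# "web" and "stat_result" are illustrative names.
- name: Inspect a module result
  hosts: web
  gather_facts: false
  tasks:
    - name: Ask the target about /etc/hosts (the module runs on the managed node)
      ansible.builtin.stat:
        path: /etc/hosts
      register: stat_result   # parsed JSON result, held on the control node

    - name: Show what came back over the connection
      ansible.builtin.debug:
        var: stat_result
```

The printed structure includes a changed key alongside the module-specific data (here, a stat dictionary). Adding -vvv to the ansible-playbook command prints connection-level detail if you want to watch the transfer itself.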

| Component | Where it runs | What it is / does | Must Ansible be installed there? |
|---|---|---|---|
| Control node | Your laptop / jump host / CI / AAP | Runs ansible/ansible-playbook; holds inventory, playbooks, roles, collections; orchestrates everything. | Yes |
| Managed node | Each target server / device | Gets modules pushed to it, executes them, returns JSON. Runs no Ansible agent. | No |
| Inventory | Read on the control node | The list of managed nodes and their groups/variables. | n/a |
| Connection plugin | Control node initiates | The transport: ssh (Linux default), winrm/psrp (Windows), local, docker, etc. | n/a |
| Module | Copied to and executed on the managed node | The unit of work (install a package, copy a file). Returns JSON. | n/a |
| Python on target | Managed node (Linux/Unix) | Interpreter that runs the pushed modules. (Windows uses PowerShell modules instead.) | Python, yes; Ansible, no |

Agentless: what it means and why it matters

“Agentless” is the headline word in every Ansible introduction, and it means precisely this: you do not install or run any Ansible software on the machines you manage. Compare that with the agent-based model of classic Puppet and Chef, where every managed machine runs a daemon (puppet-agent, chef-client) that must be installed, configured with certificates, kept running, upgraded, and monitored for health.

Being agentless buys you several things:

What “agentless” does not mean: it does not mean “no dependencies on the target.” Linux targets still need a Python interpreter (Ansible discovers it automatically — /usr/bin/python3 on modern distributions). Windows targets need PowerShell and a configured WinRM (or SSH) listener. And the control node, of course, does need Ansible installed. The distinction is that these are either already present (Python on Linux) or are standard OS features (PowerShell/WinRM on Windows) — not a piece of Ansible-specific software you must deploy and maintain.

The transports: SSH and WinRM

Ansible reaches managed nodes through connection plugins. The two you must know:

| Target | Default connection | Transport detail | Auth options | What runs there |
|---|---|---|---|---|
| Linux / Unix | ssh | OpenSSH (the same ssh you use by hand), default port 22 | SSH keys (recommended), password, agent forwarding, certificates | Python modules |
| Windows | winrm (or psrp, or ssh) | WinRM (Windows Remote Management) over HTTP 5985 / HTTPS 5986; psrp is a newer PowerShell Remoting transport | NTLM, Kerberos, CredSSP, basic, certificate | PowerShell modules |
| Local (the control node itself) | local | No network; runs directly on the control node | n/a | Whatever the module needs |
| Containers / k8s / network gear | docker, kubectl, network_cli, httpapi, … | Specialised plugins per platform | per plugin | per platform |

For Linux you should always prefer SSH key-based authentication over passwords, and you will typically pair it with privilege escalation (become, which uses sudo by default) so you can connect as an unprivileged user and elevate only where needed. For Windows, WinRM over HTTPS with Kerberos is the production-grade choice. The later installation lesson sets these up step by step.
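As a rough illustration of those recommendations, here is a YAML inventory sketch with one Linux group using an SSH key and one Windows group using WinRM over HTTPS with Kerberos. Every host name, user name and the key path is a placeholder, and the WinRM settings assume the control node already has the Python WinRM and Kerberos libraries installed.

```yaml
# Hypothetical inventory sketch; hosts, users and the key path are placeholders.
all:
  children:
    linux:
      hosts:
        web01.example.com:
      vars:
        ansible_connection: ssh
        ansible_user: deploy                            # unprivileged login user
        ansible_ssh_private_key_file: ~/.ssh/id_ed25519
    windows:
      hosts:
        win01.example.com:
      vars:
        ansible_connection: winrm
        ansible_port: 5986                              # WinRM over HTTPS
        ansible_winrm_scheme: https
        ansible_winrm_transport: kerberos
        ansible_user: svc_ansible@EXAMPLE.COM
```

Privilege escalation is deliberately left out of the inventory here: setting become: true on the individual plays or tasks that need root keeps elevation limited to where it is required.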

The push model versus pull

Ansible uses a push model: an operator (or a pipeline) runs Ansible on the control node, and Ansible pushes the configuration out to the targets right now. You decide when changes happen; nothing runs on a schedule unless you arrange it.

Classic Puppet and Chef default to a pull model: each managed node runs an agent that, on a timer (typically every 30 minutes), pulls its catalogue from a central server (a Puppet master / Chef server) and applies it locally. The node is responsible for keeping itself in line.

Neither is universally “better” — they trade off differently:

| Dimension | Push (Ansible) | Pull (Puppet/Chef default) |
|---|---|---|
| Who initiates | The operator/pipeline, from the control node | The node’s agent, on a schedule |
| When changes apply | Exactly when you run it: immediate, intentional | Within the next check-in interval (eventual) |
| Central server | None required | A master/server the nodes depend on |
| Continuous drift correction | Only when you run (you can schedule it, e.g. via cron/AAP) | Automatic and continuous by design |
| Bootstrap | Trivial: SSH + Python is enough | Must get the agent onto the node first |
| Scale ceiling | Bounded by forks and control-node resources; very high with AAP/pull-mode | Scales by adding server capacity; nodes self-serve |
| Failure visibility | Immediate and in your face | Logged centrally; a silent agent can hide |

Two important nuances. First, Ansible can do pull-style operation with ansible-pull, where each node clones a Git repo and runs a playbook against itself on a schedule — useful for large, ephemeral or immutable fleets. Second, Ansible Automation Platform adds scheduling, so even in the standard push model you can have changes applied on a cadence with central logging — giving you the continuous-drift-correction benefit without a per-node agent.
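One way to arrange the pull-style setup described above is a cron entry on each node that calls ansible-pull. The play below is only a sketch: the repository URL is a placeholder, the 30-minute interval is an arbitrary choice, and it assumes Ansible is already installed on the nodes themselves (which pull mode requires, unlike the normal push model).

```yaml
---
# Sketch: schedule ansible-pull on each node with cron.
# The Git URL is a placeholder; the nodes need Ansible and Git installed.
- name: Enrol nodes in pull-style runs
  hosts: all
  become: true
  tasks:
    - name: Run ansible-pull every 30 minutes
      ansible.builtin.cron:
        name: ansible-pull
        minute: "*/30"
        job: "ansible-pull -U https://git.example.com/infra/config.git local.yml"
```

Each node then clones the repository and applies local.yml to itself on that cadence, so the control node is only needed for the initial enrolment.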

Idempotency: the property that makes it safe

Idempotency is the single most important idea in Ansible. An operation is idempotent if applying it once or applying it many times produces the same result — the second and subsequent runs make no further changes because the desired state is already in place.

This flows directly from Ansible being declarative-ish: you describe the desired state (“the package nginx should be present”, “the service nginx should be started and enabled”, “this config file should have these contents”), and each module checks the current state first and only acts if reality differs from the declaration. Run the playbook against a fresh server and it installs, copies, and starts things — lots of changed. Run the very same playbook again a minute later and, if nothing has drifted, it makes no changes at all — everything reports ok. That property is what lets you run a playbook in production with confidence: it is not a script that blindly re-does work and risks breaking things; it is a reconciliation to a goal.

Contrast two ways of opening a port in a config file. The non-idempotent way is ansible.builtin.shell: echo "Listen 8080" >> /etc/httpd/conf/httpd.conf — run it three times and the line appears three times. The idempotent way is ansible.builtin.lineinfile, which ensures the line is present exactly once: run it any number of times and the file ends up identical. Most Ansible modules are written to be idempotent like this; the handful that cannot be (command, shell, raw) are exactly the ones you must use with care, because Ansible cannot know whether their effect is already in place.
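The two approaches look like this as tasks. The play is a sketch for comparison only, assuming a group named web and an existing /etc/httpd/conf/httpd.conf on the targets; in a real playbook you would keep the second task and drop the first.

```yaml
---
- name: Compare a non-idempotent and an idempotent task
  hosts: web
  become: true
  tasks:
    # Non-idempotent: appends another copy of the line on every run.
    - name: Append Listen directive with shell (avoid)
      ansible.builtin.shell: echo "Listen 8080" >> /etc/httpd/conf/httpd.conf

    # Idempotent: adds the line only if it is not already in the file.
    - name: Ensure Listen directive is present
      ansible.builtin.lineinfile:
        path: /etc/httpd/conf/httpd.conf
        line: "Listen 8080"
```

The shell task reports changed on every run, while the lineinfile task reports changed only when it actually had to add the line.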

This is what the changed-versus-ok result model communicates on every run:

| Result | Symbol/colour | Meaning |
|---|---|---|
| ok | green | The module ran and found reality already matched the desired state; no change made. |
| changed | yellow | The module made a change to bring reality into line with the desired state. |
| failed | red | The task errored (e.g. package not found, permission denied). By default, Ansible runs no further tasks on that host. |
| skipped | cyan | The task was not run because its when condition was false. |
| unreachable | red | Ansible could not connect to the host at all (SSH refused, host down); distinct from a task failure. |

Every playbook ends with a play recap tallying these per host, e.g. host1 : ok=7 changed=2 unreachable=0 failed=0 skipped=1. The mark of a healthy, converged system is a run where everything is ok and changed=0 — proof that the machine already matches your declared state. Watching changed drop to zero on the second run is how you know a playbook is idempotent. (A small caveat: a few read-only commands report changed even though they alter nothing, because Ansible can’t tell; you fix that cosmetically with changed_when: false, covered in the error-handling lesson.)
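For the read-only caveat, a minimal sketch looks like the following. The group name web and the variable name uptime_result are illustrative.

```yaml
---
- name: Read-only command that should not report changed
  hosts: web
  gather_facts: false
  tasks:
    - name: Check uptime
      ansible.builtin.command: uptime
      register: uptime_result
      changed_when: false   # the command only reads state, so report ok

    - name: Show the output
      ansible.builtin.debug:
        var: uptime_result.stdout
```

Without changed_when: false the first task would show changed on every run and the recap would never settle at changed=0.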

The building blocks: inventory, modules, plugins, playbooks, roles, collections

Six concepts make up everything you will write and use in Ansible. Here they are top to bottom; each gets its own deep-dive lesson later, but you need the map now.

| Building block | What it is | Example | Deep-dive lesson |
|---|---|---|---|
| Inventory | The list of managed nodes, organised into groups, with variables attached to hosts and groups. Static (INI/YAML files) or dynamic (a plugin that queries the cloud). | web01, web02 in a [web] group | Ansible Inventory, In Depth |
| Module | A small, usually idempotent program that does one unit of work on a target and returns JSON. The verbs of Ansible. | ansible.builtin.copy, ansible.builtin.service | Ad-Hoc Commands & Modules |
| Plugin | Code that extends the Ansible engine itself (not the work done on targets): connection, lookup, filter, callback, inventory, become, cache plugins, and more. | ssh connection plugin, to_json filter | woven throughout |
| Playbook | A YAML file of one or more plays; each play maps a group of hosts to an ordered list of tasks (each task calls a module). The script of desired state. | site.yml | Playbooks, In Depth |
| Role | A standardised, reusable directory bundling tasks, handlers, templates, files, variables and defaults so a unit of configuration (e.g. “nginx”) can be shared and parameterised. | roles/nginx/ | Roles & Collections |
| Collection | The modern distribution format: a versioned package bundling modules, plugins, roles and playbooks under a namespace.name, installed from Ansible Galaxy or Automation Hub. | community.general, amazon.aws | Roles & Collections |

A quick word on plugins versus modules, because newcomers blur them: a module runs on the managed node to change the world there and returns JSON (it is the work). A plugin runs on the control node and extends Ansible’s own behaviour — how it connects (connection plugins), how it transforms data in templates (filter plugins like default, to_yaml), how it fetches values (lookup plugins like file, env), how it formats output (callback plugins), how it discovers hosts (inventory plugins), and how it escalates privilege (become plugins). Modules are what Ansible does to your servers; plugins are how Ansible itself works.
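A small play can show both sides of that split in one place. This is a sketch under a few assumptions: a group named web exists, and the variable names custom_greeting, greeting and remote_name are invented for the example.

```yaml
---
- name: Plugins run on the control node, modules on the target
  hosts: web
  gather_facts: false
  vars:
    greeting: "{{ custom_greeting | default('hello') }}"   # default is a filter plugin
  tasks:
    - name: Lookup plugin reads the control node's environment
      ansible.builtin.debug:
        msg: "Control-node HOME is {{ lookup('ansible.builtin.env', 'HOME') }}"

    - name: Module runs on the managed node
      ansible.builtin.command: hostname
      register: remote_name
      changed_when: false

    - name: Filter plugin formats data on the control node
      ansible.builtin.debug:
        msg: "{{ {'greeting': greeting, 'host': remote_name.stdout} | to_nice_yaml }}"
```

The HOME value printed is the control node's, not the target's, because lookups are evaluated where Ansible itself runs; only the hostname command executes remotely.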

FQCN: namespace.collection.module

Since collections became the standard distribution unit, every module, plugin and role has a Fully Qualified Collection Name (FQCN) of the form namespace.collection.module. The built-in modules that ship inside ansible-core live in the ansible.builtin collection, so the ping module’s FQCN is ansible.builtin.ping, the copy module is ansible.builtin.copy, and so on. Modules from other collections follow the same pattern: community.general.timezone, amazon.aws.ec2_instance, ansible.posix.firewalld, community.docker.docker_container.

| FQCN | namespace | collection | module | Ships in |
|---|---|---|---|---|
| ansible.builtin.ping | ansible | builtin | ping | ansible-core |
| ansible.builtin.copy | ansible | builtin | copy | ansible-core |
| community.general.timezone | community | general | timezone | community.general collection |
| amazon.aws.ec2_instance | amazon | aws | ec2_instance | amazon.aws collection |
| ansible.posix.firewalld | ansible | posix | firewalld | ansible.posix collection |

You will see older playbooks and tutorials use the short name (copy, service, ping) without the namespace. That still resolves for built-ins, but using the full FQCN everywhere is the current best practice and is expected in the RHCE exam and in this course. It removes ambiguity (two collections could define a user module), it documents exactly which collection a task depends on, and it future-proofs your playbooks against name collisions. We use FQCN throughout this entire track.
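Side by side, the two styles look like this. The sketch assumes a group named web; the directory /tmp/fqcn-demo is an arbitrary path chosen for the example.

```yaml
---
- name: Short names versus FQCNs
  hosts: web
  tasks:
    # Legacy style: resolves for built-ins, but the source collection is implicit.
    - name: Ensure demo directory exists (short name)
      file:
        path: /tmp/fqcn-demo
        state: directory
        mode: "0755"

    # Preferred: the FQCN states exactly which collection provides the module.
    - name: Ensure demo directory exists (FQCN)
      ansible.builtin.file:
        path: /tmp/fqcn-demo
        state: directory
        mode: "0755"
```

Both tasks call the same module, so the second reports ok once the first has created the directory; the difference is purely in how unambiguous the task is to read and resolve.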

ansible vs ansible-core vs AAP

These three names trip up nearly everyone, so let us be exact.

| Name | What it is | Install with | Contains | Who it’s for |
|---|---|---|---|---|
| ansible-core | The engine: the ansible, ansible-playbook, ansible-galaxy, ansible-doc, ansible-config, ansible-vault and related binaries, plus the ansible.builtin collection only. Currently 2.17+ (2026). | pip install ansible-core | The CLI tools + built-in modules/plugins | Minimalists; CI images; when you manage collections yourself |
| ansible (the community package) | A batteries-included bundle: ansible-core plus a curated set of ~70 widely used collections (community.general, ansible.posix, amazon.aws, and more), versioned 10+ (Ansible 10 = ansible-core 2.17). | pip install ansible | Engine + many collections | Most people getting started; workstations |
| Ansible Automation Platform (AAP) | Red Hat’s commercial product. Adds a web UI and API (the old “Tower”/AWX lineage as automation controller), RBAC, scheduling, credential management, job logging/auditing, execution environments (containerised runtimes), private Automation Hub, and Event-Driven Ansible. | A subscription/product, not pip | The whole engine plus enterprise control plane | Teams/enterprises needing governance, self-service and scale |

The mental model: ansible-core is the engine, ansible is the engine plus a sensible toolbox of collections, and AAP is the engine wrapped in an enterprise control plane. For learning and for this course, installing the ansible package (or ansible-core and adding collections as you need them) is exactly right. AWX is the free, upstream, community version of the automation controller if you want to explore the platform layer without a subscription.
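If you take the ansible-core route and add collections yourself, a requirements file is a common way to record which ones a project needs. The file below is a sketch; the two collections and the version constraint are illustrative picks, not a recommendation.

```yaml
---
# requirements.yml: collections to add on top of ansible-core.
# The version constraint is illustrative only.
collections:
  - name: community.general
  - name: ansible.posix
    version: ">=1.5.0"
```

Running ansible-galaxy collection install -r requirements.yml on the control node installs everything listed, and keeping the file in Git documents the project's collection dependencies alongside its playbooks.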

Declarative-ish: where Ansible sits between declarative and imperative

You will hear Ansible called “declarative”, and you will also hear pedants object that it is “really procedural”. Both have a point, which is why “declarative-ish” is the honest label.

So Ansible blends both: declarative goals inside an imperatively ordered list of steps. This is a genuine strength for configuration management and orchestration, where order frequently matters (install the package before templating its config before starting the service, and restart the service only after the config changed). The trade-off is that the burden of getting the order right — and of choosing idempotent modules over raw shell — sits with you, the author.
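The ordering example in that paragraph can be sketched as a play with a handler. It assumes a group named web and a template file nginx.conf.j2 sitting next to the playbook; both are hypothetical.

```yaml
---
- name: Ordered steps built from declarative tasks
  hosts: web
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Template the nginx config
      ansible.builtin.template:
        src: nginx.conf.j2            # hypothetical template file
        dest: /etc/nginx/nginx.conf
        mode: "0644"
      notify: Restart nginx           # fires only if this task reports changed

    - name: Ensure nginx is started and enabled
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```

Each task declares a state, but the list runs top to bottom, and the handler runs at the end of the play only when the template task actually changed the file.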

Ansible vs Terraform vs Puppet/Chef/Salt

Knowing where Ansible fits among its neighbours is a classic interview topic and a real architectural decision. The headline distinction is configuration management (what’s inside a server) versus provisioning/Infrastructure as Code (the servers and cloud resources themselves). Ansible can do both, but it is strongest at the former; Terraform is purpose-built for the latter.

| Tool | Primary job | Model | Agent? | Language | State file? | Sweet spot |
|---|---|---|---|---|---|---|
| Ansible | Configuration management, app deployment, orchestration, ad-hoc ops (also provisions) | Push, declarative-ish (procedural order, idempotent tasks) | No (agentless) | YAML | No (queries live state each run) | Configuring & orchestrating existing servers; multi-step rollouts; “do this across N hosts now” |
| Terraform | Provisioning / IaC: create & manage cloud/infra resources | Operator-run plan/apply, declarative (graph-based, computes order) | No | HCL | Yes (the source of truth for mappings) | Standing up VMs, networks, databases, DNS across clouds |
| Puppet | Configuration management (continuous enforcement) | Pull, declarative (DSL describes desired state) | Yes (puppet-agent) | Puppet DSL (Ruby-based) | Server-side catalogue | Large, long-lived fleets needing constant drift correction |
| Chef | Configuration management (continuous enforcement) | Pull, imperative-ish (Ruby “recipes”) | Yes (chef-client) | Ruby DSL | Server-side | Teams comfortable in Ruby wanting programmatic config |
| Salt | Configuration management + remote execution at scale | Both push & pull (fast ZeroMQ message bus; agent salt-minion or agentless salt-ssh) | Optional (usually yes) | YAML + Jinja | Minion-side | Very large fleets needing fast, event-driven remote execution |

The practical guidance experienced teams follow:

For the full decision framework on picking between these for provisioning and configuration across many environments, see Terraform vs Terragrunt vs Ansible vs Pulumi: Which IaC Tool, When?.

Control-node requirements

Because everything runs from the control node, it is worth knowing exactly what it needs — and its one notable limitation.

| Requirement | Detail |
|---|---|
| Operating system | A POSIX system: Linux (any major distro), macOS, or WSL on Windows. Ansible’s control node is not supported natively on Windows; use WSL or a Linux VM. (Windows is fully supported as a managed node.) |
| Python | The control node needs Python 3 (modern ansible-core requires a reasonably recent Python 3.x; check the version matrix for your ansible-core release). pip/pipx/a virtual environment is the usual install path. |
| Network access to targets | Outbound SSH (22) to Linux hosts and/or WinRM (5985/5986) to Windows hosts, directly or via a bastion. |
| Credentials | SSH keys (or passwords) for Linux; WinRM credentials for Windows; and a privilege-escalation method (sudo, etc.) where root-level changes are needed. |
| Ansible installed | ansible-core or the ansible package, only here and never on the targets. |

The headline takeaway: the control node is the only thing you install and maintain Ansible on, and it must be POSIX (Linux/macOS/WSL), not native Windows. The next lesson sets one up properly.

Ansible control-node and managed-node architecture: how a task flows over SSH/WinRM, runs as a pushed module, and returns JSON

The diagram shows the control node holding your inventory, playbooks and collections, pushing a module over SSH to each Linux managed node (and over WinRM to a Windows one), the module executing on the target’s own Python/PowerShell, and the JSON result flowing back so Ansible can mark the task ok, changed or failed — the whole agentless push loop on one page.

Hands-on lab

We will do this entirely free using only your own machine plus a couple of throwaway containers as managed nodes — no cloud account, no cost. You need Linux or macOS (or WSL) with Ansible and Docker installed. (Installing Ansible properly is the next lesson; if you do not have it yet, pipx install ansible or pip install ansible will do for this lab.)

The goal is to see the architecture and idempotency with your own eyes: a control node (your machine), managed nodes (containers), the agentless SSH push, and the same playbook reporting changed the first time and ok the second.

1. Confirm the control node. Run ansible --version. Expected: a banner showing ansible [core 2.17.x] (or newer), the config file path, and the Python version. This machine is your control node.

2. Stand up two managed nodes as containers. We use a small image that has SSH and Python. In a terminal:

# Two Ubuntu containers with SSH running, mapped to host ports 2221 and 2222
for n in 1 2; do
  docker run -d --name node$n -p 222$n:22 \
    rastasheep/ubuntu-sshd:18.04
done

These containers each run an SSH server (user root, password root on this public test image) and ship Python — i.e. they are valid managed nodes with no Ansible installed on them.

3. Write a tiny inventory. Create inventory.ini:

[web]
node1 ansible_host=127.0.0.1 ansible_port=2221
node2 ansible_host=127.0.0.1 ansible_port=2222

[web:vars]
ansible_user=root
ansible_password=root
ansible_ssh_common_args=-o StrictHostKeyChecking=no
ansible_python_interpreter=/usr/bin/python3

This declares a group web with two hosts and the connection variables for reaching them. (Passwords inline are only acceptable here because these are disposable local containers; real systems use SSH keys and Vault, covered later. Password authentication over the ssh connection also needs the sshpass program installed on the control node.)
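Inventories can also be written in YAML. As a sketch, the same lab inventory would look like this; the file name inventory.yml is an assumption, and the rest of the lab keeps using inventory.ini.

```yaml
# inventory.yml: YAML form of the lab inventory above
all:
  children:
    web:
      hosts:
        node1:
          ansible_host: 127.0.0.1
          ansible_port: 2221
        node2:
          ansible_host: 127.0.0.1
          ansible_port: 2222
      vars:
        ansible_user: root
        ansible_password: root
        ansible_ssh_common_args: "-o StrictHostKeyChecking=no"
        ansible_python_interpreter: /usr/bin/python3
```

If you save it under that name, passing -i inventory.yml in place of -i inventory.ini should give the same hosts and variables.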

4. Prove the agentless connection. Run an ad-hoc ping (the ansible.builtin.ping module is not an ICMP ping — it confirms Ansible can connect and run Python on the target):

ansible web -i inventory.ini -m ansible.builtin.ping

Expected output, per host:

node1 | SUCCESS => {
    "ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"},
    "changed": false,
    "ping": "pong"
}
node2 | SUCCESS => { ... "ping": "pong" }

"ping": "pong" from both proves the full loop: SSH connection, module pushed and run on the target’s Python, JSON returned. Note "changed": false — ping changes nothing.

5. Write a small idempotent playbook. Create site.yml:

---
- name: Configure web nodes
  hosts: web
  become: true
  tasks:
    - name: Ensure the curl package is present
      ansible.builtin.package:
        name: curl
        state: present

    - name: Drop a managed marker file
      ansible.builtin.copy:
        content: "Managed by Ansible on {{ inventory_hostname }}\n"
        dest: /etc/kloudvin.txt
        mode: "0644"

Every module here is referenced by its FQCN (ansible.builtin.package, ansible.builtin.copy) and declares a desired state (state: present; a file with this exact content) — so it is idempotent.

6. Run it the first time (watch for changed).

ansible-playbook -i inventory.ini site.yml

Expected: both tasks report changed (yellow) on both hosts — curl gets installed, the file gets created — and a play recap like:

PLAY RECAP *********************************************************
node1 : ok=3  changed=2  unreachable=0  failed=0  skipped=0
node2 : ok=3  changed=2  unreachable=0  failed=0  skipped=0

(ok=3 includes the implicit fact-gathering task.)

7. Run the exact same playbook again (watch changed drop to zero).

ansible-playbook -i inventory.ini site.yml

Expected: every task now reports ok (green) and the recap shows changed=0:

node1 : ok=3  changed=0  unreachable=0  failed=0  skipped=0
node2 : ok=3  changed=0  unreachable=0  failed=0  skipped=0

This is idempotency made visible — the desired state already matched reality, so Ansible did nothing. This is the single most important thing to feel in this lesson.

8. (Optional) Preview-only mode. Run ansible-playbook -i inventory.ini site.yml --check --diff. --check is a dry run that predicts changes without making them, and --diff shows the textual difference for file changes — your safety net before touching anything real.

Validation. You have a control node (your machine) managing two agentless nodes (containers) over SSH, a successful ping/pong, and a playbook that proved idempotent by going from changed=2 to changed=0 on a second run — the whole architecture exercised end to end.

Cleanup. Remove the containers and the lab files:

docker rm -f node1 node2
rm -f inventory.ini site.yml

Cost note. This lab runs entirely on your own machine with local containers — it provisions no cloud resources and costs ₹0.

Common mistakes & troubleshooting

Symptom | Cause | Fix
UNREACHABLE! ... Failed to connect to the host via ssh | Wrong host/port/user, SSH not running on the target, or a firewall blocking port 22 | Verify you can ssh to the host by hand first; check ansible_host/ansible_port/ansible_user; confirm the SSH service is up. Unreachable ≠ a task failure.
/usr/bin/python: not found or interpreter warnings | The target’s Python isn’t where Ansible looked | Set ansible_python_interpreter=/usr/bin/python3 (host/group var). Modern ansible-core auto-discovers, but minimal images may need this.
Permission denied when a task needs root | You connected as a non-root user without privilege escalation | Add become: true to the play/task (uses sudo by default); pass --ask-become-pass if sudo needs a password.
Running ansible on Windows fails to install/run | The control node isn’t supported natively on Windows | Use WSL or a Linux VM as the control node; Windows is fine as a managed node via WinRM.
A shell/command task always shows changed | command/shell/raw can’t know if their effect already exists, so they assume a change | Prefer an idempotent module (lineinfile, copy, package); for genuinely read-only commands set changed_when: false.
couldn't resolve module/action 'community.general.x' | The collection providing that FQCN isn’t installed | Install it: ansible-galaxy collection install community.general (and list with ansible-galaxy collection list).
Inventory “host not found” / empty host list | Wrong inventory path, or you didn’t pass -i, or the pattern matched nothing | Pass -i inventory.ini; verify with ansible-inventory -i inventory.ini --list.
Host-key prompt hangs an automated run | First-time SSH host-key verification is interactive | For throwaway labs, set ANSIBLE_HOST_KEY_CHECKING=False or pass -o StrictHostKeyChecking=no through ansible_ssh_common_args; for real hosts, pre-populate known_hosts (never disable verification in production).
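
Three of the fixes in the table (become: true, ansible_python_interpreter and changed_when: false) are easiest to see together in one play. The sketch below is a hedged illustration, not part of the lab: the scratch directory, the file name troubleshoot-demo.yml and the uname -r task are assumptions made for the example, and the web group is assumed to match the lab inventory.

```shell
# Write a small illustrative playbook into a scratch directory (hypothetical file name).
demo_dir=$(mktemp -d)
cat > "$demo_dir/troubleshoot-demo.yml" <<'EOF'
---
- name: Troubleshooting fixes in one play
  hosts: web                                        # group assumed from the lab inventory
  become: true                                      # escalate (sudo by default) for tasks that need root
  vars:
    ansible_python_interpreter: /usr/bin/python3    # pin the interpreter on minimal images
  tasks:
    - name: Read the kernel version (read-only, so never report changed)
      ansible.builtin.command: uname -r
      register: kernel_version
      changed_when: false
EOF
echo "wrote $demo_dir/troubleshoot-demo.yml"
```

With Ansible installed, this could be run as ansible-playbook -i inventory.ini "$demo_dir/troubleshoot-demo.yml", adding --ask-become-pass if sudo on the target needs a password. Setting the interpreter as a play variable is only one option; a host or group variable in the inventory works the same way.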

Best practices

Security notes

Because Ansible is agentless and rides on SSH/WinRM, its security is your remote-access security — which is mostly good news (no extra daemon to harden) but puts the weight on credentials and connection hygiene.

Interview & exam questions

1. What is Ansible, and what problem does it solve? Ansible is an open-source automation engine used mainly for configuration management — bringing servers to a declared state and keeping them there — as well as deployment, orchestration and ad-hoc administration. It solves drift (machines becoming inconsistent), scale (managing many hosts at once), and lack of auditability, by letting you describe desired state in version-controlled YAML and push it to targets repeatably.

2. Explain Ansible’s architecture. There are two roles: the control node, where Ansible is installed and runs (holding inventory, playbooks, roles/collections), and the managed nodes, the targets, which run no Ansible agent. For each task, the control node pushes a module over SSH (Linux) or WinRM (Windows); the module executes on the target’s own Python (or PowerShell), returns JSON, and Ansible reads the result and cleans up. The real work happens on the target; the control node orchestrates.

3. What does “agentless” mean, and why is it an advantage? It means nothing belonging to Ansible runs on the managed nodes — no daemon to install, patch, secure or monitor. Advantages: no bootstrap problem (SSH + Python is enough), smaller attack surface, no “is the agent alive?” failure mode, and reuse of existing SSH access. The trade-off is a dependency on Python (Linux) or PowerShell/WinRM (Windows) being present, but those are standard, not Ansible-specific software.

4. Push vs pull — where does Ansible sit, and what’s the difference? Ansible defaults to push: an operator/pipeline runs it from the control node and changes apply immediately, with no central server the nodes depend on. Puppet/Chef default to pull: each node’s agent fetches and applies its catalogue from a central server on a schedule, giving continuous drift correction but requiring an agent and a server. Ansible can do pull-style work with ansible-pull, and AAP adds scheduling.

5. Define idempotency and explain how Ansible achieves it. Idempotency means running an operation once or many times yields the same result: re-runs make no further changes. Ansible achieves it because most modules are declarative: they check current state and act only if reality differs from the desired state you declared. So a playbook reports many changed tasks on a fresh host and changed=0 on a converged one.
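
The check-then-act behaviour behind idempotency can be sketched in plain shell. This illustrates the idea only, not how Ansible's modules are implemented: ensure_line is a hypothetical helper that mimics what a declarative module such as lineinfile does conceptually, reporting changed when it has to act and ok when the desired state already holds.

```shell
# Hypothetical helper: make sure FILE contains LINE exactly once.
# Prints "changed" if it had to append the line, "ok" if it was already there.
ensure_line() {
  file=$1
  line=$2
  if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
    echo ok                          # desired state already holds, so do nothing
  else
    printf '%s\n' "$line" >> "$file" # reality differs, so act
    echo changed
  fi
}

demo=$(mktemp)                        # scratch file standing in for a config file
first=$(ensure_line "$demo" "test")   # first run has to add the line
second=$(ensure_line "$demo" "test")  # second run finds it and leaves it alone
lines=$(wc -l < "$demo" | tr -d ' ')
echo "first=$first second=$second lines=$lines"
```

A bare echo "test" >> "$demo" run twice would leave two lines and could never report ok, which is the difference between an imperative command and a declared desired state.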

6. What do ok, changed, failed, skipped and unreachable mean in a play recap? ok = ran and reality already matched (no change); changed = a change was made to reach the desired state; failed = the task errored (the host stops by default); skipped = a when condition was false so it didn’t run; unreachable = Ansible couldn’t connect at all (distinct from a task failure). A healthy converged run is all ok with changed=0.
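
The rule that a converged run shows changed=0 can be turned into an automated check. The sketch below is illustrative: recap_is_converged is a hypothetical helper, and the two recap lines are hand-written samples in the shape of a PLAY RECAP line, not output captured from the lab.

```shell
# Hypothetical helper: reads PLAY RECAP lines on stdin and succeeds only when
# no host reports a non-zero changed, failed or unreachable count.
recap_is_converged() {
  ! grep -Eq '(changed|failed|unreachable)=[1-9]'
}

# Hand-written sample lines in the recap format (not captured output).
converged='node1 : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0'
drifted='node2 : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0'

if printf '%s\n' "$converged" | recap_is_converged; then first=converged; else first=not-converged; fi
if printf '%s\n' "$drifted" | recap_is_converged; then second=converged; else second=not-converged; fi
echo "node1: $first, node2: $second"
```

In a pipeline, the same check could be applied to the output of a second ansible-playbook run, so that a playbook which keeps reporting changes fails the build.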

7. What is FQCN and why use it? Fully Qualified Collection Name — namespace.collection.module, e.g. ansible.builtin.copy or community.general.timezone. Built-ins live in ansible.builtin. Using FQCN removes ambiguity between collections that might define the same module name, documents which collection a task depends on, future-proofs against collisions, and is the current best practice and RHCE expectation.
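
To make the namespace.collection.module shape concrete, the sketch below writes a two-task play that uses the two modules named above. The scratch directory, the file name fqcn-demo.yml, the /etc/motd target and the Etc/UTC value are assumptions chosen for illustration, and the web group is assumed to match the lab inventory.

```shell
# Write an illustrative playbook into a scratch directory (hypothetical file name).
demo_dir=$(mktemp -d)
cat > "$demo_dir/fqcn-demo.yml" <<'EOF'
---
- name: FQCN in practice
  hosts: web
  become: true
  tasks:
    - name: Built-in module, shipped with ansible-core
      ansible.builtin.copy:
        content: "managed by Ansible\n"
        dest: /etc/motd
    - name: Community module, needs the community.general collection installed
      community.general.timezone:
        name: Etc/UTC
EOF
# Print every three-part module reference so the FQCN shape is visible.
grep -Eo '(ansible|community)\.[a-z_]+\.[a-z_]+' "$demo_dir/fqcn-demo.yml"
```

Reading the task keys alone tells you which collection each task depends on, which is the documentation benefit FQCN is meant to give.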

8. Distinguish ansible, ansible-core, and AAP. ansible-core is the engine (the CLI binaries plus the ansible.builtin collection), on the 2.17+ line. The ansible package is ansible-core plus a curated bundle of ~70 collections (version 10+). Ansible Automation Platform is Red Hat’s commercial product adding a web UI/API (automation controller), RBAC, scheduling, credential management, execution environments, private Automation Hub and Event-Driven Ansible. AWX is the free upstream of the controller.

9. Is Ansible declarative or imperative? “Declarative-ish.” Individual tasks are declarative (they state a desired end state and the module decides whether to act, giving idempotency), but a playbook is an imperatively ordered, top-to-bottom list of tasks where you control the sequence. So it blends declarative goals inside procedural ordering.

10. When would you choose Ansible over Terraform, and how do they work together? Choose Ansible to configure what’s inside servers and to orchestrate multi-step operations; choose Terraform to provision the infrastructure itself (it has a state file and a dependency graph for resource lifecycle). They compose: Terraform creates the VMs and outputs IPs, Ansible (via dynamic inventory) configures the software on them. Ansible can provision too, but Terraform’s state/plan model is the better fit for serious infrastructure lifecycle.

11. What is a module versus a plugin? A module runs on the managed node to do a unit of work and returns JSON (it’s the work Ansible performs on your servers). A plugin runs on the control node and extends Ansible’s own behaviour — connection, lookup, filter, callback, inventory, become and cache plugins. Modules change your servers; plugins change how Ansible works.
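
A single task can show both sides of this split. In the sketch below, which is an illustration under assumptions (the file name plugin-vs-module.yml, the /tmp/run-by.txt path and the choice of the USER environment variable are all made up for the example), the env lookup plugin is evaluated on the control node while the copy module does its work on the managed node.

```shell
# Write an illustrative playbook into a scratch directory (hypothetical file name).
demo_dir=$(mktemp -d)
cat > "$demo_dir/plugin-vs-module.yml" <<'EOF'
---
- name: Plugin on the control node, module on the managed node
  hosts: web
  tasks:
    - name: Record who ran the play
      ansible.builtin.copy:          # module: executes on the managed node
        # lookup plugin: evaluated on the control node while the task is templated
        content: "run by {{ lookup('ansible.builtin.env', 'USER') }}\n"
        dest: /tmp/run-by.txt
EOF
echo "wrote $demo_dir/plugin-vs-module.yml"
```

The file that lands on the target would carry the control node's user name, not the target's, because the lookup is resolved before the module is sent.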

12. What are the control-node requirements, and can it run on Windows? The control node needs a POSIX OS (Linux, macOS, or WSL), Python 3, network access to the targets (SSH 22 and/or WinRM 5985/5986), credentials, and Ansible installed (only here). It cannot run natively on Windows — use WSL or a Linux VM. Windows is fully supported as a managed node.

Quick check

  1. On which machine(s) is Ansible itself installed — the control node, the managed nodes, or both?
  2. Ansible’s default connection to a Linux host uses which transport, and to a Windows host?
  3. You run the same playbook twice; the second recap shows changed=0. What property does that demonstrate?
  4. What is the FQCN of the built-in module that copies a file, and which collection does it live in?
  5. True or false: ansible-core includes all the community collections you will ever need.

Answers

  1. Only the control node. Managed nodes run no Ansible agent — that’s the agentless model. They need SSH/WinRM access and (for Linux) Python.
  2. SSH (port 22) for Linux; WinRM (5985/5986), or alternatively SSH/psrp, for Windows.
  3. Idempotency — reality already matched the declared desired state, so Ansible made no changes. The second run being all ok with changed=0 is how you prove a playbook is idempotent.
  4. ansible.builtin.copy, which lives in the ansible.builtin collection that ships inside ansible-core.
  5. False. ansible-core ships only ansible.builtin. The ansible package adds a curated bundle, and you install any others with ansible-galaxy collection install.

Exercise

Cement the architecture and idempotency in your own words and hands.

  1. Reproduce the lab, but add a third task to site.yml using ansible.builtin.service to ensure a service (e.g. cron) is state: started and enabled: true. Run the playbook, note the result of the new task, then run it a second time and confirm it flips from changed to ok.
  2. Deliberately break idempotency: add a task using ansible.builtin.shell to append a line to a file (echo "test" >> /tmp/notes.txt). Run the playbook three times and inspect /tmp/notes.txt inside a container (docker exec node1 cat /tmp/notes.txt). How many lines are there, and why? Now rewrite it with ansible.builtin.lineinfile and repeat — what’s different?
  3. Run ansible web -i inventory.ini -m ansible.builtin.setup and skim the output. These are facts — automatically gathered data about each managed node. Find the host’s OS family and default IPv4 address in the JSON.
  4. Write three or four sentences explaining, to a colleague who has only used Bash scripts, why the lineinfile version is safer to run repeatedly than the shell version — using the words desired state, idempotent, and changed.

This shell-versus-module contrast is one of the most common interview discriminators for Ansible, and feeling it first-hand is worth more than reading about it.

Certification mapping

This lesson maps to EX294, the Red Hat Certified Engineer (RHCE) exam for Red Hat Enterprise Linux, which is the target credential for the whole Ansible track. EX294 is a hands-on, practical exam: you automate tasks on live systems with Ansible, with no multiple-choice questions. This lesson grounds the foundational objectives the exam assumes throughout: understanding the core components of Ansible (the control node/managed-node architecture, inventory, modules, plugins, playbooks, roles, collections); understanding the agentless and idempotent execution model (which underpins why the graders re-run your playbooks and expect them not to fail or needlessly change); and using FQCN and the core terminology that the rest of the exam (and this course) takes for granted. The concrete skills the exam tests (writing inventories, running ad-hoc commands and playbooks, using become, variables, conditionals, loops, templates, roles and Vault) are each built in the lessons that follow.

Glossary

Next steps

You now understand what Ansible is and why it won, its agentless control-node/managed-node architecture, the push model versus pull, idempotency and the changed-vs-ok result model, the six building blocks and FQCN, the ansible/ansible-core/AAP distinction, and where Ansible sits among Terraform, Puppet, Chef and Salt. The natural next move is to get a control node working on your own machine and make a real connection. Continue with Installing & Configuring Ansible: the Control Node, ansible.cfg & Your First Connection, which installs Ansible properly (pip vs distro vs ansible vs ansible-core, pipx and virtual environments), sets up SSH keys and the managed-node requirements, walks the ansible.cfg configuration search order and every key setting, and runs your first ansible all -m ansible.builtin.ping against real hosts.
