Ansible Fundamentals: Architecture, the Agentless Push Model & Idempotency

If you have ever configured a fleet of servers by SSH-ing into each one and running the same commands by hand, you already know the three problems Ansible exists to solve. The first is drift: server number seven got a slightly different package version because you were interrupted, and now “all the web servers are identical” is a hopeful fiction. The second is scale: what works for three machines is unbearable for three hundred, and impossible for three thousand. The third is memory: six months later nobody can say why a setting is the way it is, because the configuration lives only in the muscle memory of whoever last touched the box. Ansible answers all three by letting you describe the desired state of your machines in plain-text YAML files, keep those files in Git like any other code, and push that state out to as many machines as you like, repeatably and safely. That is configuration management, and Ansible is the tool that made it approachable for everyone — not just specialists.

This lesson is the on-ramp for the whole Ansible track. By the end you will understand what Ansible is and why it specifically won the configuration-management category; you will know its architecture cold — the difference between the control node and the managed nodes, what “agentless” really means, and how the connection works over SSH for Linux and WinRM for Windows; you will understand the push model and how it differs from the pull model used by Puppet and Chef; you will internalise idempotency — the single most important property in the tool — and the changed versus ok result model that flows from it; you will meet the six building blocks (inventory, modules, plugins, playbooks, roles, collections) and the FQCN naming scheme that ties them together; and you will be able to place Ansible precisely against Terraform, Puppet, Chef and Salt so you know when to reach for which. We are working with current Ansible throughout — ansible-core 2.17+ (the 2026 line) and the broader Ansible 10+ package — using the real ansible and ansible-playbook commands and real YAML.

Learning objectives

After working through this lesson you will be able to:

Explain what Ansible is, what configuration management means, and the specific reasons Ansible became the de-facto standard.
Describe Ansible’s architecture end to end: the control node, the managed nodes, the transport (SSH/WinRM), and the role of Python on the targets.
Explain what agentless means and why it matters, and articulate the push model versus the pull model of Puppet and Chef.
Define idempotency, explain why declaring desired state makes re-runs safe, and read the ok / changed / failed / skipped / unreachable result model.
Identify and describe the six building blocks — inventory, modules, plugins, playbooks, roles, collections — and use FQCN (namespace.collection.module) correctly.
Distinguish ansible (the package) from ansible-core from Ansible Automation Platform (AAP), and explain Ansible’s “declarative-ish” position between declarative and imperative.
Compare Ansible with Terraform, Puppet, Chef and Salt, and state the control-node requirements for running it.

Prerequisites

You need almost nothing to start. A working knowledge of the Linux command line, comfort with SSH (you should know what an SSH key pair is), and a text editor are enough. A basic familiarity with YAML helps but is not assumed — we keep the YAML in this lesson minimal and explain it as it appears. No prior Ansible, no programming background, and no configuration-management experience are required; every term is defined as it shows up. This is the first stop in the Ansible Zero-to-Hero ladder. If you have read the course’s Infrastructure as Code: Core Concepts lesson you will recognise idempotency and drift at a higher level; here we ground them in actual Ansible behaviour. Everything that follows in the track — installation and ansible.cfg, inventory, ad-hoc commands, playbooks, roles and Vault — builds directly on the mental models below. This lesson, and the whole track, maps to the Red Hat Certified Engineer (RHCE) EX294 exam.

What Ansible is, and why it won

Ansible is an open-source automation engine. Its most common job is configuration management — bringing servers to a known, declared state (packages installed, services running, files in place, users created) and keeping them there — but the same engine also does application deployment, orchestration (coordinating multi-tier rollouts and rolling restarts), provisioning (creating cloud resources via cloud modules), and ad-hoc administration (one-off commands across many hosts). You describe what you want in YAML, and Ansible makes the machines match.

Configuration management as a discipline is about eliminating snowflake servers — machines that have drifted into being subtly, undocumentedly unique. The cure is to treat the definition of a server the way you treat application source code: written in files, version-controlled, reviewed in pull requests, and applied by a tool rather than by hand. The payoff is reproducibility (stamp out identical machines on demand), auditability (every change is a reviewable diff with an author and timestamp), and recoverability (the files are the rebuild plan).

Ansible was created by Michael DeHaan in 2012, acquired by Red Hat in 2015, and has been the most widely adopted tool in its category for years. The reasons compounded:

Reason	What it means in practice
Agentless	Nothing to install, run, or patch on the machines you manage. If a host has SSH and Python, Ansible can manage it today — no daemon, no certificate dance, no bootstrap problem.
Low barrier to entry	Playbooks are YAML, not a bespoke programming language. A new engineer can read a playbook on day one and roughly understand it; Puppet’s and Chef’s DSLs have a steeper climb.
Push model	You run Ansible from one place and it reaches out to the targets. There is no central server the nodes must check in with, and no “is the agent healthy?” failure mode.
Huge module & collection ecosystem	Thousands of modules for Linux, Windows, network gear, every major cloud, databases and SaaS — most distributed as versioned collections on Ansible Galaxy and Automation Hub.
Idempotent by design	Modules converge to a desired state, so running a playbook repeatedly is safe — the core property that makes automation trustworthy.
Backed by Red Hat, with an enterprise tier	A clear path from the free CLI to Ansible Automation Platform for teams that need RBAC, a web UI, scheduling and auditing. Strong jobs market and certification (RHCE).

A note on names and versions you must know in 2026: the engine itself is ansible-core (the binaries and the small set of built-in modules), currently on the 2.17+ line. The thing most people pip install as ansible is a larger “batteries-included” package — ansible-core plus a curated bundle of ~70 popular collections — at version 10+ (the version numbers of the two diverged years ago; Ansible 10 ships ansible-core 2.17). At the top sits Ansible Automation Platform (AAP), Red Hat’s commercial product. We will pin these down precisely later in the lesson.

The architecture: control node and managed nodes

Ansible’s architecture is refreshingly small. There are exactly two kinds of machine, and only one of them has Ansible installed.

The control node is the machine where Ansible itself lives and runs — your laptop, a jump host, a CI runner, or an AAP server. It holds the inventory, the playbooks, the roles and collections, and the ansible/ansible-playbook binaries. This is the only machine that needs Ansible installed.

The managed nodes (also called “hosts” or “targets”) are the machines Ansible configures. They run nothing belonging to Ansible — no agent, no daemon, no persistent process. A managed node needs only two things: a way for the control node to log in (SSH for Linux/Unix, WinRM or SSH for Windows) and, for Linux targets, a Python interpreter so Ansible can execute its modules there. That is the whole footprint.

Here is the sequence of what actually happens when you run a task against a Linux host — internalise this, because almost every behaviour and failure mode in Ansible follows from it:

On the control node, Ansible reads your play, works out the target hosts from the inventory, and gathers the variables that apply.
For each task, Ansible takes the named module (a small program, usually Python) and, with your arguments baked in, transfers it to the managed node over the connection (SSH by default).
The module runs on the managed node using that node’s Python interpreter. It does the work — or, crucially, checks whether the work is already done and does nothing if so — and prints a JSON result to standard output.
Ansible on the control node reads that JSON back over the connection, decides whether the task reported ok, changed, or failed, removes the temporary module file from the target, and moves to the next task.

Two consequences are worth stating now. First, the real work happens on the target, not on the control node — the control node is an orchestrator that ships code and reads results. Second, because each module is copied, run, and cleaned up per task, there is no long-lived state on the target between runs; the desired state lives entirely in your files on the control node.
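To see the returned JSON for yourself, the short play below registers a module's result and prints it. It is a minimal sketch that assumes an inventory with at least one reachable host; the variable name ping_result is only an illustrative choice.

```yaml
---
- name: Inspect the data a module sends back
  hosts: all
  gather_facts: false
  tasks:
    - name: Run the ping module on the target and capture its result
      ansible.builtin.ping:
      register: ping_result        # illustrative variable name

    - name: Print the structure Ansible parsed from the module's JSON
      ansible.builtin.debug:
        var: ping_result
```

The debug task runs on the control node, so what it prints is exactly what came back over the connection.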

Component	Where it runs	What it is / does	Must Ansible be installed there?
Control node	Your laptop / jump host / CI / AAP	Runs `ansible`/`ansible-playbook`; holds inventory, playbooks, roles, collections; orchestrates everything.	Yes
Managed node	Each target server / device	Gets modules pushed to it, executes them, returns JSON. Runs no Ansible agent.	No
Inventory	Read on the control node	The list of managed nodes and their groups/variables.	n/a
Connection plugin	Control node initiates	The transport: `ssh` (Linux default), `winrm`/`psrp` (Windows), `local`, `docker`, etc.	n/a
Module	Copied to and executed on the managed node	The unit of work (install a package, copy a file). Returns JSON.	n/a
Python on target	Managed node (Linux/Unix)	Interpreter that runs the pushed modules. (Windows uses PowerShell modules instead.)	Python, yes; Ansible, no

Agentless: what it means and why it matters

“Agentless” is the headline word in every Ansible introduction, and it means precisely this: you do not install or run any Ansible software on the machines you manage. Compare that with the agent-based model of classic Puppet and Chef, where every managed machine runs a daemon (puppet-agent, chef-client) that must be installed, configured with certificates, kept running, upgraded, and monitored for health.

Being agentless buys you several things:

No bootstrap problem. With agent-based tools you face a chicken-and-egg problem: to manage a fresh machine you must first get the agent onto it (often via … some other automation). Ansible can manage a brand-new box the moment it has SSH and Python, and most mainstream Linux server distributions ship both out of the box (some minimal container and cloud images omit Python).
A smaller attack surface and less to patch. There is no extra long-running service on every host to secure and update — one fewer daemon listening, one fewer thing with a CVE.
No “is the agent alive?” failure class. An agent that has crashed, or whose certificate expired, silently stops applying config and you find out during an incident. Ansible’s connection either works at run time or fails loudly, immediately, in front of you.
Reuse of existing access. Ansible rides on the SSH (or WinRM) access your team already manages — the same keys, the same bastion, the same audited paths.

What “agentless” does not mean: it does not mean “no dependencies on the target.” Linux targets still need a Python interpreter (Ansible discovers it automatically — /usr/bin/python3 on modern distributions). Windows targets need PowerShell and a configured WinRM (or SSH) listener. And the control node, of course, does need Ansible installed. The distinction is that these are either already present (Python on Linux) or are standard OS features (PowerShell/WinRM on Windows) — not a piece of Ansible-specific software you must deploy and maintain.
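When a Linux target really does lack Python, one common workaround is the ansible.builtin.raw module, which sends a plain command over SSH and needs no interpreter on the far side. The sketch below assumes a Debian or Ubuntu target with apt-get available and a hypothetical inventory group named fresh; other distributions would need a different package command.

```yaml
---
- name: Bootstrap Python on a host that does not have it yet
  hosts: fresh              # hypothetical inventory group
  gather_facts: false       # fact gathering itself needs Python on the target
  become: true
  tasks:
    - name: Install python3 only if it is missing
      ansible.builtin.raw: test -x /usr/bin/python3 || (apt-get update && apt-get install -y python3)
```

Once this has run, ordinary modules work against the host and raw is no longer needed.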

The transports: SSH and WinRM

Ansible reaches managed nodes through connection plugins. The two you must know:

Target	Default connection	Transport detail	Auth options	What runs there
Linux / Unix	`ssh`	OpenSSH (the same `ssh` you use by hand), default port 22	SSH keys (recommended), password, agent forwarding, certificates	Python modules
Windows	`winrm` (or `psrp`, or `ssh`)	WinRM (Windows Remote Management) over HTTP 5985 / HTTPS 5986; `psrp` is a newer PowerShell Remoting transport	NTLM, Kerberos, CredSSP, basic, certificate	PowerShell modules
Local control node itself	`local`	No network — runs directly on the control node	n/a	Whatever the module needs
Containers / k8s / network gear	`docker`, `kubectl`, `network_cli`, `httpapi`, …	Specialised plugins per platform	per plugin	per platform

For Linux you should always prefer SSH key-based authentication over passwords, and you will typically pair it with privilege escalation (become, which uses sudo by default) so you can connect as an unprivileged user and elevate only where needed. For Windows, WinRM over HTTPS with Kerberos is the production-grade choice. The later installation lesson sets these up step by step.
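As a rough illustration of those recommendations, here is a YAML inventory that sets the connection variables for one Linux group and one Windows group. The host names, user names and key path are hypothetical, and the Windows settings assume the pywinrm library with Kerberos support is installed on the control node.

```yaml
all:
  children:
    linux:
      hosts:
        web01.example.com:                 # hypothetical host
      vars:
        ansible_connection: ssh
        ansible_user: deploy               # unprivileged login user (assumed)
        ansible_ssh_private_key_file: ~/.ssh/id_ed25519
        ansible_become: true               # elevate with sudo where needed
        ansible_become_method: sudo
    windows:
      hosts:
        win01.example.com:                 # hypothetical host
      vars:
        ansible_connection: winrm
        ansible_port: 5986                 # WinRM over HTTPS
        ansible_winrm_scheme: https
        ansible_winrm_transport: kerberos
        ansible_user: svc_ansible@EXAMPLE.COM
```

Keeping these variables at group level means individual hosts inherit them and the playbooks themselves stay free of connection details.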

The push model versus pull

Ansible uses a push model: an operator (or a pipeline) runs Ansible on the control node, and Ansible pushes the configuration out to the targets right now. You decide when changes happen; nothing runs on a schedule unless you arrange it.

Classic Puppet and Chef default to a pull model: each managed node runs an agent that, on a timer (typically every 30 minutes), pulls its catalogue from a central server (a Puppet master / Chef server) and applies it locally. The node is responsible for keeping itself in line.

Neither is universally “better” — they trade off differently:

Dimension	Push (Ansible)	Pull (Puppet/Chef default)
Who initiates	The operator/pipeline, from the control node	The node’s agent, on a schedule
When changes apply	Exactly when you run it — immediate, intentional	Within the next check-in interval (eventual)
Central server	None required	A master/server the nodes depend on
Continuous drift correction	Only when you run (you can schedule it, e.g. via cron/AAP)	Automatic and continuous by design
Bootstrap	Trivial — SSH + Python is enough	Must get the agent onto the node first
Scale ceiling	Bounded by `forks` and control-node resources; very high with AAP/pull-mode	Scales by adding server capacity; nodes self-serve
Failure visibility	Immediate and in your face	Logged centrally; a silent agent can hide

Two important nuances. First, Ansible can do pull-style operation with ansible-pull, where each node clones a Git repo and runs a playbook against itself on a schedule — useful for large, ephemeral or immutable fleets. Second, Ansible Automation Platform adds scheduling, so even in the standard push model you can have changes applied on a cadence with central logging — giving you the continuous-drift-correction benefit without a per-node agent.
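The pull-style nuance can be sketched with a single task that schedules ansible-pull through cron. The repository URL and the playbook name local.yml are placeholders, and the sketch assumes Ansible is already installed on each node, which is the price of running in pull mode.

```yaml
---
- name: Schedule pull-style runs on each node
  hosts: all
  become: true
  tasks:
    - name: Run ansible-pull against the config repository every 30 minutes
      ansible.builtin.cron:
        name: ansible-pull
        minute: "*/30"
        # -U names the Git repository to clone; local.yml is the playbook inside it
        job: "ansible-pull -U https://git.example.com/infra/config.git local.yml"
```

You would push this play once; after that each node keeps itself in line on its own timer.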

Idempotency: the property that makes it safe

Idempotency is the single most important idea in Ansible. An operation is idempotent if applying it once or applying it many times produces the same result — the second and subsequent runs make no further changes because the desired state is already in place.

This flows directly from Ansible being declarative-ish: you describe the desired state (“the package nginx should be present”, “the service nginx should be started and enabled”, “this config file should have these contents”), and each module checks the current state first and only acts if reality differs from the declaration. Run the playbook against a fresh server and it installs, copies, and starts things — lots of changed. Run the very same playbook again a minute later and, if nothing has drifted, it makes no changes at all — everything reports ok. That property is what lets you run a playbook in production with confidence: it is not a script that blindly re-does work and risks breaking things; it is a reconciliation to a goal.

Contrast two ways of adding a Listen directive to a web server's config file. The non-idempotent way is ansible.builtin.shell: echo "Listen 8080" >> /etc/httpd/conf/httpd.conf, which appends blindly: run it three times and the line appears three times. The idempotent way is ansible.builtin.lineinfile, which ensures the line is present exactly once: run it any number of times and the file ends up identical. Most Ansible modules are written to be idempotent like this; the handful that cannot be (command, shell, raw) are exactly the ones you must use with care, because Ansible cannot know whether their effect is already in place.
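Written out as tasks, the contrast looks roughly like this. The play is illustrative only (it assumes a web group and an Apache config at the path shown), and in a real playbook you would keep just the second task.

```yaml
---
- name: Non-idempotent versus idempotent edits
  hosts: web
  become: true
  tasks:
    # Appends blindly: every run adds another copy of the line.
    - name: Add Listen directive with shell (avoid)
      ansible.builtin.shell: echo "Listen 8080" >> /etc/httpd/conf/httpd.conf

    # Checks first: the line ends up in the file exactly once.
    - name: Add Listen directive with lineinfile (prefer)
      ansible.builtin.lineinfile:
        path: /etc/httpd/conf/httpd.conf
        line: "Listen 8080"
        state: present
```

On a second run the shell task reports changed again, while the lineinfile task reports ok.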

This is what the changed-versus-ok result model communicates on every run:

Result	Symbol/colour	Meaning
ok	green	The module ran and found reality already matched the desired state — no change made.
changed	yellow	The module made a change to bring reality into line with the desired state.
failed	red	The task errored (e.g. package not found, permission denied). By default Ansible stops running tasks on that host.
skipped	cyan	The task was not run because its `when` condition was false.
unreachable	red	Ansible could not connect to the host at all (SSH refused, host down) — distinct from a task failure.

Every playbook run ends with a play recap tallying these per host, e.g. host1 : ok=7 changed=2 unreachable=0 failed=0 skipped=1 (recent versions also print rescued and ignored counters). The mark of a healthy, converged system is a run where every task reports ok and changed=0, which is proof that the machine already matches your declared state. Watching changed drop to zero on the second run is how you know a playbook is idempotent. (A small caveat: command and shell tasks report changed every time they run, even for read-only commands, because Ansible cannot tell what the command did; you fix that with changed_when: false, covered in the error-handling lesson.)
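The changed_when caveat is easiest to see on a read-only command. In this small sketch the registered variable name kernel_version is an arbitrary choice:

```yaml
---
- name: Keep read-only commands out of the changed count
  hosts: all
  gather_facts: false
  tasks:
    - name: Read the running kernel version
      ansible.builtin.command: uname -r
      register: kernel_version     # illustrative variable name
      changed_when: false          # the command only reads, so never report a change

    - name: Show what was read
      ansible.builtin.debug:
        msg: "Kernel is {{ kernel_version.stdout }}"
```

Without changed_when: false the first task would add one to the changed tally on every run.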

The building blocks: inventory, modules, plugins, playbooks, roles, collections

Six concepts make up everything you will write and use in Ansible. Here they are top to bottom; each gets its own deep-dive lesson later, but you need the map now.

Building block	What it is	Example	Deep-dive lesson
Inventory	The list of managed nodes, organised into groups, with variables attached to hosts and groups. Static (INI/YAML files) or dynamic (a plugin that queries the cloud).	`web01`, `web02` in a `[web]` group	Ansible Inventory, In Depth
Module	A small, usually idempotent program that does one unit of work on a target and returns JSON. The verbs of Ansible.	`ansible.builtin.copy`, `ansible.builtin.service`	Ad-Hoc Commands & Modules
Plugin	Code that extends the Ansible engine itself (not the work done on targets): connection, lookup, filter, callback, inventory, become, cache plugins, and more.	`ssh` connection plugin, `to_json` filter	woven throughout
Playbook	A YAML file of one or more plays; each play maps a group of hosts to an ordered list of tasks (each task calls a module). The script of desired state.	`site.yml`	Playbooks, In Depth
Role	A standardised, reusable directory bundling tasks, handlers, templates, files, variables and defaults so a unit of configuration (e.g. “nginx”) can be shared and parameterised.	`roles/nginx/`	Roles & Collections
Collection	The modern distribution format: a versioned package bundling modules, plugins, roles and playbooks under a `namespace.name`, installed from Ansible Galaxy or Automation Hub.	`community.general`, `amazon.aws`	Roles & Collections

A quick word on plugins versus modules, because newcomers blur them: a module runs on the managed node to change the world there and returns JSON (it is the work). A plugin runs on the control node and extends Ansible’s own behaviour: how it connects (connection plugins), how it transforms data in templates (filter plugins like to_json, to_yaml), how it fetches values (lookup plugins like file, env), how it formats output (callback plugins), how it discovers hosts (inventory plugins), and how it escalates privilege (become plugins). Modules are what Ansible does to your servers; plugins are how Ansible itself works.
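A single task can show both sides at once. In the sketch below a lookup plugin and a filter plugin prepare data on the control node, and a module then does the work on the target; the variable names and the destination path are made up for illustration.

```yaml
---
- name: Plugins shape the data, a module does the work
  hosts: all
  gather_facts: false
  vars:
    # Lookup plugin: runs on the control node and reads its environment.
    operator: "{{ lookup('env', 'USER') }}"
    settings:
      listen_port: 8080
      run_by: "{{ operator }}"
  tasks:
    - name: Write the settings file on the target
      # Module: copied to the managed node and executed there.
      ansible.builtin.copy:
        # Filter plugin: renders the dict as YAML on the control node.
        content: "{{ settings | to_nice_yaml }}"
        dest: /tmp/settings.yml    # hypothetical path
        mode: "0644"
```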

FQCN: namespace.collection.module

Since collections became the standard distribution unit, every module, plugin and role has a Fully Qualified Collection Name (FQCN) of the form namespace.collection.module. The built-in modules that ship inside ansible-core live in the ansible.builtin collection, so the ping module’s FQCN is ansible.builtin.ping, the copy module is ansible.builtin.copy, and so on. Modules from other collections follow the same pattern: community.general.timezone, amazon.aws.ec2_instance, ansible.posix.firewalld, community.docker.docker_container.

FQCN	namespace	collection	module	Ships in
`ansible.builtin.ping`	ansible	builtin	ping	ansible-core
`ansible.builtin.copy`	ansible	builtin	copy	ansible-core
`community.general.timezone`	community	general	timezone	`community.general` collection
`amazon.aws.ec2_instance`	amazon	aws	ec2_instance	`amazon.aws` collection
`ansible.posix.firewalld`	ansible	posix	firewalld	`ansible.posix` collection

You will see older playbooks and tutorials use the short name (copy, service, ping) without the namespace. That still resolves for built-ins, but using the full FQCN everywhere is the current best practice and is expected in the RHCE exam and in this course. It removes ambiguity (two collections could define a user module), it documents exactly which collection a task depends on, and it future-proofs your playbooks against name collisions. We use FQCN throughout this entire track.
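Side by side, the short name and the FQCN look like this. The file paths are throwaway examples, and the last task assumes the community.general collection is installed on the control node.

```yaml
---
- name: Short names versus FQCNs
  hosts: all
  become: true
  tasks:
    - name: Legacy short name (resolves to the built-in, but is ambiguous)
      copy:
        content: "hello\n"
        dest: /tmp/short-name.txt
        mode: "0644"

    - name: Same module by FQCN (preferred)
      ansible.builtin.copy:
        content: "hello\n"
        dest: /tmp/fqcn.txt
        mode: "0644"

    - name: Module from a separately installed collection
      community.general.timezone:
        name: Etc/UTC
```

Reading the FQCN tells you at a glance which collection each task depends on.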

ansible vs ansible-core vs AAP

These three names trip up nearly everyone, so let us be exact.

Name	What it is	Install with	Contains	Who it’s for
`ansible-core`	The engine: the `ansible`, `ansible-playbook`, `ansible-galaxy`, `ansible-doc`, `ansible-config`, `ansible-vault` and related binaries, plus the `ansible.builtin` collection only. Currently 2.17+ (2026).	`pip install ansible-core`	The CLI tools + built-in modules/plugins	Minimalists; CI images; when you manage collections yourself
`ansible` (the community package)	A batteries-included bundle: `ansible-core` plus a curated set of ~70 widely used collections (community.general, ansible.posix, amazon.aws, and more), versioned 10+ (Ansible 10 = ansible-core 2.17).	`pip install ansible`	Engine + many collections	Most people getting started; workstations
Ansible Automation Platform (AAP)	Red Hat’s commercial product. Adds a web UI and API (the old “Tower”/AWX lineage as automation controller), RBAC, scheduling, credential management, job logging/auditing, execution environments (containerised runtimes), private Automation Hub, and Event-Driven Ansible.	A subscription/product, not pip	The whole engine plus enterprise control plane	Teams/enterprises needing governance, self-service and scale

The mental model: ansible-core is the engine, ansible is the engine plus a sensible toolbox of collections, and AAP is the engine wrapped in an enterprise control plane. For learning and for this course, installing the ansible package (or ansible-core and adding collections as you need them) is exactly right. AWX is the free, upstream, community version of the automation controller if you want to explore the platform layer without a subscription.
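If you take the ansible-core route, the collections you add are usually recorded in a requirements file so that every control node installs the same set. The file below is an illustrative sketch; the selection of collections is an assumption, and a real project would normally pin versions as well.

```yaml
# collections/requirements.yml (illustrative selection)
---
collections:
  - name: ansible.posix
  - name: community.general
  - name: amazon.aws
```

Installing from it is one command: ansible-galaxy collection install -r collections/requirements.yml.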

Declarative-ish: where Ansible sits between declarative and imperative

You will hear Ansible called “declarative”, and you will also hear pedants object that it is “really procedural”. Both have a point, which is why “declarative-ish” is the honest label.

Declarative at the task level: a well-written task states a desired end state (“nginx should be state: started and enabled: true”), and the module figures out whether to act. You are not writing “if running, do nothing, else start it” — the module does that for you. This is what gives you idempotency.
Imperative/procedural at the playbook level: a playbook is an ordered list of tasks executed top to bottom. You control the sequence explicitly — task 2 runs after task 1 — which is the opposite of a purely declarative tool like Terraform that builds a dependency graph and decides the order itself.

So Ansible blends both: declarative goals inside an imperatively ordered list of steps. This is a genuine strength for configuration management and orchestration, where order frequently matters (install the package before templating its config before starting the service, and restart the service only after the config changed). The trade-off is that the burden of getting the order right — and of choosing idempotent modules over raw shell — sits with you, the author.
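The package, config, service sequence can be sketched as a play in which each task declares a state while the list fixes the order. The template file nginx.conf.j2 is hypothetical and would sit next to the playbook.

```yaml
---
- name: Install, configure, then start nginx (order matters)
  hosts: web
  become: true
  tasks:
    - name: Ensure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Render the site configuration
      ansible.builtin.template:
        src: nginx.conf.j2            # hypothetical template
        dest: /etc/nginx/nginx.conf
        mode: "0644"
      notify: Restart nginx           # fires only if the file changed

    - name: Ensure nginx is running and enabled at boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```

The handler is what expresses "restart only after the config changed": it runs at the end of the play, and only when the template task reported changed.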

Ansible vs Terraform vs Puppet/Chef/Salt

Knowing where Ansible fits among its neighbours is a classic interview topic and a real architectural decision. The headline distinction is configuration management (what’s inside a server) versus provisioning/Infrastructure as Code (the servers and cloud resources themselves). Ansible can do both, but it is strongest at the former; Terraform is purpose-built for the latter.

Tool	Primary job	Model	Agent?	Language	State file?	Sweet spot
Ansible	Configuration management, app deployment, orchestration, ad-hoc ops (also provisions)	Push, declarative-ish (procedural order, idempotent tasks)	No (agentless)	YAML	No (queries live state each run)	Configuring & orchestrating existing servers; multi-step rollouts; “do this across N hosts now”
Terraform	Provisioning / IaC: create & manage cloud/infra resources	Plan/apply run by the operator, declarative (graph-based, computes order)	No	HCL	Yes (the source of truth for mappings)	Standing up VMs, networks, databases, DNS across clouds
Puppet	Configuration management (continuous enforcement)	Pull, declarative (DSL describes desired state)	Yes (puppet-agent)	Puppet DSL (Ruby-based)	Server-side catalogue	Large, long-lived fleets needing constant drift correction
Chef	Configuration management (continuous enforcement)	Pull, imperative-ish (Ruby “recipes”)	Yes (chef-client)	Ruby DSL	Server-side	Teams comfortable in Ruby wanting programmatic config
Salt	Configuration management + remote execution at scale	Both push & pull (fast ZeroMQ message bus; agent `salt-minion` or agentless `salt-ssh`)	Optional (usually yes)	YAML + Jinja	Minion-side	Very large fleets needing fast, event-driven remote execution

The practical guidance experienced teams follow:

Use Terraform (or your cloud’s native IaC) to create the infrastructure, and Ansible to configure what’s on it. They compose cleanly: Terraform builds the VMs and outputs their IPs; Ansible reads those (via dynamic inventory) and installs and configures the software. This pairing is so common it is a pattern in its own right.
Choose Ansible over Puppet/Chef when you value the agentless model, want a low learning curve (YAML over a DSL), and prefer push with on-demand control — which is most teams today, and why Ansible overtook them.
Consider Salt when you need very fast remote execution across enormous fleets with an event-driven bus, and you are comfortable running minions.
Remember Ansible can provision too (it has rich cloud modules), so for smaller estates you might use Ansible alone end to end. For serious infrastructure lifecycle management, though, Terraform’s state model and plan/graph are the better fit.

For the full decision framework on picking between these for provisioning and configuration across many environments, see Terraform vs Terragrunt vs Ansible vs Pulumi: Which IaC Tool, When?.
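One way the Terraform and Ansible hand-off is often wired up on AWS is a dynamic inventory file that selects instances by tag. The sketch below uses the amazon.aws.aws_ec2 inventory plugin; the region, the ManagedBy and Role tags, and the choice of AWS itself are assumptions for illustration, and the plugin needs boto3 on the control node.

```yaml
# File name must end in aws_ec2.yml or aws_ec2.yaml for the plugin to accept it.
plugin: amazon.aws.aws_ec2
regions:
  - eu-west-1                   # assumed region
filters:
  tag:ManagedBy: terraform      # hypothetical tag applied by Terraform
  instance-state-name: running
keyed_groups:
  - key: tags.Role              # hypothetical Role tag becomes a group
    prefix: role
hostnames:
  - private-ip-address
```

Pointing ansible-playbook at this file with -i makes every run query live infrastructure instead of a hand-maintained host list.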

Control-node requirements

Because everything runs from the control node, it is worth knowing exactly what it needs — and its one notable limitation.

Requirement	Detail
Operating system	A POSIX system: Linux (any major distro), macOS, or WSL on Windows. Ansible’s control node is not supported natively on Windows — use WSL or a Linux VM. (Windows is fully supported as a managed node.)
Python	The control node needs Python 3 (modern `ansible-core` requires a reasonably recent Python 3.x; check the version matrix for your `ansible-core` release). `pip`/`pipx`/a virtual environment is the usual install path.
Network access to targets	Outbound SSH (22) to Linux hosts and/or WinRM (5985/5986) to Windows hosts — directly or via a bastion.
Credentials	SSH keys (or passwords) for Linux; WinRM credentials for Windows; and a privilege-escalation method (`sudo`, etc.) where root-level changes are needed.
Ansible installed	`ansible-core` or the `ansible` package — only here, never on the targets.

The headline takeaway: the control node is the only thing you install and maintain Ansible on, and it must be POSIX (Linux/macOS/WSL), not native Windows. The next lesson sets one up properly.
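A quick way to check what a control node is actually running is a play that targets only localhost. This is a small sketch and needs no inventory or network access:

```yaml
---
- name: Report what the control node is running
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Show the ansible-core version and the Python behind it
      ansible.builtin.debug:
        msg:
          - "ansible-core version: {{ ansible_version.full }}"
          - "control-node Python: {{ ansible_playbook_python }}"
```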

Ansible control-node and managed-node architecture: how a task flows over SSH/WinRM, runs as a pushed module, and returns JSON

The diagram shows the control node holding your inventory, playbooks and collections, pushing a module over SSH to each Linux managed node (and over WinRM to a Windows one), the module executing on the target’s own Python/PowerShell, and the JSON result flowing back so Ansible can mark the task ok, changed or failed — the whole agentless push loop on one page.

Hands-on lab

We will do this entirely free using only your own machine plus a couple of throwaway containers as managed nodes: no cloud account, no cost. You need Linux or macOS (or WSL) with Ansible and Docker installed, plus the sshpass utility, which Ansible’s ssh connection relies on for the password login this lab uses. (Installing Ansible properly is the next lesson; if you do not have it yet, pipx install --include-deps ansible or pip install ansible will do for this lab.)

The goal is to see the architecture and idempotency with your own eyes: a control node (your machine), managed nodes (containers), the agentless SSH push, and the same playbook reporting changed the first time and ok the second.

1. Confirm the control node. Run ansible --version. Expected: a banner showing ansible [core 2.17.x] (or newer), the config file path, and the Python version. This machine is your control node.

2. Stand up two managed nodes as containers. We use a small image that has SSH and Python. In a terminal:

# Two Ubuntu containers with SSH running, mapped to host ports 2221 and 2222
for n in 1 2; do
  docker run -d --name node$n -p 222$n:22 \
    rastasheep/ubuntu-sshd:18.04
done

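These containers each run an SSH server (user root, password root on this public test image) and ship Python, which makes them valid managed nodes with no Ansible installed on them. One caveat: Ubuntu 18.04 provides Python 3.6, while ansible-core 2.17 requires Python 3.7 or newer on managed nodes, so if modules fail with a Python version or syntax error, switch to an equivalent SSH-enabled image built on a newer Ubuntu release.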

3. Write a tiny inventory. Create inventory.ini:

[web]
node1 ansible_host=127.0.0.1 ansible_port=2221
node2 ansible_host=127.0.0.1 ansible_port=2222

[web:vars]
ansible_user=root
ansible_password=root
ansible_ssh_common_args=-o StrictHostKeyChecking=no
ansible_python_interpreter=/usr/bin/python3

This declares a group web with two hosts and the connection variables for reaching them. (Passwords inline are only acceptable here because these are disposable local containers — real systems use SSH keys and Vault, covered later.)

4. Prove the agentless connection. Run an ad-hoc ping (the ansible.builtin.ping module is not an ICMP ping — it confirms Ansible can connect and run Python on the target):

ansible web -i inventory.ini -m ansible.builtin.ping

Expected output, per host:

node1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
node2 | SUCCESS => { ... "ping": "pong" }

"ping": "pong" from both proves the full loop: SSH connection, module pushed and run on the target’s Python, JSON returned. Note "changed": false — ping changes nothing.

5. Write a small idempotent playbook. Create site.yml:

---
- name: Configure web nodes
  hosts: web
  become: true
  tasks:
    - name: Ensure the curl package is present
      ansible.builtin.package:
        name: curl
        state: present

    - name: Drop a managed marker file
      ansible.builtin.copy:
        content: "Managed by Ansible on {{ inventory_hostname }}\n"
        dest: /etc/kloudvin.txt
        mode: "0644"

Every module here is referenced by its FQCN (ansible.builtin.package, ansible.builtin.copy) and declares a desired state (state: present; a file with this exact content) — so it is idempotent.

6. Run it the first time (watch for changed).

ansible-playbook -i inventory.ini site.yml

Expected: both tasks report changed (yellow) on both hosts — curl gets installed, the file gets created — and a play recap like:

PLAY RECAP *********************************************************
node1 : ok=3  changed=2  unreachable=0  failed=0  skipped=0
node2 : ok=3  changed=2  unreachable=0  failed=0  skipped=0

(ok=3 includes the implicit fact-gathering task.)

7. Run the exact same playbook again (watch changed drop to zero).

ansible-playbook -i inventory.ini site.yml

Expected: every task now reports ok (green) and the recap shows changed=0:

node1 : ok=3  changed=0  unreachable=0  failed=0  skipped=0
node2 : ok=3  changed=0  unreachable=0  failed=0  skipped=0

This is idempotency made visible — the desired state already matched reality, so Ansible did nothing. This is the single most important thing to feel in this lesson.

8. (Optional) Preview-only mode. Run ansible-playbook -i inventory.ini site.yml --check --diff. --check is a dry run that predicts changes without making them, and --diff shows the textual difference for file changes — your safety net before touching anything real.

Validation. You have a control node (your machine) managing two agentless nodes (containers) over SSH, a successful ping/pong, and a playbook that proved idempotent by going from changed=2 to changed=0 on a second run — the whole architecture exercised end to end.
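
The changed=2 to changed=0 check can also be scripted, which is handy in CI. The following is a minimal sketch, not part of Ansible: recap_is_converged is a hypothetical helper name, and the sample text is simply the two recaps shown in steps 6 and 7. It reads recap lines on standard input and succeeds only if every host line shows changed=0, unreachable=0 and failed=0.

```shell
# Hypothetical helper: succeeds only when every host line of a PLAY RECAP
# (read from stdin) reports changed=0, unreachable=0 and failed=0.
recap_is_converged() {
  awk '
    / : ok=/ {
      hosts++
      if ($0 !~ /changed=0([^0-9]|$)/ ||
          $0 !~ /unreachable=0([^0-9]|$)/ ||
          $0 !~ /failed=0([^0-9]|$)/) bad++
    }
    # Exit 0 only if at least one host line was seen and none was "bad".
    END { exit !(hosts > 0 && bad == 0) }
  '
}

# Sample recaps copied from the first and second runs of the lab.
first_run='node1 : ok=3  changed=2  unreachable=0  failed=0  skipped=0
node2 : ok=3  changed=2  unreachable=0  failed=0  skipped=0'
second_run='node1 : ok=3  changed=0  unreachable=0  failed=0  skipped=0
node2 : ok=3  changed=0  unreachable=0  failed=0  skipped=0'

printf '%s\n' "$first_run"  | recap_is_converged && echo "first run: converged"  || echo "first run: not converged"
printf '%s\n' "$second_run" | recap_is_converged && echo "second run: converged" || echo "second run: not converged"
```

Against the real lab you could save the second run with ansible-playbook -i inventory.ini site.yml | tee second-run.log and then call recap_is_converged < second-run.log. The sketch assumes the plain, uncoloured recap that Ansible prints when its output is piped.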

Cleanup. Remove the containers and the lab files:

docker rm -f node1 node2
rm -f inventory.ini site.yml

Cost note. This lab runs entirely on your own machine with local containers — it provisions no cloud resources and costs ₹0.

Common mistakes & troubleshooting

Symptom	Cause	Fix
`UNREACHABLE! ... Failed to connect to the host via ssh`	Wrong host/port/user, SSH not running on the target, or firewall blocking 22	Verify you can `ssh` to the host by hand first; check `ansible_host`/`ansible_port`/`ansible_user`; confirm the SSH service is up. Unreachable ≠ a task failure.
`/usr/bin/python: not found` or interpreter warnings	The target’s Python isn’t where Ansible looked	Set `ansible_python_interpreter=/usr/bin/python3` (host/group var). Modern `ansible-core` auto-discovers, but minimal images may need this.
`Permission denied` when a task needs root	You connected as a non-root user without privilege escalation	Add `become: true` to the play/task (uses `sudo` by default); pass `--ask-become-pass` if sudo needs a password.
Running `ansible` on Windows fails to install/run	The control node isn’t supported natively on Windows	Use WSL or a Linux VM as the control node; Windows is fine as a managed node via WinRM.
A `shell`/`command` task always shows changed	`command`/`shell`/`raw` can’t know if their effect already exists, so they assume a change	Prefer an idempotent module (`lineinfile`, `copy`, `package`); for genuinely read-only commands set `changed_when: false`.
`couldn't resolve module/action 'community.general.x'`	The collection providing that FQCN isn’t installed	Install it: `ansible-galaxy collection install community.general` (and list with `ansible-galaxy collection list`).
Inventory “host not found” / empty host list	Wrong inventory path, or you didn’t pass `-i`, or the pattern matched nothing	Pass `-i inventory.ini`; verify with `ansible-inventory -i inventory.ini --list`.
Host-key prompt hangs an automated run	First-time SSH host-key verification is interactive	For throwaway labs, `-o StrictHostKeyChecking=no`; for real hosts, pre-populate `known_hosts` (never disable verification in production).

Best practices

Use FQCN everywhere (ansible.builtin.copy, not copy). It is unambiguous, self-documenting about dependencies, future-proof against name collisions, and expected by the RHCE exam.
Prefer idempotent modules over shell/command. Reach for package, service, copy, template, lineinfile before raw commands; if you must shell out, make it idempotent with creates/removes/changed_when.
Treat the second run as the test. A correct playbook reports changed=0 on a re-run against an unchanged host. If it keeps reporting changed, a task isn’t truly idempotent — fix it.
Connect as a normal user and escalate with become only where needed, rather than logging in as root. Pair with SSH keys, not passwords.
Keep the control node POSIX and version-pinned. Install in a virtual environment or with pipx so the Ansible version is explicit and reproducible across machines and CI.
Pin your collections. Declare them (with versions) in a requirements.yml so every control node and pipeline resolves the same modules — the dependency-management lesson covers this.
Use --check --diff as a dry run before applying anything to real systems, so you see what would change first.
Right tool for the layer: provision infrastructure with Terraform/native IaC, configure it with Ansible — don’t force one tool to do both badly.
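
As a sketch of the "if you must shell out, make it idempotent" advice, the heredoc below writes a hypothetical playbook named guarded.yml. The file name, the /usr/local/bin/init-app command and the /opt/app/.initialised marker path are illustrative assumptions, not part of the lab. It shows two guards: creates, which skips the command when the named path already exists, and changed_when: false for a command that only reads.

```shell
# Write a hypothetical playbook showing two ways to keep raw commands honest.
cat > guarded.yml <<'EOF'
---
- name: Guarded command examples (illustrative)
  hosts: web
  tasks:
    # Skipped on later runs once the marker file exists.
    - name: Run a one-time initialiser only if its marker file is absent
      ansible.builtin.command:
        cmd: /usr/local/bin/init-app --marker /opt/app/.initialised
        creates: /opt/app/.initialised

    # A read-only command should never count as a change.
    - name: Read the kernel version without reporting a change
      ansible.builtin.command:
        cmd: uname -r
      register: kernel_version
      changed_when: false
EOF
```

Note that creates only helps if the command really does leave that path behind. The guard describes the evidence that the work is done; Ansible does not verify the command's effect for you.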

Security notes

Because Ansible is agentless and rides on SSH/WinRM, its security is your remote-access security — which is mostly good news (no extra daemon to harden) but puts the weight on credentials and connection hygiene.

Use SSH key-based authentication, not passwords, for Linux targets, and protect the private keys (passphrase + an agent, or a hardware/HSM-backed key). For Windows, prefer WinRM over HTTPS, authenticated with Kerberos rather than basic auth.
Never put real secrets in plaintext — not in inventory, not in playbooks, not in vars files committed to Git. Use Ansible Vault to encrypt secrets at rest (its own lesson) or pull them from an external secrets manager at run time.
Escalate with least privilege. Connect as an unprivileged user and use become (sudo) only for the tasks that need it; scope sudo rights tightly on the targets.
Keep host-key verification on for real systems. Disabling StrictHostKeyChecking (as we did for throwaway containers) removes protection against man-in-the-middle attacks — fine for a local lab, never for production.
Lock down the control node. It holds your inventory, playbooks, keys and Vault passwords — it is a high-value target. Restrict who can log in, and for teams move to AAP/AWX where credentials are stored centrally, injected at run time, and never exposed to operators.
Audit and review. Keep playbooks in Git with pull-request review, and (in AAP) use centralised job logging so every change to every host is attributable.
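
For contrast with the lab's throwaway inventory, here is a sketch of what a key-based, unprivileged-user inventory might look like. Everything specific in it is an assumption for illustration: the inventory-keys.ini file name, the example.com hosts, the deploy user and the ~/.ssh/id_ed25519 key path. ansible_user and ansible_ssh_private_key_file are standard connection variables.

```shell
# Write a hypothetical inventory that authenticates with an SSH key
# and contains no password at all.
cat > inventory-keys.ini <<'EOF'
# Hypothetical hosts and user; adjust to your environment.
[web]
web1 ansible_host=web1.example.com
web2 ansible_host=web2.example.com

[web:vars]
ansible_user=deploy
ansible_ssh_private_key_file=~/.ssh/id_ed25519
EOF
```

With an inventory like this, host-key checking stays at its default, and root access is requested only where a play or task sets become: true.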

Interview & exam questions

1. What is Ansible, and what problem does it solve? Ansible is an open-source automation engine used mainly for configuration management — bringing servers to a declared state and keeping them there — as well as deployment, orchestration and ad-hoc administration. It solves drift (machines becoming inconsistent), scale (managing many hosts at once), and lack of auditability, by letting you describe desired state in version-controlled YAML and push it to targets repeatably.

2. Explain Ansible’s architecture. There are two roles: the control node, where Ansible is installed and runs (holding inventory, playbooks, roles/collections), and the managed nodes, the targets, which run no Ansible agent. For each task, the control node pushes a module over SSH (Linux) or WinRM (Windows); the module executes on the target’s own Python (or PowerShell), returns JSON, and Ansible reads the result and cleans up. The real work happens on the target; the control node orchestrates.

3. What does “agentless” mean, and why is it an advantage? It means nothing belonging to Ansible runs on the managed nodes — no daemon to install, patch, secure or monitor. Advantages: no bootstrap problem (SSH + Python is enough), smaller attack surface, no “is the agent alive?” failure mode, and reuse of existing SSH access. The trade-off is a dependency on Python (Linux) or PowerShell/WinRM (Windows) being present, but those are standard, not Ansible-specific software.

4. Push vs pull — where does Ansible sit, and what’s the difference? Ansible defaults to push: an operator/pipeline runs it from the control node and changes apply immediately, with no central server the nodes depend on. Puppet/Chef default to pull: each node’s agent fetches and applies its catalogue from a central server on a schedule, giving continuous drift correction but requiring an agent and a server. Ansible can do pull-style work with ansible-pull, and AAP adds scheduling.

5. Define idempotency and explain how Ansible achieves it. Idempotency means running an operation once or many times yields the same result — re-runs make no further changes. Ansible achieves it because most modules are declarative: they check current state and act only if reality differs from the desired state you declared. So a playbook reports many changed on a fresh host and changed=0 on a converged one.

6. What do ok, changed, failed, skipped and unreachable mean in a play recap? ok = ran and reality already matched (no change); changed = a change was made to reach the desired state; failed = the task errored (by default Ansible runs no further tasks on that host, while the other hosts continue); skipped = a when condition was false so it didn’t run; unreachable = Ansible couldn’t connect at all (distinct from a task failure). A healthy converged run is all ok with changed=0.

7. What is FQCN and why use it? Fully Qualified Collection Name — namespace.collection.module, e.g. ansible.builtin.copy or community.general.timezone. Built-ins live in ansible.builtin. Using FQCN removes ambiguity between collections that might define the same module name, documents which collection a task depends on, future-proofs against collisions, and is the current best practice and RHCE expectation.

8. Distinguish ansible, ansible-core, and AAP. ansible-core is the engine (the CLI binaries plus the ansible.builtin collection), on the 2.17+ line. The ansible package is ansible-core plus a curated bundle of ~70 collections (version 10+). Ansible Automation Platform is Red Hat’s commercial product adding a web UI/API (automation controller), RBAC, scheduling, credential management, execution environments, private Automation Hub and Event-Driven Ansible. AWX is the free upstream of the controller.

9. Is Ansible declarative or imperative? “Declarative-ish.” Individual tasks are declarative (they state a desired end state and the module decides whether to act, giving idempotency), but a playbook is an imperatively ordered, top-to-bottom list of tasks where you control the sequence. So it blends declarative goals inside procedural ordering.

10. When would you choose Ansible over Terraform, and how do they work together? Choose Ansible to configure what’s inside servers and to orchestrate multi-step operations; choose Terraform to provision the infrastructure itself (it has a state file and a dependency graph for resource lifecycle). They compose: Terraform creates the VMs and outputs IPs, Ansible (via dynamic inventory) configures the software on them. Ansible can provision too, but Terraform’s state/plan model is the better fit for serious infrastructure lifecycle.

11. What is a module versus a plugin? A module runs on the managed node to do a unit of work and returns JSON (it’s the work Ansible performs on your servers). A plugin runs on the control node and extends Ansible’s own behaviour — connection, lookup, filter, callback, inventory, become and cache plugins. Modules change your servers; plugins change how Ansible works.

12. What are the control-node requirements, and can it run on Windows? The control node needs a POSIX OS (Linux, macOS, or WSL), Python 3, network access to the targets (SSH 22 and/or WinRM 5985/5986), credentials, and Ansible installed (only here). It cannot run natively on Windows — use WSL or a Linux VM. Windows is fully supported as a managed node.

Quick check

1. On which machine(s) is Ansible itself installed: the control node, the managed nodes, or both?
2. Ansible’s default connection to a Linux host uses which transport, and to a Windows host?
3. You run the same playbook twice; the second recap shows changed=0. What property does that demonstrate?
4. What is the FQCN of the built-in module that copies a file, and which collection does it live in?
5. True or false: ansible-core includes all the community collections you will ever need.

Answers

1. Only the control node. Managed nodes run no Ansible agent; that’s the agentless model. They need SSH/WinRM access and (for Linux) Python.
2. SSH (port 22) for Linux; WinRM (5985/5986), or alternatively SSH/psrp, for Windows.
3. Idempotency: reality already matched the declared desired state, so Ansible made no changes. The second run being all ok with changed=0 is how you prove a playbook is idempotent.
4. ansible.builtin.copy, which lives in the ansible.builtin collection that ships inside ansible-core.
5. False. ansible-core ships only ansible.builtin. The ansible package adds a curated bundle, and you install any others with ansible-galaxy collection install.

Exercise

Cement the architecture and idempotency in your own words and hands.

1. Reproduce the lab, but add a third task to site.yml using ansible.builtin.service to ensure a service (e.g. cron) is state: started and enabled: true. Run the playbook, note the result of the new task, then run it a second time and confirm it flips from changed to ok.
2. Deliberately break idempotency: add a task using ansible.builtin.shell to append a line to a file (echo "test" >> /tmp/notes.txt). Run the playbook three times and inspect /tmp/notes.txt inside a container (docker exec node1 cat /tmp/notes.txt). How many lines are there, and why? Now rewrite it with ansible.builtin.lineinfile and repeat. What’s different?
3. Run ansible web -i inventory.ini -m ansible.builtin.setup and skim the output. These are facts: automatically gathered data about each managed node. Find the host’s OS family and default IPv4 address in the JSON.
4. Write three or four sentences explaining, to a colleague who has only used Bash scripts, why the lineinfile version is safer to run repeatedly than the shell version, using the words desired state, idempotent, and changed.

This shell-versus-module contrast is one of the most common interview discriminators for Ansible, and feeling it first-hand is worth more than reading about it.
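
If you want to see the same contrast without Ansible in the way, the plain-shell sketch below imitates the two strategies. It is an analogy only, not how lineinfile is implemented: blind_append and ensure_line are hypothetical helper names, and ensure_line's ok/changed output merely mirrors Ansible's result vocabulary.

```shell
# Imperative: always appends, with no regard for what the file already holds.
blind_append() {
  printf '%s\n' "$2" >> "$1"
}

# Desired-state style: append only if the exact line is missing, and report
# whether anything had to change (loosely mirroring Ansible's changed/ok).
ensure_line() {
  if [ -f "$1" ] && grep -qxF -- "$2" "$1"; then
    echo ok
  else
    printf '%s\n' "$2" >> "$1"
    echo changed
  fi
}

demo_dir=$(mktemp -d)

# "Run the playbook" three times with each strategy.
for run in 1 2 3; do blind_append "$demo_dir/blind.txt" "test"; done
for run in 1 2 3; do ensure_line "$demo_dir/ensured.txt" "test"; done

# Compare the two line counts.
wc -l < "$demo_dir/blind.txt"
wc -l < "$demo_dir/ensured.txt"
```

The design point is the check before the write: ensure_line inspects current state and acts only when it differs from the desired state, which is the same shape of logic that lets a module report ok on a re-run.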

Certification mapping

This lesson maps to the Red Hat Certified Engineer (RHCE) exam for Red Hat Enterprise Linux, EX294, the target credential for the whole Ansible track. EX294 is a hands-on, practical exam: you automate tasks on live systems with Ansible, and there is no multiple choice. This lesson grounds the foundations the exam assumes throughout: the core components of Ansible (the control-node/managed-node architecture, inventory, modules, plugins, playbooks, roles, collections); the agentless and idempotent execution model (which underpins why the graders re-run your playbooks and expect them not to fail or needlessly change anything); and FQCN and the core terminology that the rest of the exam (and this course) takes for granted. The concrete skills the exam tests (writing inventories, running ad-hoc commands and playbooks, using become, variables, conditionals, loops, templates, roles and Vault) are each built in the lessons that follow.

Glossary

Ansible — An open-source, agentless automation engine for configuration management, deployment, orchestration and ad-hoc administration.
Configuration management — The practice of defining and enforcing the desired state of servers (packages, services, files, users) as code, eliminating drift and snowflake servers.
Control node — The machine where Ansible is installed and runs, holding inventory, playbooks, roles and collections. The only machine with Ansible on it; must be POSIX (Linux/macOS/WSL).
Managed node — A target machine Ansible configures. Runs no Ansible agent; needs SSH/WinRM access and (on Linux) Python.
Agentless — Running automation with no software of the tool’s own installed or running on the managed nodes.
Push model — The operator/pipeline initiates the run from the control node and changes apply immediately (Ansible’s default), versus the pull model where node agents fetch config on a schedule (Puppet/Chef).
Idempotency — The property that applying an operation once or many times yields the same result; re-runs make no further changes because desired state already matches.
ok / changed / failed / skipped / unreachable — The per-task result states: no change needed / a change was made / errored / not run (condition false) / could not connect.
Inventory — The list of managed nodes, organised into groups with variables; static (INI/YAML) or dynamic (queried from a source).
Module — A small, usually idempotent program executed on the managed node that does one unit of work and returns JSON.
Plugin — Code that extends the Ansible engine itself (connection, lookup, filter, callback, inventory, become, cache), running on the control node.
Playbook — A YAML file of one or more plays, each mapping hosts to an ordered list of tasks.
Play — A single mapping of a group of hosts to a set of tasks (plus settings like become, gather_facts).
Task — One step in a play that invokes a module with arguments.
Role — A standardised, reusable directory bundling tasks, handlers, templates, files, vars and defaults for a unit of configuration.
Collection — The modern versioned distribution format bundling modules, plugins, roles and playbooks under a namespace.name; installed from Galaxy or Automation Hub.
FQCN — Fully Qualified Collection Name: namespace.collection.module (e.g. ansible.builtin.copy); the unambiguous, recommended way to reference content.
ansible-core — The Ansible engine: the CLI binaries plus the ansible.builtin collection (2.17+ in 2026).
ansible (package) — ansible-core plus a curated bundle of ~70 collections (version 10+).
Ansible Automation Platform (AAP) — Red Hat’s commercial product adding a web UI/API (automation controller), RBAC, scheduling, credentials, execution environments and Event-Driven Ansible; AWX is its free upstream.
become — Ansible’s privilege-escalation mechanism (sudo by default) to run tasks as another user, typically root.
Facts — Data automatically discovered about a managed node (OS, network, hardware) by the setup module, available as ansible_* variables.
WinRM — Windows Remote Management, the default transport Ansible uses to manage Windows nodes (HTTP 5985 / HTTPS 5986).

Next steps

You now understand what Ansible is and why it won, its agentless control-node/managed-node architecture, the push model versus pull, idempotency and the changed-vs-ok result model, the six building blocks and FQCN, the ansible/ansible-core/AAP distinction, and where Ansible sits among Terraform, Puppet, Chef and Salt. The natural next move is to get a control node working on your own machine and make a real connection. Continue with Installing & Configuring Ansible: the Control Node, ansible.cfg & Your First Connection, which installs Ansible properly (pip vs distro vs ansible vs ansible-core, pipx and virtual environments), sets up SSH keys and the managed-node requirements, walks the ansible.cfg configuration search order and every key setting, and runs your first ansible all -m ansible.builtin.ping against real hosts.