ansible-playbook from a developer’s laptop is automation. It is also, at scale, a problem: who ran it, against which inventory, with which credentials, when? Where do collections come from — and how do you trust them? When a webhook fires at 3 a.m., who runs the runbook? Ansible Automation Platform (AAP) is Red Hat’s answer: the supported, hardened, multi-component product that turns Ansible from “a CLI you trust your operators with” into “a platform your auditors trust.” AAP is not a different language — your roles, collections, and Execution Environments are unchanged — but it surrounds them with Controller (formerly Tower / upstream AWX: web UI + REST API + job execution at scale), Automation Hub (your private collection registry with content signing), and Event-Driven Ansible (EDA) (webhook/kafka/queue-driven autonomous response), all glued together by the Automation Mesh (a peer-to-peer Receptor network that runs jobs near the targets that need them).
This lesson is the architecture tour. It does not teach you how to click around the Controller UI button-by-button — that’s a hands-on exercise. It teaches you the mental model every engineer running AAP needs: which component does what, what an Execution Environment is, where the Mesh’s control/hop/execution nodes sit, what an Automation Hub namespace and repository and signature are, what a rulebook is and how its sources/conditions/actions decide whether to fire, and how the open-source counterparts (AWX and EDA Server) map onto the supported product. We finish with a free, end-to-end lab that brings AWX up on a kind Kubernetes cluster — the Operator-installed deployment Red Hat ships with AAP, just on the upstream image — so you can actually drive a real Controller from a laptop without a Red Hat subscription. Everything targets AAP 2.5+ / AWX 24+ / EDA Server 1.1+ (2026), with FQCN throughout.
Learning objectives
By the end of this lesson you can:
- Distinguish ansible-core, AWX (open source), and AAP (Red Hat-supported), and explain which problems each solves.
- Map the Controller object model end to end: Organisations, Teams, Users, Projects (SCM-backed), Inventories (smart/regular/constructed), Credentials, Credential Types, Execution Environments, Job Templates, Workflow Templates, Surveys, Schedules, Notifications.
- Explain the Automation Mesh node taxonomy (control, hop, execution, hybrid) and when to use each; describe the Receptor protocol and peer relationships.
- Describe Automation Hub: namespaces, repositories (published/rh-certified/community/validated), remote sync, content signing, and how ansible.cfg galaxy_server_list consumes it.
- Describe Event-Driven Ansible: the EDA Server, rulebooks (sources/conditions/actions), the canonical sources (webhook, kafka, azure_service_bus, url_check, journald, aws_sqs), the action types (run_playbook, run_job_template, run_module, post_event, set_fact, debug), and throttling/de-duplication.
- Use the awx CLI and the Controller REST API for Infrastructure-as-Code patterns: defining Projects/Inventories/Job Templates/Schedules in version-controlled code rather than the UI.
- Pick the right topology for your scale: single-node, HA control plane + worker pool, multi-cluster mesh.
- Stand up AWX on Kubernetes (kind + the AWX Operator) and run a Job Template against a free dynamic inventory, end to end, on your laptop.
Prerequisites & where this fits
You should already be comfortable with playbooks, roles, and collections — the collections & Execution Environments lesson is the immediate prerequisite, because AAP’s runtime is an EE. You should know dynamic inventory patterns from the dynamic-inventory lesson — Controller imports plugin-driven inventories the same way the CLI does. You should have run Molecule at least once (the Molecule lesson) so you understand the testing tier the Controller’s content lifecycle expects. Vault familiarity from the vault lesson is helpful because Controller stores secrets in its own encrypted credentials store and integrates with HashiCorp Vault, CyberArk, and Azure Key Vault. In the Ansible Zero-to-Hero programme this is the Advanced/Platform tier capstone: it bridges the CLI Ansible you’ve mastered and the platform your team actually runs in production.
Core concepts
Five mental models carry the whole lesson.
1. AAP is ansible-core + a control plane. Every Job Template, every workflow, every event-driven response eventually shells out to ansible-core running inside an Execution Environment. The platform does not replace Ansible; it manages it — schedules it, audits it, RBACs it, runs it nearer the target. If you understand ansible-playbook you already understand 80% of what Controller does at runtime.
2. The Controller is the API. Everything you do in the UI is a POST/PATCH against /api/v2/.... Every UI tab has a CLI equivalent (awx job_templates create, awx projects update). Treat the UI as a courtesy view; treat the API as the contract. The implication is huge: every Project, Inventory, Job Template, Workflow Template can — and should — be defined in version control and reconciled by code (Terraform’s awx provider, the awx.awx collection’s Ansible modules, or raw awx CLI calls). “Click-ops in the UI” is the AAP anti-pattern.
3. The Mesh moves jobs to where the targets are. The Automation Mesh is a Receptor-based peer-to-peer network. Control nodes host the API, scheduler, and database. Execution nodes actually run ansible-playbook inside an EE. Hop nodes relay traffic across network boundaries (DMZ → trusted, on-prem → cloud). A job submitted to an Instance Group “lands” on whichever execution node has capacity and network reachability to the target inventory. This decouples control (centralised API, single pane of glass) from execution (distributed, near the workload) — and is the answer to “we have ten clusters across three clouds and an air-gapped lab, how do we run Ansible against all of them?”
4. The Automation Hub is your supply chain. Where does community.general come from? In a hobby setup, galaxy.ansible.com. In a regulated setup, Automation Hub: a private registry that hosts your internal collections (the ones written by your platform team), mirrors Red Hat-certified collections from console.redhat.com, syncs validated content (approved patterns), and signs every collection with a GPG key. Your laptops, your Controller, and your CI all pull from Hub via galaxy_server_list in ansible.cfg. If a collection is not in Hub, it does not run in production. The Hub is the supply-chain trust boundary.
5. EDA closes the loop. Job Templates are invoked. Rulebooks are triggered. EDA is the missing eventing tier: a rulebook says “when a webhook fires from PagerDuty, AND the alert is high-severity, AND the host is in the prod_db group, run Job Template restart-postgres-replica.” The rulebook is YAML; the EDA Server runs it as a long-running process listening on its sources. EDA turns Ansible from “what an operator runs” into “what the platform runs autonomously.”
Keep these terms straight: AAP (the supported product), AWX (the upstream open-source equivalent of Controller), Controller (the orchestration component, formerly Tower), Automation Hub (the private content registry), EDA (Event-Driven Ansible — the eventing tier), Receptor (the mesh transport), Execution Environment (the OCI image carrying ansible-core + collections), Instance Group (a logical grouping of execution nodes), Project (a Controller object pointing at an SCM repo), Job Template (a Controller object that combines a project + inventory + credential into a runnable thing), Workflow Template (a DAG of Job Templates), Survey (a parameter-prompt UI on top of a JT), Rulebook (the EDA YAML defining sources/conditions/actions).
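To make "the Controller is the API" concrete, here is a minimal Python sketch that builds (but does not send) the HTTP request behind a Job Template launch. The host, template id, and token are hypothetical placeholders, and build_launch_request is an illustrative helper, not part of any Ansible library.

```python
import json
import urllib.request


def build_launch_request(host: str, job_template_id: int, token: str,
                         extra_vars: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that launches a Job Template."""
    url = f"{host.rstrip('/')}/api/v2/job_templates/{job_template_id}/launch/"
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical host, template id and token, for illustration only.
request = build_launch_request(
    "https://controller.example.com", 42, "REDACTED", {"app_version": "1.2.3"}
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the launch; keeping construction separate from sending makes the call easy to review and test in CI.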
ansible-core vs AWX vs AAP
| Layer | What it is | Who runs it | What you get |
|---|---|---|---|
| ansible-core | The CLI: ansible, ansible-playbook, ansible-inventory, ansible-vault, ansible-galaxy. ~10 MB Python. | A developer, a CI runner, a cron job. | A pure CLI. No UI, no audit, no scheduling. |
| AWX | The open-source upstream of Controller (the project that Tower, now Controller, is built from). Community-supported; runs on Kubernetes via the AWX Operator. | You, self-hosted on any Kubernetes. | Web UI, REST API, RBAC, scheduling, projects, inventories, credentials, job templates, workflows, notifications. No Red Hat support, no Automation Hub product, no EDA Server (use EDA upstream separately). |
| AAP (Ansible Automation Platform) | The Red Hat-supported product. Includes Controller (Tower → Controller, the same UI as AWX with extra features and Red Hat support), Automation Hub (private registry), Event-Driven Ansible (EDA Server), Automation Mesh (Receptor), Lightspeed (AI assist, optional), and the Insights for Automation integration. Installed via the AAP Installer (RPM + Ansible plays) or the AAP Operator on OpenShift/k8s. | You, but with a Red Hat subscription, errata, support, and certified content. | Everything AWX gives you, plus Hub, EDA, certified-content sync, support, and operational hardening. |
A pragmatic rule: if you’re learning, build with AWX (free, identical model). If you’re shipping production at a regulated company, run AAP (the supply-chain and support story is non-negotiable). The core mental models — Controller, Hub, EDA, Mesh — are identical across AWX and AAP at the level you’ll use them in 99% of work.
Controller — the object model
Every Controller object is reachable in the UI and at /api/v2/<resource>/. The hierarchy:
Organisations
├── Teams (RBAC grouping of Users)
├── Users
├── Projects ← SCM-backed (git, http archive); imports playbooks/roles/collections
├── Inventories ← static, smart (filter), constructed, dynamic-plugin sources
│ ├── Hosts
│ └── Groups
├── Credentials ← machine, source-control, vault, cloud (AWS/Azure/GCP/k8s/…), generic
├── Credential Types
├── Execution Environments
├── Instance Groups (which mesh nodes run my jobs)
├── Job Templates ← project + inventory + credentials + EE + survey + schedule
├── Workflow Templates ← DAG of JTs with success/failure paths and approval nodes
└── Notifications
Organisations, Teams, Users, RBAC
Organisations are the top-level tenancy boundary — every other object belongs to one. Teams are RBAC groupings; users are added to teams, teams are granted roles on objects. The Controller has a fixed role taxonomy per object type:
| Role | What it allows |
|---|---|
| admin | Full read/write/delete on the object. |
| read | Read-only. |
| execute | Launch/run (for Job Templates), without edit rights. |
| use | Reference the object from another (e.g. use a Credential in a Job Template). |
| auditor | Read-only across the whole org including job history. |
| project_admin / inventory_admin / credential_admin / notification_admin / workflow_admin | Full rights scoped to that object class within an org. |
Sane defaults: developers get execute on the Job Templates they own and use on the Credentials those JTs need; the platform team gets admin on Projects/Inventories/Credentials; auditors get the org-level auditor role.
Projects
A Project is the Controller’s tracker for a Git repository. Configure:
| Field | Notes |
|---|---|
| scm_type | git, archive, insights. Almost always git. |
| scm_url | The repo URL (https://..., git@...). |
| scm_branch | Branch/tag/commit. |
| scm_credential | The Source Control credential used to clone. |
| scm_clean | Discard local changes on update. |
| scm_delete_on_update | Delete and re-clone on each project sync. |
| scm_track_submodules | Recurse submodules. |
| scm_update_on_launch | Sync the project before every Job Template launch (slow, fresh). |
| scm_update_cache_timeout | If scm_update_on_launch is on, skip the sync if the last one is younger than this. |
| default_environment | Which EE the JTs derived from this project use by default. |
| signature_validation_credential | If set, each project sync verifies the repository's content signature (produced with ansible-sign) against the key in this credential, and jobs do not run from content that fails validation. |
A Project sync clones the repo onto the control plane and indexes its playbooks, roles, and collection metadata. Job Templates reference a playbook inside a project.
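The interaction between scm_update_on_launch and scm_update_cache_timeout is easy to misread, so here is a small Python sketch of the decision as the table describes it. The needs_sync helper is hypothetical and only mirrors the documented behaviour; it is not Controller's code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_sync(update_on_launch: bool, cache_timeout_s: int,
               last_sync: Optional[datetime], now: datetime) -> bool:
    """Decide whether a launch should trigger a project sync first."""
    if not update_on_launch:
        return False  # syncs only happen on schedule or on demand
    if last_sync is None:
        return True   # never synced: always sync
    # Sync only when the last one is at least cache_timeout_s old.
    return now - last_sync >= timedelta(seconds=cache_timeout_s)


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(needs_sync(True, 300, now - timedelta(seconds=60), now))
print(needs_sync(True, 300, now - timedelta(seconds=600), now))
```

With a timeout of 0 the age check always passes, so every launch syncs, which is the "slow, fresh" behaviour noted in the table.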
Inventories
A Controller Inventory is the same concept as a CLI inventory, with extras. It can be:
- Regular: a fixed set of hosts/groups, edited in the UI or imported from a file.
- Smart: derived by a host filter across all hosts in the org (groups__name=prod and ansible_facts__os_family=RedHat).
- Constructed: equivalent to the constructed inventory plugin; composes groups across multiple inventory sources.
Each Inventory can have one or more Inventory Sources (the dynamic part):
| Source type | What it is |
|---|---|
| scm | Inventory file in a Project (e.g. inventory/aws.aws_ec2.yml). |
| ec2 / azure_rm / gcp_compute / vmware / openstack / kubernetes / satellite / terraform_state / … | Plugin-driven, configured via the UI. |
| file | A static file uploaded directly. |
Every source has its own update schedule (update_on_launch, periodic). Set update_on_launch: true on cloud sources so a Job Template never runs against a stale fleet.
Credentials
Credentials are first-class encrypted secrets. The Controller has a fixed set of built-in Credential Types plus user-defined ones:
| Built-in type | What it stores |
|---|---|
| Machine | SSH username/password/key, become method/user/password. |
| Source Control | The credential used to clone Projects (token, key). |
| Vault | Ansible Vault password(s), with vault_id labels. |
| Network | network_cli/httpapi/netconf creds. |
| AWS / Azure / GCP / OpenStack / VMware | Cloud creds; surfaced as env vars to the EE. |
| Container Registry | Pull EEs from a private registry. |
| GPG Public Key | Validate Hub-signed collections. |
| Insights | RH Insights identity. |
| Generic | Custom — define your own type with a YAML schema. |
A user-defined Credential Type has two parts: an inputs schema (what fields the credential takes) and an injectors block (how those fields are exposed at runtime — env vars, files, extra-vars). This is how you wire HashiCorp Vault, CyberArk, or any other secret store into Controller without writing code.
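As an illustration of the inputs/injectors split, the Python sketch below renders an env injector block from credential values. The field names (vault_addr, vault_token), the render_env helper, and the regex-based templating are assumptions for illustration; Controller itself performs the real templating.

```python
import re

# Hypothetical custom credential type: two input fields, injected as env vars.
INPUTS = {
    "fields": [
        {"id": "vault_addr", "type": "string", "label": "Vault URL"},
        {"id": "vault_token", "type": "string", "label": "Token", "secret": True},
    ],
    "required": ["vault_addr", "vault_token"],
}
INJECTORS = {
    "env": {
        "VAULT_ADDR": "{{ vault_addr }}",
        "VAULT_TOKEN": "{{ vault_token }}",
    }
}

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_env(inputs_schema: dict, injectors: dict, values: dict) -> dict:
    """Return the environment variables a job would see for these values."""
    missing = [f for f in inputs_schema.get("required", []) if f not in values]
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    return {
        name: _PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)
        for name, template in injectors.get("env", {}).items()
    }


env = render_env(INPUTS, INJECTORS,
                 {"vault_addr": "https://vault.example.com", "vault_token": "s3cr3t"})
print(sorted(env))
```

The point of the pattern is that the playbook only ever sees VAULT_ADDR and VAULT_TOKEN in its environment; the secret never appears in extra-vars.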
Execution Environments
An EE is an OCI image carrying a specific ansible-core + collection set + Python deps. Controller runs every job inside an EE. The EE catalogue is per-org; you reference an EE by registry URL and pull credential. The default ee-supported-rhel*-aap* EE ships with AAP and contains the certified collections. Building your own EE is the collections & EE lesson topic — Controller just consumes the resulting image.
Job Templates
A Job Template (JT) is the Controller’s runnable unit. It binds:
| Field | Notes |
|---|---|
| name | Friendly name. |
| job_type | run (apply changes) or check (--check/dry-run only). |
| inventory | The Inventory to run against. |
| project | The Project the playbook lives in. |
| playbook | Path to a playbook within the project (e.g. site.yml). |
| credentials | A list: at least a Machine credential, plus any cloud/Vault credentials. |
| execution_environment | The EE image. |
| instance_groups | Which mesh node group runs this. |
| forks | Equivalent to the CLI --forks flag. |
| verbosity | 0–4. |
| extra_vars | Static or templated extra vars. |
| survey | A list of question dicts (text/multiplechoice/integer/password/textarea), prompted at launch and surfaced as extra-vars. |
| schedule | Schedule for unattended runs, stored as an iCalendar recurrence rule (rrule), e.g. daily at 02:30, rather than a cron string. |
| become_enabled | Force become: true. |
| host_config_key | Provisioning callback URL token. |
| ask_*_on_launch | Boolean flags letting the launcher override credential/inventory/extra_vars at click time. |
| webhooks | Trigger from Git PR/push events on the project’s repo. |
| notifications | List of Notification Templates fired on start/success/failure. |
A JT with a survey is the canonical “operator self-service” pattern: the platform team writes the playbook, defines a Job Template, attaches a survey of safe parameters (which environment, which app, which version), and grants execute to a developer team. Devs launch the JT through the UI, answer the survey, and the platform team’s playbook runs against the right inventory with the right credentials, audited.
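The "safe parameters" idea can be sketched in a few lines of Python. The spec below imitates the shape of a survey question list (variable, type, choices, min, max, required); the exact keys and the validate_answers helper are illustrative assumptions, not Controller's validation code.

```python
# Hypothetical survey spec: one drop-down and one bounded integer.
SURVEY_SPEC = [
    {"variable": "environment", "type": "multiplechoice",
     "choices": ["dev", "staging", "prod"], "required": True},
    {"variable": "replicas", "type": "integer", "min": 1, "max": 10,
     "required": True},
]


def validate_answers(spec: list, answers: dict) -> list:
    """Return a list of human-readable problems; empty means the launch is safe."""
    errors = []
    for question in spec:
        name = question["variable"]
        if name not in answers:
            if question.get("required", False):
                errors.append(f"{name}: answer is required")
            continue
        value = answers[name]
        if question["type"] == "multiplechoice":
            if value not in question["choices"]:
                errors.append(f"{name}: {value!r} is not an allowed choice")
        elif question["type"] == "integer":
            in_range = isinstance(value, int) and (
                question["min"] <= value <= question["max"]
            )
            if not in_range:
                errors.append(
                    f"{name}: must be an integer between "
                    f"{question['min']} and {question['max']}"
                )
    return errors


print(validate_answers(SURVEY_SPEC, {"environment": "prod", "replicas": 3}))
print(validate_answers(SURVEY_SPEC, {"environment": "qa", "replicas": 50}))
```

Valid answers become extra-vars for the play; invalid ones are rejected before any host is touched, which is what makes the survey a safety boundary rather than just a form.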
Workflow Templates
A Workflow Template (WT) is a DAG of nodes. Each node is one of:
- A Job Template launch (the common case).
- A Project sync.
- An Inventory source update.
- An approval node: pauses the workflow until a human with the approval role accepts/denies.
- Another Workflow Template (recursive).
Edges are typed: success, failure, always. So a WT can express “sync inventory; on success, run preflight JT; on success, prompt approval; on approval, run deploy JT; on failure of deploy, run rollback JT; on success, notify Slack.” This is the AAP equivalent of CI/CD.
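The typed-edge semantics can be simulated in Python. This is a sketch of the traversal idea only: node names are hypothetical, an approval is modelled as a node that succeeds when approved, and walk_workflow is not how Controller schedules workflow nodes internally.

```python
def walk_workflow(edges: dict, outcomes: dict, start: str) -> list:
    """Return the nodes that run, following success/failure/always edges."""
    executed = []
    pending = [start]
    while pending:
        node = pending.pop(0)
        if node in executed:
            continue
        executed.append(node)
        succeeded = outcomes[node]
        for edge_type, target in edges.get(node, []):
            follow = (
                edge_type == "always"
                or (edge_type == "success" and succeeded)
                or (edge_type == "failure" and not succeeded)
            )
            if follow:
                pending.append(target)
    return executed


EDGES = {
    "sync-inventory": [("success", "preflight")],
    "preflight": [("success", "approval")],
    "approval": [("success", "deploy")],
    "deploy": [("failure", "rollback"), ("success", "notify-slack")],
}
# Simulated results: everything succeeds except the deploy.
OUTCOMES = {
    "sync-inventory": True, "preflight": True, "approval": True,
    "deploy": False, "rollback": True, "notify-slack": True,
}

print(walk_workflow(EDGES, OUTCOMES, "sync-inventory"))
```

Flipping the deploy outcome to True sends the walk down the notify-slack branch instead of rollback.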
Surveys, Schedules, Notifications
A Survey is a JSON-defined form (a list of typed questions) attached to a JT or WT and prompted at launch. Schedules are unattended launchers whose recurrence is stored as an iCalendar rrule rather than a cron expression; set enabled: false to pause one. Notifications are templates (Slack/MS Teams/email/webhook/PagerDuty/Mattermost/IRC) attached to JTs and WTs and triggered on started/success/failure/approved/denied/running.
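For the rrule recurrence mentioned above, a minimal Python sketch of composing a daily rule follows. The "DTSTART:... RRULE:..." layout reflects my understanding of what the schedules API accepts and should be verified against your Controller version; daily_rrule is a hypothetical helper.

```python
from datetime import datetime, timezone


def daily_rrule(start: datetime, interval: int = 1) -> str:
    """Compose an iCalendar-style recurrence string for a run every N days."""
    stamp = start.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"DTSTART:{stamp} RRULE:FREQ=DAILY;INTERVAL={interval}"


# 02:30 UTC every day, starting 1 January 2026.
rule = daily_rrule(datetime(2026, 1, 1, 2, 30, tzinfo=timezone.utc))
print(rule)
```

Generating the string in code keeps schedules reviewable in version control alongside the Job Templates they launch.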
The awx CLI and the API
pip install awxkit gets you the awx CLI. Point it at your Controller and export the token that awx login prints; from then on every Controller object can be CRUDed from the command line:
export CONTROLLER_HOST=https://controller.example.com
# expects CONTROLLER_USERNAME and CONTROLLER_PASSWORD in the environment;
# `awx login -f human` prints an `export CONTROLLER_OAUTH_TOKEN=...` line
eval "$(awx login -f human)"
awx projects list
awx projects create --name "Ops Repo" --organization "Default" --scm_type git \
  --scm_url https://git.example.com/ops/ansible.git --scm_branch main \
  --scm_update_on_launch true
awx job_templates create --name "Deploy Web" --project "Ops Repo" \
  --inventory "Prod AWS" --playbook deploy.yml \
  --execution_environment "ee-supported-rhel9-aap25" \
  --extra_vars '{"app_version":"1.2.3"}'
awx job_templates launch "Deploy Web" --monitor
Treat the CLI as the source of truth for IaC: an awx-config/ directory of YAML or shell scripts that recreates every Project/JT/WT in a clean Controller. Combined with the awx.awx Ansible collection’s modules (awx.awx.project, awx.awx.job_template, etc.; older releases used a tower_ prefix), you can reconcile Controller state from a playbook: Controller automating Controller.
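Reconciliation boils down to diffing desired state (what is in version control) against actual state (what the API returns). A minimal Python sketch of that planning step, with hypothetical object names and a hypothetical plan_changes helper:

```python
def plan_changes(desired: dict, actual: dict) -> dict:
    """Compare name -> settings maps and list what must be created, updated, deleted."""
    create = sorted(name for name in desired if name not in actual)
    delete = sorted(name for name in actual if name not in desired)
    update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}


# Desired state as it might be loaded from YAML in an awx-config/ directory.
desired_jts = {
    "Deploy Web": {"playbook": "deploy.yml", "inventory": "Prod AWS"},
    "Rotate Secrets": {"playbook": "rotate.yml", "inventory": "Prod AWS"},
}
# Actual state as it might be read back from the Controller API.
actual_jts = {
    "Deploy Web": {"playbook": "deploy.yml", "inventory": "Staging AWS"},
    "Old Cleanup": {"playbook": "cleanup.yml", "inventory": "Prod AWS"},
}

print(plan_changes(desired_jts, actual_jts))
```

In practice each entry in the plan maps to one module call or CLI invocation; running the same plan twice against a reconciled Controller should yield three empty lists.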
Automation Mesh
The Mesh is the data plane. It is a peer-to-peer network of Receptor processes (TLS-mutual-authenticated, latency-tolerant, NAT-friendly). Node roles:
| Role | What it does |
|---|---|
| Control | Hosts the Controller API, the scheduler, the database, the web UI. Does not execute jobs. |
| Hybrid | Control + Execution on one machine. The single-node default. |
| Execution | Pulls assigned jobs from the queue, runs ansible-playbook inside an EE, returns results. Does not host the API. |
| Hop | Relays traffic between mesh segments. Has no execution capacity. Used to bridge network boundaries (DMZ ↔ trusted, on-prem ↔ cloud). |
Instance Groups are logical groupings of execution nodes. A Job Template is assigned to an Instance Group; the scheduler picks an idle node in that group with network reachability to the inventory. Common patterns:
- default: the catch-all group containing every execution node.
- aws-runners: execution nodes deployed in AWS, tagged so AWS-only JTs land here (cheaper egress, lower latency).
- dmz-bridge: execution nodes reached through hop nodes, so a corporate-DMZ JT can run inside a customer’s VPC.
- gpu-heavy: execution nodes with GPUs, for ML provisioning workflows.
Peer relationships form the topology graph: each node’s config lists which other nodes it dials and which it accepts dials from. The mesh tolerates link failures — if a hop is down, traffic re-routes through alternative peers.
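The re-routing behaviour can be pictured as a path search over the peer graph. The Python sketch below uses a plain breadth-first search over a hypothetical four-node topology; it illustrates the idea only and makes no claim about Receptor's actual routing algorithm.

```python
from collections import deque


def find_route(peers: dict, source: str, target: str, down=frozenset()):
    """Return the shortest peer path from source to target, skipping down nodes."""
    if source in down or target in down:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in peers.get(path[-1], ()):
            if neighbour not in seen and neighbour not in down:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical mesh: one control node, two hop nodes, one execution node.
PEERS = {
    "control": ["hop-a", "hop-b"],
    "hop-a": ["control", "exec-1"],
    "hop-b": ["control", "exec-1"],
    "exec-1": ["hop-a", "hop-b"],
}

print(find_route(PEERS, "control", "exec-1"))
print(find_route(PEERS, "control", "exec-1", down={"hop-a"}))
```

With both hops down the function returns None, which corresponds to an execution node that is unreachable until a peer link is restored.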
A typical small AAP install has one hybrid control node + two execution nodes + a hop node at the customer-VPC bridge. That topology survives the loss of an execution node, but the single control node remains a single point of failure. The clustered AAP install scales out by adding more control nodes (HA with a shared Postgres) and more execution nodes (just more workers).
Automation Hub
Hub is the private registry. It is a Pulp-based service that hosts collections in repositories:
| Repository | What’s in it |
|---|---|
| published | Your internal collections: the ones your team writes and publishes via ansible-galaxy collection publish. |
| rh-certified | Red Hat-certified collections (synced from console.redhat.com). |
| community | Curated community collections (synced from galaxy.ansible.com, optional). |
| validated | Red Hat validated content: opinionated, tested patterns for specific use cases (synced from RH). |
A Namespace is the Hub equivalent of a GitHub organisation: my_namespace.my_collection. A namespace owner controls who can publish.
Content Signing: Hub signs every published collection with a configured GPG signing key. When ansible-galaxy collection install pulls from Hub, it verifies the signature against a public key stored locally or in a Controller GPG Public Key credential. Unsigned collections are rejected if signature_required is set. This is the supply-chain integrity story: a malicious collection cannot land on production unless it carries a signature from a key your CI trusts.
Remote sync lets Hub mirror upstream sources on a schedule. Configure a Remote (the upstream URL plus its auth token) and a Repository (which content to pull), then schedule the sync. Air-gapped installs use this with an HTTP proxy that holds the Red Hat manifest token, fetching once into a staging Hub so that production never contacts console.redhat.com directly.
Consuming Hub from a CLI client (laptop, CI, Controller EE build) is just an ansible.cfg galaxy_server_list:
[galaxy]
server_list = my_hub, rh_certified, community

# ansible.cfg does not expand ${VAR}. Supply each token through the environment
# instead: ANSIBLE_GALAXY_SERVER_MY_HUB_TOKEN, ANSIBLE_GALAXY_SERVER_RH_CERTIFIED_TOKEN,
# ANSIBLE_GALAXY_SERVER_COMMUNITY_TOKEN.

[galaxy_server.my_hub]
url = https://hub.example.com/api/galaxy/content/published/

[galaxy_server.rh_certified]
url = https://hub.example.com/api/galaxy/content/rh-certified/

[galaxy_server.community]
url = https://hub.example.com/api/galaxy/content/community/
The order matters: ansible-galaxy collection install walks the list in order. Put your published collections first so an internal collection that shadows an upstream name (rare but legitimate) wins.
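A small Python sketch of "first server in the list wins". The catalogue contents are invented for illustration, and the real ansible-galaxy resolution also considers versions and dependency constraints, which this sketch ignores.

```python
def first_server_with(collection: str, server_list: list, catalogues: dict):
    """Return the first server, in list order, that offers the collection."""
    for server in server_list:
        if collection in catalogues.get(server, set()):
            return server
    return None


SERVER_LIST = ["my_hub", "rh_certified", "community"]
# Hypothetical catalogue contents per repository.
CATALOGUES = {
    "my_hub": {"acme.platform", "community.general"},  # internal build shadows upstream name
    "rh_certified": {"redhat.rhel_system_roles"},
    "community": {"community.general", "community.docker"},
}

print(first_server_with("community.general", SERVER_LIST, CATALOGUES))
print(first_server_with("community.docker", SERVER_LIST, CATALOGUES))
```

Reversing SERVER_LIST would make the upstream community.general win, which is exactly the shadowing mistake the ordering advice is meant to prevent.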
Event-Driven Ansible (EDA)
EDA is a separate component (the EDA Server) that runs rulebooks: long-running YAML processes that listen on sources, evaluate conditions, and fire actions. Architecturally it is pub/sub for Ansible.
A rulebook, dissected
# rulebooks/restart-on-pagerduty.yml
- name: Restart Postgres replica on PagerDuty alert
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
        token: "{{ EDA_PAGERDUTY_TOKEN }}"
  rules:
    - name: High-severity alert on a prod_db host
      condition: >-
        event.payload.event.event_type == 'incident.triggered' and
        event.payload.event.data.incident.urgency == 'high' and
        event.payload.event.data.incident.tags contains 'prod_db'
      throttle:
        once_within: 10 minutes
        group_by_attributes:
          - event.payload.event.data.incident.id
      action:
        run_job_template:
          name: "Restart Postgres Replica"
          organization: "Default"
          job_args:
            extra_vars:
              incident_id: "{{ event.payload.event.data.incident.id }}"
              host: "{{ event.payload.event.data.incident.tags | select('match', '^host:') | first | replace('host:', '') }}"
Every rulebook has three sections:
Sources — the eventing inputs. Each source is a long-running coroutine in the EDA Server. Common sources from ansible.eda:
| Source | What it listens on |
|---|---|
| ansible.eda.webhook | A configurable HTTP endpoint; supports token for shared-secret auth. |
| ansible.eda.kafka | Kafka topic; configurable group, offset, encryption. |
| ansible.eda.azure_service_bus | Azure Service Bus queue/topic. |
| ansible.eda.aws_sqs | AWS SQS queue. |
| ansible.eda.url_check | Periodically GETs a URL; emits an event on status change. |
| ansible.eda.alertmanager | Prometheus Alertmanager webhook. |
| ansible.eda.journald | systemd journal entries matching a filter (local source). |
| ansible.eda.range | A counter, useful for testing. |
| ansible.eda.generic | Pre-canned events from a fixture file, for development and test. |
Conditions: Jinja-shaped predicates over the event payload. Conditions can chain (and/or), reference nested fields (event.payload.foo.bar), use built-in operators (contains, in, is match(), is search(), is regex(), is defined), and reference rulebook variables (vars).
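To show what such a predicate evaluates, here is the three-clause PagerDuty condition from the rulebook above rendered as plain Python over a nested dict. The dig helper and the sample event are hypothetical; ansible-rulebook evaluates conditions with its own rule engine, not with code like this.

```python
from functools import reduce


def dig(event: dict, dotted_path: str):
    """Follow a dotted path such as 'payload.event.event_type' through nested dicts."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), event)


def is_high_severity_prod_db(event: dict) -> bool:
    """Python rendering of the three and-ed clauses in the rule's condition."""
    return (
        dig(event, "payload.event.event_type") == "incident.triggered"
        and dig(event, "payload.event.data.incident.urgency") == "high"
        and "prod_db" in dig(event, "payload.event.data.incident.tags")
    )


# Hypothetical event, shaped like the payload the condition references.
sample_event = {
    "payload": {
        "event": {
            "event_type": "incident.triggered",
            "data": {"incident": {"id": "INC-1", "urgency": "high",
                                  "tags": ["prod_db", "host:db01"]}},
        }
    }
}

print(is_high_severity_prod_db(sample_event))
```

Writing the predicate out this way is a cheap method for unit-testing a condition against captured payloads before activating the rulebook.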
Actions — what to do when a condition fires. The action set:
| Action | What it does |
|---|---|
| run_playbook | Execute a playbook locally inside the EDA Server’s worker. |
| run_job_template | Launch a Controller Job Template (the production-grade action: the JT runs in Controller’s mesh, audited). |
| run_workflow_template | Launch a Controller Workflow Template. |
| run_module | Run a single Ansible module. |
| post_event | Forward the event to another rulebook (chaining). |
| set_fact | Stash a value in rulebook-local memory. |
| retract_fact | Remove a fact. |
| print_event | Log to the EDA Server (debugging). |
| debug | Like print_event but with explicit message templating. |
| none | Explicit no-op. |
| shutdown | Stop the rulebook (rare; usually for graceful restart). |
Throttling & de-duplication — EDA has built-in throttle: blocks (once_within, once_after, group_by_attributes) that prevent a noisy source from firing the same action a hundred times. Pair with group_by_attributes to scope the throttle key to the meaningful identity (an incident ID, a host name) rather than the whole rule. Without throttling, a flapping monitor will DoS your Controller.
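The once_within plus group_by_attributes behaviour can be modelled in a few lines of Python: fire on the first event for a key, then suppress repeats for that key until the window has passed. OnceWithin is a hypothetical class for illustration, with attribute paths given relative to the event dict; it is not ansible-rulebook's implementation.

```python
from functools import reduce


class OnceWithin:
    """Allow one firing per key per time window."""

    def __init__(self, window_seconds: float, group_by: list):
        self.window_seconds = window_seconds
        self.group_by = group_by
        self._last_fired = {}

    def should_fire(self, event: dict, now: float) -> bool:
        # The throttle key is the tuple of grouped attribute values.
        key = tuple(
            reduce(lambda node, part: node[part], path.split("."), event)
            for path in self.group_by
        )
        last = self._last_fired.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # same key, still inside the window: suppress
        self._last_fired[key] = now
        return True


throttle = OnceWithin(600, ["incident.id"])  # 10 minutes, keyed by incident id
print(throttle.should_fire({"incident": {"id": "A"}}, now=0))
print(throttle.should_fire({"incident": {"id": "A"}}, now=30))
print(throttle.should_fire({"incident": {"id": "B"}}, now=30))
```

Keying on the incident id is what lets incident B fire while incident A is still suppressed; keying on nothing would throttle the whole rule.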
EDA Server runs rulebooks as long-running processes, managed as rulebook activations through the EDA Server’s own UI and API. Rulebooks run inside Decision Environments (DE): an EE-shaped image that ships ansible-rulebook + the ansible.eda collection + your custom event sources. The DE is to EDA what an EE is to Controller.
Topology cheat-sheet
| Scale | Topology |
|---|---|
| Single-node lab/dev | One hybrid node (control + execution) on an 8-vCPU/16-GB VM. Postgres co-located. AWX-on-kind works for laptop labs. |
| Small prod (< 50 concurrent jobs) | One control node, two execution nodes, an external HA Postgres. One Hub, one EDA Server. |
| Medium prod (< 500 concurrent jobs) | Three control nodes (HA cluster) behind a load balancer, four+ execution nodes per region/cloud, hop nodes at every network boundary, external HA Postgres + Redis. Two Hubs (active/standby across DCs). |
| Air-gapped enterprise | All of the above plus a staging Hub in the connected DC that mirrors console.redhat.com, a one-way sync to the production Hub in the air-gapped DC. EEs and DEs built in the staging side, signed, then mirrored. |
The Mesh’s flexibility is what makes any of these workable: control plane stays small (one HA cluster), execution scales linearly by adding nodes, hop nodes solve every network-segmentation problem.
Hands-on lab: AWX on a kind cluster
You can drive a real Controller end-to-end on a laptop with AWX and kind. No Red Hat subscription, no cloud bill.
1. Bring up kind.
brew install kind kubectl # or use the project's apt/yum repos on Linux
kind create cluster --name awx
kubectl config use-context kind-awx
2. Install the AWX Operator.
# 2.x is a placeholder: substitute a concrete awx-operator release tag
kubectl apply -k "https://github.com/ansible/awx-operator/config/default?ref=2.x"
kubectl create namespace awx
3. Deploy AWX itself.
# awx.yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  service_type: nodeport
  nodeport_port: 30080
kubectl apply -f awx.yaml
kubectl -n awx get pods --watch # wait until awx-... is Running (5–10 minutes)
4. Get the admin password.
kubectl -n awx get secret awx-admin-password -o jsonpath="{.data.password}" | base64 -d; echo
5. Open the UI.
kubectl -n awx port-forward svc/awx-service 8080:80
# now visit http://localhost:8080 and log in as admin / <the password>
6. Drive it from the CLI.
pip install awxkit
# awx reads its connection settings from the environment on every call
export CONTROLLER_HOST=http://localhost:8080
export CONTROLLER_USERNAME=admin
export CONTROLLER_PASSWORD="$(kubectl -n awx get secret awx-admin-password -o jsonpath='{.data.password}' | base64 -d)"
awx projects list
awx organizations list
7. Create a Project + Inventory + JT entirely from CLI.
# A public, well-known sample playbook repo to keep the lab free
awx projects create --name "demo" --organization Default --scm_type git \
  --scm_url https://github.com/ansible/ansible-tower-samples.git --scm_branch master \
  --wait
awx inventories create --name "localhost" --organization Default
awx hosts create --name "localhost" --inventory "localhost" \
  --variables '{"ansible_connection": "local"}'
awx job_templates create --name "demo-hello" --project "demo" \
  --inventory "localhost" --playbook hello_world.yml \
  --execution_environment "AWX EE (latest)" \
  --extra_vars '{"who": "Wave-2 student"}'
awx job_templates launch "demo-hello" --monitor
The --monitor flag streams the live job output to your terminal — same logs as the UI’s job pane. You just exercised the Controller object model end-to-end from a Linux/Mac shell.
8. Add a webhook-driven rulebook (EDA upstream, optional).
# ansible-rulebook also needs a Java runtime (JDK 17 or later) for its rule engine
pip install ansible-rulebook ansible-runner
ansible-galaxy collection install ansible.eda
# rulebook.yml
- name: Trigger AWX JT on local webhook
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5050
  rules:
    - name: any POST to /endpoint
      # the webhook source reports the request path under event.meta.endpoint
      condition: event.meta.endpoint == "endpoint"
      action:
        run_job_template:
          name: "demo-hello"
          organization: "Default"
EDA_CONTROLLER_URL=http://localhost:8080 \
EDA_CONTROLLER_TOKEN=<personal-token-from-AWX-UI> \
ansible-rulebook --rulebook rulebook.yml -i ansible/inventory.yml --verbose
# in another shell:
curl -X POST http://localhost:5050/endpoint -d '{"hello":"world"}' -H content-type:application/json
The rulebook fires, the action runs, AWX picks up the JT launch — the same pattern AAP’s EDA Server runs at scale, just managed manually here.
9. Cleanup.
kind delete cluster --name awx
You now have hands-on familiarity with every Controller object, the awx CLI, and an EDA rulebook firing into Controller — without spending a cent.
Common mistakes & troubleshooting
JT runs against the wrong inventory — you set ask_inventory_on_launch: true but the launcher accepted the default. Decide whether the JT’s inventory is fixed (most cases) or selectable (rare); fixed JTs are safer.
Project sync hangs — wrong scm_credential (no read access to the repo) or scm_branch doesn’t exist. Run awx projects update <name> --monitor and watch the trace.
Survey answers don’t reach the play — the survey variable name doesn’t match the play’s extra_vars reference (they must be exact). Surveys produce extra-vars, which are highest-precedence Ansible vars; review the variables-precedence lesson.
Mesh node “stuck pending” — the Receptor handshake failed (TLS cert mismatch, peer not in the allow-list). awx-manage list_instances and the Receptor logs on the node tell you why.
Hub install signature failure — your Controller’s signature_validation_credential references a GPG key that doesn’t match Hub’s signing key. Either re-publish the collection, or update the credential. Never disable signature validation in production.
EDA rulebook fires repeatedly — your condition: matches every event, not just the relevant one, and there is no throttle:. Tighten the condition; add a throttle: { once_within: <period>, group_by_attributes: [...] }.
“No execution capacity” — the JT’s Instance Group has no idle execution nodes. Either add nodes, or assign the JT to a different group with capacity. The awx instance_groups list command shows utilisation.
Click-ops drift — someone edited a JT in the UI; CI’s IaC reconciliation now overwrites it. Either lock the UI editing rights to platform admins only, or make the IaC reconciliation idempotent and accept the drift as a routine “platform owns this object” check.
Best practices
- Treat the Controller as code. Every Project/Inventory/JT/WT/Survey/Schedule/Notification lives in version-controlled YAML reconciled by the awx.awx collection or Terraform’s awx provider. The UI is a courtesy.
- Layer credentials: Machine credential for SSH; Cloud credential for the inventory plugin; Vault credential for any vaulted vars in the play. Combine on a JT, do not stuff secrets into extra_vars.
- Tag JTs by intent: a purpose: provision / purpose: rolling-deploy / purpose: rotate-secrets taxonomy in JT names makes the UI navigable at scale.
- Survey for safety: every operator-facing JT has a survey constraining choices to the safe ones (drop-down environments, integer with min/max, regex-validated app names).
- Use Workflow Templates with approval nodes for any JT that touches prod — humans gate the deploy, not the JT.
- Run Hub even at small scale. The published repo gives your team a single source of internal collections, signed, version-pinned.
- Sign your collections and require signature validation in Controller. Make supply-chain integrity a default, not an option.
- Use Instance Groups to keep cloud jobs on cloud-side execution nodes — saves egress, cuts latency.
- EDA throttling is mandatory. A noisy source without throttle: will DoS Controller.
- Air-gap with a staging Hub. Never let production-tier hosts reach console.redhat.com or galaxy.ansible.com directly.
- Schedule project syncs explicitly rather than update_on_launch: true for hot-path JTs: update_on_launch adds wall-clock delay to every launch.
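As a rough sketch of the first two practices (Controller as code, layered credentials), a reconciliation play using the awx.awx collection might look like the following. Every object name here (demo-app, prod-cloud, demo-ee, prod-ssh, prod-aws, prod-vault, cloud-exec) and the Git URL are hypothetical and assumed to exist already. The play also assumes the collection picks up connection details from environment variables such as CONTROLLER_HOST and CONTROLLER_OAUTH_TOKEN.

```yaml
---
# Sketch only: all object names and the repository URL are placeholders.
- name: Reconcile Controller objects from version control
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Ensure the project tracks the Git repository
      awx.awx.project:
        name: demo-app
        organization: Default
        scm_type: git
        scm_url: https://git.example.com/platform/demo-app.git
        scm_branch: main
        # Sync on a schedule instead, so launches do not wait on Git.
        scm_update_on_launch: false
        state: present

    - name: Ensure the deploy job template exists with layered credentials
      awx.awx.job_template:
        name: "rolling-deploy: demo-app"
        organization: Default
        project: demo-app
        inventory: prod-cloud
        playbook: deploy.yml
        execution_environment: demo-ee
        # One credential per concern: SSH, cloud inventory, vaulted vars.
        credentials:
          - prod-ssh
          - prod-aws
          - prod-vault
        # Fixed inventory: launchers cannot pick a different one.
        ask_inventory_on_launch: false
        instance_groups:
          - cloud-exec
        state: present
```

Running this play from CI on every merge keeps the UI and the repository in agreement; the modules are idempotent, so an unchanged definition reports no change.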
Security notes
AAP is the automation control plane. It holds privileged credentials for every system it touches; protect it like a privileged-access workstation.
- TLS everywhere: API, UI, mesh, Hub, EDA. Use cert-manager (or your enterprise PKI). No HTTP outside localhost.
- SSO + MFA: integrate with SAML/OIDC; require MFA for any human user.
- RBAC scoped tightly: developers get execute on the JTs they own, use on the credentials those JTs need. Nobody gets admin on the org by default.
- Credential injection only: never echo a credential into extra_vars. Use a Custom Credential Type with injectors: so secrets reach the EE as env vars or files, not as extra-vars in the audit log.
- Audit log retention: Controller writes a row per job + per task to its database; archive these per your compliance regime.
- Signed collections, signed playbooks: enable Hub content signing and Controller’s signature_validation_credential on every Project. A tampered main branch does not run in production.
- EDA token rotation: rulebook activations carry a Controller token. Use short-lived personal tokens, never service-account static tokens in YAML.
- Mesh peer ACLs: only intended nodes peer; a rogue Receptor cannot dial in.
- Backups: the Controller database holds RBAC, schedules, and encrypted credentials. Back it up encrypted; rehearse restore quarterly. Hub’s Pulp content is bulky but reproducible from upstream syncs; back up the metadata, not necessarily the content.
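The credential-injection point above can be illustrated with a custom credential type defined through the awx.awx collection. This is a sketch under assumptions: the type name, the api_token field, and the DEMO_API_TOKEN variable are invented for illustration, and the play assumes Controller connection details are supplied through the environment.

```yaml
---
# Sketch only: type name, field id, and env var name are hypothetical.
- name: Define a custom credential type that injects a secret as an env var
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Ensure the custom credential type exists
      awx.awx.credential_type:
        name: Demo API Token
        description: Token for a hypothetical internal API
        kind: cloud
        inputs:
          fields:
            - id: api_token
              type: string
              label: API token
              secret: true   # stored encrypted, masked in the UI
          required:
            - api_token
        injectors:
          env:
            # !unsafe stops Ansible from templating this string in the play,
            # so Controller receives the literal braces and fills them in
            # at job launch.
            DEMO_API_TOKEN: !unsafe "{{ api_token }}"
        state: present
```

A playbook running under a credential of this type could then read the value with the ansible.builtin.env lookup, so the secret never appears among the job's extra-vars.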
Interview & exam questions
- What’s the difference between AAP, AWX, and Controller? AAP = the supported product; AWX = the upstream open-source project that becomes Controller; Controller = the AAP component that orchestrates jobs (formerly called Tower).
- What does an Execution Environment do? It’s the OCI image containing ansible-core + collections + Python deps that Controller runs every job inside; the contract between Controller and Ansible.
- Control vs hop vs execution nodes? Control = API + scheduler + DB; execution = runs jobs in EEs; hop = relays mesh traffic across network boundaries with no execution capacity.
- What is an Instance Group? A logical grouping of execution nodes; JTs target an Instance Group and the scheduler picks an idle node in it.
- What’s a Project, in Controller terms? A tracked SCM repo (usually git) from which JTs draw playbooks and configs.
- What does a Survey do? Prompts the JT launcher for parameters via a JSON-schema-shaped form; answers become extra-vars at runtime.
- What’s an Automation Hub Repository? A logical collection of collections: published (yours), rh-certified, community, validated. Each can be served, signed, and synced separately.
- How does content signing work? Hub signs every published collection with a GPG key; clients verify against the matching public key (Controller’s GPG Public Key credential or ~/.ansible/galaxy_keys); where validation is enforced, unsigned content is rejected.
- What’s a rulebook, in EDA? A YAML file with sources (event inputs), conditions (predicates), and actions (run JTs/playbooks/modules); EDA Server runs it as a long-running process.
- Difference between run_playbook and run_job_template? run_playbook runs the playbook inside EDA Server’s worker. run_job_template calls Controller’s API to launch a JT; this is the production-grade path because the JT runs in the mesh, audited, with all the RBAC and credential handling.
- Why is throttling necessary in EDA? Without it, a noisy source (a flapping monitor) fires the same action repeatedly and overwhelms Controller. throttle: { once_within: <period>, group_by_attributes: [...] } scopes the dedup to the meaningful identity.
- What’s the awx CLI good for? Treating the Controller as code: every object can be CRUDed from a shell script or CI job; the UI is convenience, the API is the contract.
- How do you scale Controller for HA? Three control nodes behind a load balancer, an external HA Postgres, multiple execution nodes per region/cloud, hop nodes at every network boundary.
- What’s a Decision Environment? The EDA equivalent of an Execution Environment: the container image carrying ansible-rulebook + ansible.eda + your event sources.
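To make the Execution Environment answer concrete, here is a minimal sketch of an ansible-builder (version 3 schema) definition file. The base image, the collections, and the Python package are illustrative assumptions rather than a recommended set.

```yaml
---
# execution-environment.yml: sketch only, contents are illustrative.
version: 3

images:
  base_image:
    name: quay.io/fedora/fedora:latest   # assumed base; pin a digest in practice

dependencies:
  # The three layers named above: ansible-core, collections, Python deps.
  ansible_core:
    package_pip: ansible-core
  ansible_runner:
    package_pip: ansible-runner
  galaxy:
    collections:
      - name: community.general
      - name: awx.awx
  python:
    - requests
  system:
    - openssh-clients
```

Building it with ansible-builder build --tag demo-ee:latest (the tag is a placeholder) produces the image that a Job Template's execution_environment field would point at, once it is pushed to a registry Controller can reach.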
Quick check
- AAP = Controller + Automation Hub + EDA + Mesh (+ Lightspeed). AWX = upstream Controller, no Hub product, no EDA Server.
- Controller objects: Org → Teams/Users → Projects/Inventories/Credentials/EEs → Job Templates / Workflow Templates → Schedules / Notifications / Surveys.
- Mesh nodes: control / hybrid / execution / hop. Instance Groups select where a JT runs.
- Hub repositories: published (yours), rh-certified, community, validated. Always sign; always verify.
- awx CLI = the API made shell-friendly. Treat Controller objects as IaC.
- EDA = sources → conditions → actions, with throttling. run_job_template is the production action.
- Topology scales by adding execution + hop nodes; control plane stays small (HA pair/triple).
Exercise
Stand up the AWX-on-kind lab and:
- Define one Project, one Inventory, one Job Template entirely from the awx CLI (no UI clicks).
- Add a Survey to the JT prompting for target_env (drop-down: dev/staging) and app_version (regex ^\d+\.\d+\.\d+$).
- Build a Workflow Template that runs a preflight JT, an approval node, then the deploy JT, with a Slack notification on success/failure.
- Mirror a community collection (e.g. community.general) into your local Hub if you’ve stood one up; otherwise document the steps.
- Write a rulebook that listens on a webhook and fires run_job_template against your demo JT; throttle to once per minute per source IP.
- Verify that an unsigned collection cannot install when signature_required is set; sign it; verify it does install.
- Document in your repo’s README.md the exact commands a new engineer types to recreate the AWX state from scratch (the IaC story).
Certification mapping
- EX374 (Developing Automation with Ansible Automation Platform) — the AAP exam. Every section of this lesson maps directly: Controller objects, Mesh topology, Automation Hub, EDA rulebooks. Expect to write a rulebook from scratch, configure a JT with a Survey, and use Hub-served signed collections.
- EX467 (Managing AAP) — the operations-focused exam. Heavier on Mesh, HA, backup/restore, Hub remote sync, EDA Server install. This lesson gives you the model; pair with hands-on AAP installer experience for EX467.
- EX358 (Cloud Automation with Ansible) — references AAP as the orchestrator for cloud roles; the Controller half of this lesson is its prerequisite.
- Beyond exams, every senior Ansible job description in 2026 lists “experience with AAP / AWX, including EDA” as a discriminator. This is the lesson that lets you say yes.
Glossary
- AAP — Ansible Automation Platform; Red Hat’s supported product.
- AWX — the open-source upstream of Controller.
- Controller — the orchestration component (API, UI, scheduler); formerly Tower.
- Automation Hub — the private collection registry.
- EDA / Event-Driven Ansible — the eventing tier; runs rulebooks in the EDA Server.
- Automation Mesh — the Receptor-based peer-to-peer execution network.
- Receptor — the mesh transport protocol/binary.
- Control node — hosts API/scheduler/DB; no execution.
- Execution node — runs jobs inside EEs; no API.
- Hop node — relays mesh traffic; no execution.
- Hybrid node — control + execution on one machine.
- Instance Group — logical grouping of execution nodes; JTs target a group.
- Execution Environment (EE) — OCI image with ansible-core + collections.
- Decision Environment (DE) — OCI image with ansible-rulebook + ansible.eda for EDA.
- Project — Controller object pointing at an SCM repo.
- Inventory — Controller object holding hosts/groups; can be regular/smart/constructed; sources can be plugin-driven.
- Credential — encrypted Controller secret; types include Machine/Source-Control/Vault/Cloud/Network/Custom.
- Credential Type — the schema for a Credential; user-defined types let you wire arbitrary secret stores.
- Job Template (JT) — combines Project + Inventory + Credentials + EE + Survey + Schedule into a runnable thing.
- Workflow Template (WT) — DAG of JTs/syncs/approvals/sub-workflows with success/failure edges.
- Survey — JSON-schema-like prompt attached to a JT/WT, surfaced as extra-vars at launch.
- Schedule — cron/rrule recurrence on a JT/WT.
- Notification — Slack/Teams/email/webhook/PagerDuty fired on job state changes.
- Hub Repository — published / rh-certified / community / validated.
- Hub Namespace — the publisher container (the namespace part of namespace.collection).
- Content Signing — GPG signature attached to every Hub collection.
- Remote Sync — Hub mirror of an upstream Galaxy/Hub.
- Rulebook — YAML document with sources, conditions, actions; the unit of EDA.
- Source / Condition / Action — the rulebook’s three sections.
- Throttle — EDA dedup rule that prevents an action from firing too often.
- awx CLI — awxkit-shipped client that CRUDs every Controller object via the API.
- awx.awx collection — Ansible modules for managing Controller objects from a play.
Next steps
You now have the architecture model of AAP: how Controller, Hub, EDA, and the Mesh fit together; what every Controller object is for; how rulebooks turn events into runs; and how the platform’s supply-chain story (signed collections in Hub, Decision Environments for EDA, Execution Environments for Controller) anchors trust in production. The platform is now a substrate you can reason about — but the substrate only matters when fed by quality content. The natural sequels close that loop: ship tested, packaged content to Hub by combining the Molecule lesson (proof of correctness) with the collections & EE lesson (the build-and-publish path); feed the platform with live fleets by using the patterns from Dynamic Inventory & Secure Secrets inside Controller’s Inventory Sources; and operate at scale with Tuning Ansible for Speed & Scale and Delegation, Strategies & Rolling Updates — the levers that compose into Workflow Templates that complete in minutes rather than hours. AAP is where every other Advanced lesson cashes in.