Ansible Lesson 22 of 42

Ansible Automation Platform Architecture, In Depth: Controller, Automation Hub & Event-Driven Ansible

ansible-playbook from a developer’s laptop is automation. It is also, at scale, a problem: who ran it, against which inventory, with which credentials, when? Where do collections come from — and how do you trust them? When a webhook fires at 3 a.m., who runs the runbook? Ansible Automation Platform (AAP) is Red Hat’s answer: the supported, hardened, multi-component product that turns Ansible from “a CLI you trust your operators with” into “a platform your auditors trust.” AAP is not a different language — your roles, collections, and Execution Environments are unchanged — but it surrounds them with Controller (formerly Tower / upstream AWX: web UI + REST API + job execution at scale), Automation Hub (your private collection registry with content signing), and Event-Driven Ansible (EDA) (webhook/kafka/queue-driven autonomous response), all glued together by the Automation Mesh (a peer-to-peer Receptor network that runs jobs near the targets that need them).

This lesson is the architecture tour. It does not teach you how to click around the Controller UI button-by-button — that’s a hands-on exercise. It teaches you the mental model every engineer running AAP needs: which component does what, what an Execution Environment is, where the Mesh’s control/hop/execution nodes sit, what an Automation Hub namespace and repository and signature are, what a rulebook is and how its sources/conditions/actions decide whether to fire, and how the open-source counterparts (AWX and EDA Server) map onto the supported product. We finish with a free, end-to-end lab that brings AWX up on a kind Kubernetes cluster — the Operator-installed deployment Red Hat ships with AAP, just on the upstream image — so you can actually drive a real Controller from a laptop without a Red Hat subscription. Everything targets AAP 2.5+ / AWX 24+ / EDA Server 1.1+ (2026), with FQCN throughout.

Learning objectives

By the end of this lesson you can:

Prerequisites & where this fits

You should already be comfortable with playbooks, roles, and collections — the collections & Execution Environments lesson is the immediate prerequisite, because AAP’s runtime is an EE. You should know dynamic inventory patterns from the dynamic-inventory lesson — Controller imports plugin-driven inventories the same way the CLI does. You should have run Molecule at least once (the Molecule lesson) so you understand the testing tier the Controller’s content lifecycle expects. Vault familiarity from the vault lesson is helpful because Controller stores secrets in its own encrypted credentials store and integrates with HashiCorp Vault, CyberArk, and Azure Key Vault. In the Ansible Zero-to-Hero programme this is the Advanced/Platform tier capstone: it bridges the CLI Ansible you’ve mastered and the platform your team actually runs in production.

Core concepts

Five mental models carry the whole lesson.

1. AAP is ansible-core + a control plane. Every Job Template, every workflow, every event-driven response eventually shells out to ansible-core running inside an Execution Environment. The platform does not replace Ansible; it manages it — schedules it, audits it, RBACs it, runs it nearer the target. If you understand ansible-playbook you already understand 80% of what Controller does at runtime.

2. The Controller is the API. Everything you do in the UI is a POST/PATCH against /api/v2/.... Every UI tab has a CLI equivalent (awx job_templates create, awx projects update). Treat the UI as a courtesy view; treat the API as the contract. The implication is huge: every Project, Inventory, Job Template, Workflow Template can — and should — be defined in version control and reconciled by code (Terraform’s awx provider, the awx.awx collection’s Ansible modules, or raw awx CLI calls). “Click-ops in the UI” is the AAP anti-pattern.

3. The Mesh moves jobs to where the targets are. The Automation Mesh is a Receptor-based peer-to-peer network. Control nodes host the API, scheduler, and database. Execution nodes actually run ansible-playbook inside an EE. Hop nodes relay traffic across network boundaries (DMZ → trusted, on-prem → cloud). A job submitted to an Instance Group “lands” on whichever execution node has capacity and network reachability to the target inventory. This decouples control (centralised API, single pane of glass) from execution (distributed, near the workload) — and is the answer to “we have ten clusters across three clouds and an air-gapped lab, how do we run Ansible against all of them?”

4. The Automation Hub is your supply chain. Where does community.general come from? In a hobby setup, galaxy.ansible.com. In a regulated setup, Automation Hub: a private registry that hosts your internal collections (the ones written by your platform team), mirrors Red Hat-certified collections from console.redhat.com, syncs validated content (approved patterns), and signs collections with a GPG key you control. Your laptops, your Controller, and your CI all pull from Hub via the server_list setting in the [galaxy] section of ansible.cfg. If a collection is not in Hub, it does not run in production. The Hub is the supply-chain trust boundary.

5. EDA closes the loop. Job Templates are invoked. Rulebooks are triggered. EDA is the missing eventing tier: a rulebook says “when a webhook fires from PagerDuty, AND the alert is high-severity, AND the host is in the prod_db group, run Job Template restart-postgres-replica.” The rulebook is YAML; the EDA Server runs it as a long-running process listening on its sources. EDA turns Ansible from “what an operator runs” into “what the platform runs autonomously.”
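Mental model 3 above (a job lands on a node with capacity and reachability) can be pictured with a small Python sketch. This is an illustrative model only, not Controller's scheduler: the node names, the capacity numbers, and the `reachable` attribute are all hypothetical. In a real install, reachability is something you encode by choosing which nodes belong to an Instance Group.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionNode:
    name: str
    capacity: int                                  # spare forks this node can still take
    reachable: set = field(default_factory=set)    # network zones it can reach (assumed attribute)


def pick_node(instance_group, forks_needed, target_zone):
    """Return the first node with enough spare capacity that can reach the target zone."""
    for node in instance_group:
        if node.capacity >= forks_needed and target_zone in node.reachable:
            return node.name
    return None  # no node qualifies, so the job would stay pending


aws_group = [
    ExecutionNode("exec-dmz-1", capacity=50, reachable={"dmz"}),
    ExecutionNode("exec-aws-1", capacity=2, reachable={"aws"}),
    ExecutionNode("exec-aws-2", capacity=40, reachable={"aws"}),
]

print(pick_node(aws_group, forks_needed=5, target_zone="aws"))   # exec-aws-2
print(pick_node(aws_group, forks_needed=5, target_zone="lab"))   # None
```

The second call returning None corresponds to the "no execution capacity" situation: the control plane is healthy, but nothing in the group can take the job.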

Keep these terms straight: AAP (the supported product), AWX (the upstream open-source equivalent of Controller), Controller (the orchestration component, formerly Tower), Automation Hub (the private content registry), EDA (Event-Driven Ansible — the eventing tier), Receptor (the mesh transport), Execution Environment (the OCI image carrying ansible-core + collections), Instance Group (a logical grouping of execution nodes), Project (a Controller object pointing at an SCM repo), Job Template (a Controller object that combines a project + inventory + credential into a runnable thing), Workflow Template (a DAG of Job Templates), Survey (a parameter-prompt UI on top of a JT), Rulebook (the EDA YAML defining sources/conditions/actions).

ansible-core vs AWX vs AAP

| Layer | What it is | Who runs it | What you get |
| --- | --- | --- | --- |
| ansible-core | The CLI: ansible, ansible-playbook, ansible-inventory, ansible-vault, ansible-galaxy. ~10 MB Python. | A developer, a CI runner, a cron job. | A pure CLI. No UI, no audit, no scheduling. |
| AWX | The open-source Controller (the project Tower forked from). Same code that becomes Controller; community-supported; runs on Kubernetes via the AWX Operator. | You, self-hosted on any Kubernetes. | Web UI, REST API, RBAC, scheduling, projects, inventories, credentials, job templates, workflows, notifications. No Red Hat support, no Automation Hub product, no EDA Server (use EDA upstream separately). |
| AAP (Ansible Automation Platform) | The Red Hat-supported product. Includes Controller (Tower → Controller, the same UI as AWX with extra features and Red Hat support), Automation Hub (private registry), Event-Driven Ansible (EDA Server), Automation Mesh (Receptor), Lightspeed (AI assist, optional), and the Insights for Automation integration. Installed via the AAP Installer (RPM + Ansible plays) or the AAP Operator on OpenShift/k8s. | You, but with a Red Hat subscription, errata, support, and certified content. | Everything AWX gives you, plus Hub, EDA, certified-content sync, support, and operational hardening. |

A pragmatic rule: if you’re learning, build with AWX (free, identical model). If you’re shipping production at a regulated company, run AAP (the supply-chain and support story is non-negotiable). The core mental models — Controller, Hub, EDA, Mesh — are identical across AWX and AAP at the level you’ll use them in 99% of work.

Controller — the object model

Every Controller object is reachable in the UI and at /api/v2/<resource>/. The hierarchy:

Organisations
├── Teams (RBAC grouping of Users)
├── Users
├── Projects        ← SCM-backed (git, http archive); imports playbooks/roles/collections
├── Inventories     ← static, smart (filter), constructed, dynamic-plugin sources
│   ├── Hosts
│   └── Groups
├── Credentials     ← machine, source-control, vault, cloud (AWS/Azure/GCP/k8s/…), generic
├── Credential Types
├── Execution Environments
├── Instance Groups (which mesh nodes run my jobs)
├── Job Templates           ← project + inventory + credentials + EE + survey + schedule
├── Workflow Templates      ← DAG of JTs with success/failure paths and approval nodes
└── Notifications

Organisations, Teams, Users, RBAC

Organisations are the top-level tenancy boundary — every other object belongs to one. Teams are RBAC groupings; users are added to teams, teams are granted roles on objects. The Controller has a fixed role taxonomy per object type:

| Role | What it allows |
| --- | --- |
| admin | Full read/write/delete on the object. |
| read | Read-only. |
| execute | Launch/run (for Job Templates), without edit rights. |
| use | Reference the object from another (e.g. use a Credential in a Job Template). |
| auditor | Read-only across the whole org including job history. |
| project_admin / inventory_admin / credential_admin / notification_admin / workflow_admin | Full rights scoped to that object class within an org. |

Sane defaults: developers get execute on the Job Templates they own and use on the Credentials those JTs need; the platform team gets admin on Projects/Inventories/Credentials; auditors get the org-level auditor role.

Projects

A Project is the Controller’s tracker for a Git repository. Configure:

| Field | Notes |
| --- | --- |
| scm_type | git, archive, insights. Almost always git. |
| scm_url | The repo URL (https://..., git@...). |
| scm_branch | Branch/tag/commit. |
| scm_credential | The Source Control credential used to clone. |
| scm_clean | Discard local changes on update. |
| scm_delete_on_update | Delete and re-clone on each project sync. |
| scm_track_submodules | Recurse submodules. |
| scm_update_on_launch | Sync the project before every Job Template launch (slow, fresh). |
| scm_update_cache_timeout | If scm_update_on_launch is on, skip the sync if the last one is younger than this many seconds. |
| default_environment | Which EE the JTs derived from this project use by default. |
| signature_validation_credential | If set, the project sync fails unless the project’s content signature (produced with ansible-sign) verifies against the GPG public key in this credential. |

A Project sync clones the repo onto the control plane and indexes its playbooks, roles, and collection metadata. Job Templates reference a playbook inside a project.
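The interplay between scm_update_on_launch and scm_update_cache_timeout in the table above is easy to get backwards, so here is a minimal Python sketch of the decision. It is a model of the documented behaviour, not Controller's code; the function name and the exact boundary comparison (`>=`) are assumptions.

```python
from datetime import datetime, timedelta, timezone


def needs_sync(update_on_launch, cache_timeout_s, last_sync, now):
    """Decide whether launching a Job Template should trigger a project sync first."""
    if not update_on_launch:
        return False               # syncs then happen only manually or on a schedule
    if last_sync is None:
        return True                # never synced, so the repo must be cloned
    age = (now - last_sync).total_seconds()
    return age >= cache_timeout_s  # a recent enough sync is reused


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(seconds=30)
stale = now - timedelta(minutes=10)

print(needs_sync(True, 60, recent, now))   # False: last sync is only 30 s old
print(needs_sync(True, 60, stale, now))    # True: last sync is older than 60 s
print(needs_sync(False, 60, stale, now))   # False: update-on-launch is off
```

A timeout of 0 makes every launch sync, which is the "slow, fresh" end of the trade-off; a larger value lets bursts of launches share one sync.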

Inventories

A Controller Inventory is the same concept as a CLI inventory, with extras. It can be static, smart (a saved host filter), or constructed (built from other inventories).

Each Inventory can have one or more Inventory Sources (the dynamic part):

| Source type | What it is |
| --- | --- |
| scm | Inventory file in a Project (e.g. inventory/aws.aws_ec2.yml). |
| ec2 / azure_rm / gcp_compute / vmware / openstack / kubernetes / satellite / terraform_state / … | Plugin-driven, configured via the UI. |
| file | A static file uploaded directly. |

Every source has its own update schedule (update_on_launch, periodic). Set update_on_launch: true on cloud sources so a Job Template never runs against a stale fleet.

Credentials

Credentials are first-class encrypted secrets. The Controller has a fixed set of built-in Credential Types plus user-defined ones:

| Built-in type | What it stores |
| --- | --- |
| Machine | SSH username/password/key, become method/user/password. |
| Source Control | The credential used to clone Projects (token, key). |
| Vault | Ansible Vault password(s), with vault_id labels. |
| Network | network_cli/httpapi/netconf creds. |
| AWS / Azure / GCP / OpenStack / VMware | Cloud creds; surfaced as env vars to the EE. |
| Container Registry | Pull EEs from a private registry. |
| GPG Public Key | Validate Hub-signed collections. |
| Insights | RH Insights identity. |
| Generic | Custom: define your own type with a YAML schema. |

A user-defined Credential Type has two parts: an inputs schema (what fields the credential takes) and an injectors block (how those fields are exposed at runtime — env vars, files, extra-vars). This is how you wire HashiCorp Vault, CyberArk, or any other secret store into Controller without writing code.
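To make the inputs/injectors split concrete, the sketch below models a custom Credential Type as a Python dict and renders its env injector. The field names (vault_addr, vault_token) and the `render_env` helper are hypothetical, and the placeholder substitution is a simplified stand-in for the Jinja templating Controller applies at runtime.

```python
import re

# Hypothetical custom Credential Type for a secret-store token.
credential_type = {
    "inputs": {
        "fields": [
            {"id": "vault_addr", "type": "string", "label": "Vault URL"},
            {"id": "vault_token", "type": "string", "label": "Token", "secret": True},
        ],
        "required": ["vault_addr", "vault_token"],
    },
    "injectors": {
        "env": {
            "VAULT_ADDR": "{{ vault_addr }}",
            "VAULT_TOKEN": "{{ vault_token }}",
        }
    },
}


def render_env(ctype, values):
    """Check required inputs, then substitute {{ field }} placeholders in the env injector."""
    missing = [f for f in ctype["inputs"]["required"] if f not in values]
    if missing:
        raise ValueError(f"missing required inputs: {missing}")

    def substitute(template):
        # Simplified templating: replace each {{ name }} with the matching input value.
        return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), template)

    return {name: substitute(tpl) for name, tpl in ctype["injectors"]["env"].items()}


env = render_env(
    credential_type,
    {"vault_addr": "https://vault.example.com", "vault_token": "s3cr3t"},
)
print(env["VAULT_ADDR"])   # https://vault.example.com
```

The playbook never sees the credential object itself, only the environment variables the injector produced inside the EE.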

Execution Environments

An EE is an OCI image carrying a specific ansible-core + collection set + Python deps. Controller runs every job inside an EE. The EE catalogue is per-org; you reference an EE by registry URL and pull credential. The default ee-supported-rhel*-aap* EE ships with AAP and contains the certified collections. Building your own EE is the collections & EE lesson topic — Controller just consumes the resulting image.

Job Templates

A Job Template (JT) is the Controller’s runnable unit. It binds:

| Field | Notes |
| --- | --- |
| name | Friendly name. |
| job_type | run (apply changes) or check (--check/dry-run only). |
| inventory | The Inventory to run against. |
| project | The Project the playbook lives in. |
| playbook | Path to a playbook within the project (e.g. site.yml). |
| credentials | A list: at least a Machine credential, plus any cloud/Vault credentials. |
| execution_environment | The EE image. |
| instance_groups | Which mesh node group runs this. |
| forks | Equivalent to the CLI flag. |
| verbosity | 0–4. |
| extra_vars | Static or templated extra vars. |
| survey | A list of question dicts (text/multiplechoice/integer/password/textarea), prompted at launch and surfaced as extra-vars. |
| schedule | Recurring schedule for unattended runs, stored as an iCalendar rrule (for example, daily at 02:30). |
| become_enabled | Force become: true. |
| host_config_key | Provisioning callback URL token. |
| ask_*_on_launch | Boolean flags letting the launcher override credential/inventory/extra_vars at click time. |
| webhooks | Trigger from Git PR/push events on the project’s repo. |
| notifications | List of Notification Templates fired on start/success/failure. |

A JT with a survey is the canonical “operator self-service” pattern: the platform team writes the playbook, defines a Job Template, attaches a survey of safe parameters (which environment, which app, which version), and grants execute to a developer team. Devs launch the JT through the UI, answer the survey, and the platform team’s playbook runs against the right inventory with the right credentials, audited.
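The self-service pattern depends on the survey constraining what a launcher can pass. A minimal Python sketch of that validation step follows; the spec mirrors the shape of a Controller survey (question_name, variable, type, choices, min, max, default), while the `answers_to_extra_vars` helper and the two questions are illustrative assumptions, not Controller's implementation.

```python
survey_spec = {
    "name": "Deploy parameters",
    "description": "Safe knobs exposed to developers",
    "spec": [
        {"question_name": "Target environment", "variable": "target_env",
         "type": "multiplechoice", "choices": ["dev", "staging"], "required": True},
        {"question_name": "Replica count", "variable": "replicas",
         "type": "integer", "min": 1, "max": 5, "required": False, "default": 2},
    ],
}


def answers_to_extra_vars(spec, answers):
    """Validate launch-time answers against the survey and return the extra-vars dict."""
    extra_vars = {}
    for q in spec["spec"]:
        var = q["variable"]
        if var not in answers:
            if q.get("required"):
                raise ValueError(f"{var} is required")
            if "default" in q:
                extra_vars[var] = q["default"]   # fall back to the survey default
            continue
        value = answers[var]
        if q["type"] == "multiplechoice" and value not in q["choices"]:
            raise ValueError(f"{var} must be one of {q['choices']}")
        if q["type"] == "integer" and not q["min"] <= int(value) <= q["max"]:
            raise ValueError(f"{var} must be between {q['min']} and {q['max']}")
        extra_vars[var] = value
    return extra_vars


print(answers_to_extra_vars(survey_spec, {"target_env": "staging"}))
# {'target_env': 'staging', 'replicas': 2}
```

An answer such as target_env=prod is rejected before any playbook runs, which is what makes the survey a safe boundary between the platform team and the launching team.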

Workflow Templates

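A Workflow Template (WT) is a DAG of nodes. Each node is one of: a Job Template, a project sync, an inventory source sync, an approval node, or another Workflow Template.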

Edges are typed: success, failure, always. So a WT can express “sync inventory; on success, run preflight JT; on success, prompt approval; on approval, run deploy JT; on failure of deploy, run rollback JT; on success, notify Slack.” This is the AAP equivalent of CI/CD.
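The typed-edge idea can be traced with a short Python sketch. It walks a hypothetical graph shaped like the example in the previous paragraph; the node names, the `run_workflow` helper, and the choice to give the rollback node an always edge to the notifier are assumptions made to show all three edge types. An approved approval node is treated as a success.

```python
# Each node lists the nodes to visit on each typed edge.
workflow = {
    "sync":      {"success": ["preflight"]},
    "preflight": {"success": ["approval"]},
    "approval":  {"success": ["deploy"]},           # success here means "approved"
    "deploy":    {"success": ["notify"], "failure": ["rollback"]},
    "rollback":  {"always": ["notify"]},
    "notify":    {},
}


def run_workflow(graph, start, outcomes):
    """Walk the graph from start; outcomes maps node -> True (succeeded) or False (failed)."""
    visited, queue = [], [start]
    while queue:
        node = queue.pop(0)
        if node in visited:
            continue
        visited.append(node)
        ok = outcomes.get(node, True)                # nodes succeed unless told otherwise
        edges = graph[node]
        queue += edges.get("success" if ok else "failure", [])
        queue += edges.get("always", [])
    return visited


print(run_workflow(workflow, "sync", {}))
# ['sync', 'preflight', 'approval', 'deploy', 'notify']
print(run_workflow(workflow, "sync", {"deploy": False}))
# ['sync', 'preflight', 'approval', 'deploy', 'rollback', 'notify']
```

A node with no matching outgoing edge simply ends that branch, which is why a failed preflight stops the run before anything is deployed.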

Surveys, Schedules, Notifications

A Survey is a JSON-schema-shaped form attached to a JT or WT, prompted at launch. Schedules are unattended launchers whose recurrence is stored as an iCalendar rrule rather than a cron expression (set enabled: false to pause one). Notifications are templates (Slack/MS Teams/email/webhook/PagerDuty/Mattermost/IRC) attached to JTs and WTs and triggered on started/success/failure/approved/denied/running.

The awx CLI and the API

pip install awxkit gets you the awx CLI. The CLI keeps no session between calls: awx login prints an OAuth token, and every later command reads CONTROLLER_HOST and CONTROLLER_OAUTH_TOKEN from the environment. Once those are exported, every Controller object can be CRUDed from the command line:

export CONTROLLER_HOST=https://controller.example.com
awx login --conf.username admin --conf.password '<password>'   # prints a token
export CONTROLLER_OAUTH_TOKEN=<token from the login output>
awx projects list
awx projects create --name "Ops Repo" --organization "Default" --scm_type git \
  --scm_url https://git.example.com/ops/ansible.git --scm_branch main \
  --scm_update_on_launch true
awx job_templates create --name "Deploy Web" --project "Ops Repo" \
  --inventory "Prod AWS" --playbook deploy.yml \
  --execution_environment "ee-supported-rhel9-aap25" \
  --extra_vars '{"app_version":"1.2.3"}'
awx job_templates launch "Deploy Web" --monitor

Treat the CLI as the source of truth for IaC: an awx-config/ directory of YAML or shell scripts that recreates every Project/JT/WT in a clean Controller. Combined with the awx.awx Ansible collection’s modules (awx.awx.project, awx.awx.job_template, and so on; older releases exposed them under tower_* names), you can reconcile Controller state from a playbook: Controller automating Controller.
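Because the CLI is a thin wrapper over the REST API, the same launch can be expressed as a plain HTTP request. The Python sketch below only builds the request (it sends nothing), using the standard library. The host, token, and template ID are placeholders, and the /api/v2/ prefix follows this lesson; an AAP install behind the platform gateway may expose the Controller API under a different prefix.

```python
import json
import urllib.request


def build_launch_request(host, token, job_template_id, extra_vars):
    """Build (but do not send) the POST that launches a Job Template."""
    url = f"{host.rstrip('/')}/api/v2/job_templates/{job_template_id}/launch/"
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",   # OAuth token, e.g. from awx login
            "Content-Type": "application/json",
        },
    )


req = build_launch_request(
    "https://controller.example.com", "REDACTED", 42, {"app_version": "1.2.3"}
)
print(req.get_method(), req.full_url)
# POST https://controller.example.com/api/v2/job_templates/42/launch/

# To actually launch, pass req to urllib.request.urlopen() against a reachable Controller.
```

Note that extra_vars sent at launch are honoured only when the Job Template allows them (a survey, or prompting for variables on launch).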

Automation Mesh

The Mesh is the data plane. It is a peer-to-peer network of Receptor processes (TLS-mutual-authenticated, latency-tolerant, NAT-friendly). Node roles:

| Role | What it does |
| --- | --- |
| Control | Hosts the Controller API, the scheduler, the database, the web UI. Does not execute jobs. |
| Hybrid | Control + Execution on one machine. The single-node default. |
| Execution | Pulls assigned jobs from the queue, runs ansible-playbook inside an EE, returns results. Does not host the API. |
| Hop | Relays traffic between mesh segments. Has no execution capacity. Used to bridge network boundaries (DMZ ↔ trusted, on-prem ↔ cloud). |

Instance Groups are logical groupings of execution nodes. A Job Template is assigned to an Instance Group; the scheduler picks an idle node in that group with network reachability to the inventory. Common patterns:

Peer relationships form the topology graph: each node’s config lists which other nodes it dials and which it accepts dials from. The mesh tolerates link failures — if a hop is down, traffic re-routes through alternative peers.
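The re-routing claim is easiest to see on a toy graph. The Python sketch below runs a breadth-first search over a hypothetical four-node mesh with two redundant hop nodes. It illustrates the idea only; Receptor's real routing is not a plain BFS, and the node names are invented.

```python
from collections import deque

# Undirected peer links between hypothetical mesh nodes.
peers = {
    "control-1": {"hop-a", "hop-b"},
    "hop-a": {"control-1", "exec-cloud-1"},
    "hop-b": {"control-1", "exec-cloud-1"},
    "exec-cloud-1": {"hop-a", "hop-b"},
}


def route(graph, src, dst, down=frozenset()):
    """Breadth-first search for the shortest path that avoids nodes listed in `down`."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(graph[path[-1]]):      # sorted for a deterministic result
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None                                  # destination unreachable


print(route(peers, "control-1", "exec-cloud-1"))
# ['control-1', 'hop-a', 'exec-cloud-1']
print(route(peers, "control-1", "exec-cloud-1", down={"hop-a"}))
# ['control-1', 'hop-b', 'exec-cloud-1']
print(route(peers, "control-1", "exec-cloud-1", down={"hop-a", "hop-b"}))
# None
```

The last call shows the limit of the guarantee: the mesh can only route around a failed hop if a second peer path was configured in the first place.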

A typical small AAP install has one hybrid control node + two execution nodes + a hop node at the customer-VPC bridge. That topology survives the loss of an execution node, but the single control node remains a single point of failure. The clustered AAP install scales out by adding more control nodes (HA with a shared Postgres) and more execution nodes (just more workers).

Automation Hub

Hub is the private registry. It is a Pulp-based service that hosts collections in repositories:

| Repository | What’s in it |
| --- | --- |
| published | Your internal collections: the ones your team writes and publishes via ansible-galaxy collection publish. |
| rh-certified | Red Hat-certified collections (synced from console.redhat.com). |
| community | Curated community collections (synced from galaxy.ansible.com, optional). |
| validated | Red Hat validated content: opinionated, tested patterns for specific use cases (synced from RH). |

A Namespace is the Hub equivalent of a GitHub organisation: my_namespace.my_collection. A namespace owner controls who can publish.

Content Signing: Hub signs published collections with a configured GPG signing key. When ansible-galaxy collection install pulls from Hub, it verifies the signature against a public key held in a local keyring (passed with --keyring) or in a Controller GPG Public Key credential. Unsigned collections are rejected if signature_required is set. This is the supply-chain integrity story: a malicious collection cannot land on production unless it carries a signature from a key your CI trusts.

Remote sync lets Hub mirror upstream sources on a schedule. Configure a Remote (the upstream URL + auth + auth header) and a Repository (which content to pull). Schedule it. Air-gapped installs use this with an HTTP proxy that holds the Red Hat manifest token, fetching once into a staging Hub and never letting console.redhat.com reach production directly.

Consuming Hub from a CLI client (laptop, CI, Controller EE build) is just an ansible.cfg galaxy_server_list:

[galaxy]
server_list = my_hub, rh_certified, community

# ansible.cfg does not expand ${HUB_TOKEN}; supply each token through the environment,
# e.g. ANSIBLE_GALAXY_SERVER_MY_HUB_TOKEN, ANSIBLE_GALAXY_SERVER_RH_CERTIFIED_TOKEN
[galaxy_server.my_hub]
url = https://hub.example.com/api/galaxy/content/published/

[galaxy_server.rh_certified]
url = https://hub.example.com/api/galaxy/content/rh-certified/

[galaxy_server.community]
url = https://hub.example.com/api/galaxy/content/community/

The order matters: ansible-galaxy collection install walks the list in order. Put your published collections first so an internal collection that shadows an upstream name (rare but legitimate) wins.
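The first-match-wins behaviour can be demonstrated with the standard library's configparser. In this Python sketch the ini text follows the server list above, while the `HOSTED` catalogue (which repository hosts which collection) and the `resolve` helper are hypothetical stand-ins for what ansible-galaxy discovers by querying each server in turn.

```python
import configparser

ANSIBLE_CFG = """
[galaxy]
server_list = my_hub, rh_certified, community

[galaxy_server.my_hub]
url = https://hub.example.com/api/galaxy/content/published/

[galaxy_server.rh_certified]
url = https://hub.example.com/api/galaxy/content/rh-certified/

[galaxy_server.community]
url = https://hub.example.com/api/galaxy/content/community/
"""

# Hypothetical catalogue: which collections each repository happens to host.
HOSTED = {
    "my_hub": {"acme.platform", "community.general"},   # an internal build shadows upstream
    "rh_certified": {"ansible.netcommon"},
    "community": {"community.general"},
}


def resolve(cfg_text, collection):
    """Return (server, url) for the first server in server_list that hosts the collection."""
    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)
    for name in (s.strip() for s in cfg["galaxy"]["server_list"].split(",")):
        if collection in HOSTED.get(name, set()):
            return name, cfg[f"galaxy_server.{name}"]["url"]
    return None


print(resolve(ANSIBLE_CFG, "community.general")[0])   # my_hub
print(resolve(ANSIBLE_CFG, "ansible.netcommon")[0])   # rh_certified
```

Swapping my_hub and community in server_list would silently change which build of community.general gets installed, which is why the order belongs in code review.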

Event-Driven Ansible (EDA)

EDA is a separate component (the EDA Server) that runs rulebooks: long-running YAML processes that listen on sources, evaluate conditions, and fire actions. Architecturally it is pub/sub for Ansible.

A rulebook, dissected

# rulebooks/restart-on-pagerduty.yml
- name: Restart Postgres replica on PagerDuty alert
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
        token: "{{ EDA_PAGERDUTY_TOKEN }}"
  rules:
    - name: High-severity alert on a prod_db host
      condition: >-
        event.payload.event.event_type == 'incident.triggered' and
        event.payload.event.data.incident.urgency == 'high' and
        event.payload.event.data.incident.tags contains 'prod_db'
      throttle:
        once_within: 10 minutes
        group_by_attributes:
          - event.payload.event.data.incident.id
      action:
        run_job_template:
          name: "Restart Postgres Replica"
          organization: "Default"
          job_args:
            extra_vars:
              incident_id: "{{ event.payload.event.data.incident.id }}"
              host: "{{ event.payload.event.data.incident.tags | select('match', '^host:') | first | replace('host:', '') }}"

Every rulebook has three sections:

Sources — the eventing inputs. Each source is a long-running coroutine in the EDA Server. Common sources from ansible.eda:

| Source | What it listens on |
| --- | --- |
| ansible.eda.webhook | A configurable HTTP endpoint; supports token for shared-secret auth. |
| ansible.eda.kafka | Kafka topic; configurable group, offset, encryption. |
| ansible.eda.azure_service_bus | Azure Service Bus queue/topic. |
| ansible.eda.aws_sqs_queue | AWS SQS queue. |
| ansible.eda.url_check | Periodically GETs a URL; emits an event on status change. |
| ansible.eda.alertmanager | Prometheus Alertmanager webhook. |
| ansible.eda.journald | systemd journal entries matching a filter (local source). |
| ansible.eda.range | A counter, useful for testing. |
| ansible.eda.generic | Pre-canned events from a fixture file, for development and test. |

Conditions: Jinja-shaped predicates over the event payload. Conditions can chain (and/or), reference nested fields (event.payload.foo.bar), use built-in operators (contains, in, is defined, is match(...), is search(...), is regex(...)), and reference rule-local state (vars).
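To show what evaluating such a predicate involves, here is a Python rendering of the three-part condition from the PagerDuty rulebook earlier in this section. It is a sketch of the logic, not how ansible-rulebook evaluates rules; the `lookup` and `matches` helpers are hypothetical, and the sample payload follows the shape that rulebook assumes rather than a verified PagerDuty schema.

```python
def lookup(event, dotted):
    """Resolve a dotted path such as 'payload.event.data.incident.urgency'."""
    value = event
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None            # treated like an undefined field
        value = value[key]
    return value


def matches(event):
    """All three clauses must hold, mirroring the 'and' chain in the rulebook."""
    tags = lookup(event, "payload.event.data.incident.tags") or []
    return (
        lookup(event, "payload.event.event_type") == "incident.triggered"
        and lookup(event, "payload.event.data.incident.urgency") == "high"
        and "prod_db" in tags
    )


sample = {"payload": {"event": {
    "event_type": "incident.triggered",
    "data": {"incident": {"id": "P123", "urgency": "high",
                          "tags": ["prod_db", "host:db07"]}},
}}}
low = {"payload": {"event": {
    "event_type": "incident.triggered",
    "data": {"incident": {"urgency": "low", "tags": ["prod_db"]}},
}}}

print(matches(sample))  # True
print(matches(low))     # False
print(matches({}))      # False: missing fields never match
```

The last case matters in practice: a source that emits differently shaped events simply fails to match instead of crashing the rule.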

Actions — what to do when a condition fires. The action set:

| Action | What it does |
| --- | --- |
| run_playbook | Execute a playbook locally inside the EDA Server’s worker. |
| run_job_template | Launch a Controller Job Template (the production-grade action: the JT runs in Controller’s mesh, audited). |
| run_workflow_template | Launch a Controller Workflow Template. |
| run_module | Run a single Ansible module. |
| post_event | Forward the event to another rulebook (chaining). |
| set_fact | Stash a value in rulebook-local memory. |
| retract_fact | Remove a fact. |
| print_event | Log to the EDA Server (debugging). |
| debug | Like print_event but with explicit message templating. |
| none | Explicit no-op. |
| shutdown | Stop the rulebook (rare; usually for graceful restart). |

Throttling & de-duplication — EDA has built-in throttle: blocks (once_within, once_after, group_by_attributes) that prevent a noisy source from firing the same action a hundred times. Pair with group_by_attributes to scope the throttle key to the meaningful identity (an incident ID, a host name) rather than the whole rule. Without throttling, a flapping monitor will DoS your Controller.
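The effect of once_within combined with group_by_attributes can be modelled in a few lines of Python. This is a conceptual sketch of the de-duplication behaviour described above, not the ansible-rulebook implementation; the `OnceWithin` class and the event shape are hypothetical.

```python
class OnceWithin:
    """Let the first event per key through, then suppress repeats for `window_s` seconds."""

    def __init__(self, window_s, group_by):
        self.window_s = window_s
        self.group_by = group_by          # callable: event -> hashable throttle key
        self.last_fired = {}

    def allow(self, event, now):
        key = self.group_by(event)
        fired_at = self.last_fired.get(key)
        if fired_at is not None and now - fired_at < self.window_s:
            return False                  # still inside the window: drop the event
        self.last_fired[key] = now
        return True


# 10-minute window, keyed on the incident ID.
throttle = OnceWithin(600, group_by=lambda e: e["incident_id"])

print(throttle.allow({"incident_id": "P123"}, now=0))     # True: first sighting
print(throttle.allow({"incident_id": "P123"}, now=30))    # False: duplicate within 10 min
print(throttle.allow({"incident_id": "P999"}, now=30))    # True: different key
print(throttle.allow({"incident_id": "P123"}, now=601))   # True: window expired
```

Keying on the incident ID is the point: a second, unrelated incident still fires immediately, while a flapping alert for the same incident launches the Job Template once.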

EDA Server runs rulebooks as long-running processes called rulebook activations, managed through the EDA Server’s own API and UI rather than through Controller. Each activation runs inside a Decision Environment (DE), an EE-shaped image that ships ansible-rulebook + the ansible.eda collection + your custom event sources. The DE is to EDA what an EE is to Controller.

Topology cheat-sheet

| Scale | Topology |
| --- | --- |
| Single-node lab/dev | One hybrid node (control + execution) on an 8-vCPU/16-GB VM. Postgres co-located. AWX-on-kind works for laptop labs. |
| Small prod (< 50 concurrent jobs) | One control node, two execution nodes, an external HA Postgres. One Hub, one EDA Server. |
| Medium prod (< 500 concurrent jobs) | Three control nodes (HA cluster) behind a load balancer, four+ execution nodes per region/cloud, hop nodes at every network boundary, external HA Postgres + Redis. Two Hubs (active/standby across DCs). |
| Air-gapped enterprise | All of the above plus a staging Hub in the connected DC that mirrors console.redhat.com, a one-way sync to the production Hub in the air-gapped DC. EEs and DEs built in the staging side, signed, then mirrored. |

The Mesh’s flexibility is what makes any of these workable: control plane stays small (one HA cluster), execution scales linearly by adding nodes, hop nodes solve every network-segmentation problem.

Hands-on lab: AWX on a kind cluster

You can drive a real Controller end-to-end on a laptop with AWX and kind. No Red Hat subscription, no cloud bill.

1. Bring up kind.

brew install kind kubectl   # or use the project's apt/yum repos on Linux
kind create cluster --name awx
kubectl config use-context kind-awx

2. Install the AWX Operator.

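# create the awx namespace first; this form is idempotent if it already exists
kubectl create namespace awx --dry-run=client -o yaml | kubectl apply -f -
# ?ref= must resolve to a git ref in the awx-operator repo; if 2.x does not, substitute a released 2.x tag
kubectl apply -k "https://github.com/ansible/awx-operator/config/default?ref=2.x"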

3. Deploy AWX itself.

# awx.yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  service_type: nodeport
  nodeport_port: 30080

# back in the shell: apply the manifest, then wait until awx-... is Running (5–10 minutes)
kubectl apply -f awx.yaml
kubectl -n awx get pods --watch

4. Get the admin password.

kubectl -n awx get secret awx-admin-password -o jsonpath="{.data.password}" | base64 -d; echo

5. Open the UI.

kubectl -n awx port-forward svc/awx-service 8080:80
# now visit http://localhost:8080 and log in as admin / <the password>

6. Drive it from the CLI.

pip install awxkit
# awx keeps no session between calls; it reads these variables on every command
export CONTROLLER_HOST=http://localhost:8080
export CONTROLLER_USERNAME=admin
export CONTROLLER_PASSWORD="$(kubectl -n awx get secret awx-admin-password -o jsonpath='{.data.password}' | base64 -d)"
awx projects list
awx organizations list

7. Create a Project + Inventory + JT entirely from CLI.

# A public, well-known role repo to keep the lab free
awx projects create --name "demo" --organization Default --scm_type git \
  --scm_url https://github.com/ansible/ansible-tower-samples.git --scm_branch master \
  --wait

awx inventories create --name "localhost" --organization Default
awx hosts create --name "localhost" --inventory "localhost" \
  --variables '{"ansible_connection": "local"}'

awx job_templates create --name "demo-hello" --project "demo" \
  --inventory "localhost" --playbook hello_world.yml \
  --execution_environment "AWX EE (latest)" \
  --extra_vars '{"who": "Wave-2 student"}'

awx job_templates launch "demo-hello" --monitor

The --monitor flag streams the live job output to your terminal — same logs as the UI’s job pane. You just exercised the Controller object model end-to-end from a Linux/Mac shell.

8. Add a webhook-driven rulebook (EDA upstream, optional).

# ansible-rulebook also needs a Java runtime (JDK 17+) for its rules engine
pip install ansible-rulebook ansible-runner
ansible-galaxy collection install ansible.eda

# rulebook.yml
- name: Trigger AWX JT on local webhook
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5050
  rules:
    - name: any POST to /endpoint
      # the webhook source reports the request path under event.meta
      condition: event.meta.endpoint == "endpoint"
      action:
        run_job_template:
          name: "demo-hello"
          organization: "Default"

# run it (the -i path must point at an inventory file you have created)
EDA_CONTROLLER_URL=http://localhost:8080 \
EDA_CONTROLLER_TOKEN=<personal-token-from-AWX-UI> \
ansible-rulebook --rulebook rulebook.yml -i ansible/inventory.yml --verbose

# in another shell:
curl -X POST http://localhost:5050/endpoint -d '{"hello":"world"}' -H content-type:application/json

The rulebook fires, the action runs, AWX picks up the JT launch — the same pattern AAP’s EDA Server runs at scale, just managed manually here.

9. Cleanup.

kind delete cluster --name awx

You now have hands-on familiarity with every Controller object, the awx CLI, and an EDA rulebook firing into Controller — without spending a cent.

Common mistakes & troubleshooting

JT runs against the wrong inventory — you set ask_inventory_on_launch: true but the launcher accepted the default. Decide whether the JT’s inventory is fixed (most cases) or selectable (rare); fixed JTs are safer.

Project sync hangs — wrong scm_credential (no read access to the repo) or scm_branch doesn’t exist. Run awx projects update <name> --monitor and watch the trace.

Survey answers don’t reach the play — the survey variable name doesn’t match the play’s extra_vars reference (they must be exact). Surveys produce extra-vars, which are highest-precedence Ansible vars; review the variables-precedence lesson.

Mesh node “stuck pending” — the Receptor handshake failed (TLS cert mismatch, peer not in the allow-list). awx-manage list_instances and the Receptor logs on the node tell you why.

Hub install signature failure: the GPG public key you verify with (the client keyring, or the Controller credential that holds it) doesn’t match Hub’s signing key. Either re-publish the collection so it is signed with the expected key, or update the public key. Never disable signature validation in production.

EDA rulebook fires repeatedly — your condition: matches every event, not just the relevant one, and there is no throttle:. Tighten the condition; add a throttle: { once_within: <period>, group_by_attributes: [...] }.

“No execution capacity” — the JT’s Instance Group has no idle execution nodes. Either add nodes, or assign the JT to a different group with capacity. The awx instance_groups list command shows utilisation.

Click-ops drift — someone edited a JT in the UI; CI’s IaC reconciliation now overwrites it. Either lock the UI editing rights to platform admins only, or make the IaC reconciliation idempotent and accept the drift as a routine “platform owns this object” check.

Best practices

Security notes

AAP is the automation control plane. It holds privileged credentials for every system it touches; protect it like a privileged-access workstation.

Interview & exam questions

  1. What’s the difference between AAP, AWX, and Controller? AAP = the supported product; AWX = the upstream open-source project that becomes Controller; Controller = the AAP component that orchestrates jobs (formerly called Tower).
  2. What does an Execution Environment do? It’s the OCI image containing ansible-core + collections + Python deps that Controller runs every job inside; the contract between Controller and Ansible.
  3. Control vs hop vs execution nodes? Control = API + scheduler + DB; execution = runs jobs in EEs; hop = relays mesh traffic across network boundaries with no execution capacity.
  4. What is an Instance Group? A logical grouping of execution nodes; JTs target an Instance Group and the scheduler picks an idle node in it.
  5. What’s a Project, in Controller terms? A tracked SCM repo (usually git) from which JTs draw playbooks and configs.
  6. What does a Survey do? Prompts the JT launcher for parameters via a JSON-schema-shaped form; answers become extra-vars at runtime.
  7. What’s an Automation Hub Repository? A logical collection of collections — published (yours), rh-certified, community, validated. Each can be served, signed, and synced separately.
  8. How does content signing work? Hub signs published collections with a GPG key; clients verify against a public key (Controller’s GPG Public Key credential or a local GPG keyring); unsigned content is rejected when signatures are required.
  9. What’s a rulebook, in EDA? A YAML file with sources (event inputs), conditions (predicates), and actions (run JTs/playbooks/modules); EDA Server runs it as a long-running process.
  10. Difference between run_playbook and run_job_template? run_playbook runs the playbook inside EDA Server’s worker. run_job_template calls Controller’s API to launch a JT — the production-grade path because the JT runs in the mesh, audited, with all the RBAC and credential handling.
  11. Why is throttling necessary in EDA? Without it, a noisy source (a flapping monitor) fires the same action repeatedly and overwhelms Controller. throttle: { once_within: <period>, group_by_attributes: [...] } scopes the dedup to the meaningful identity.
  12. What’s the awx CLI good for? Treating the Controller as code: every object can be CRUDed from a shell script or CI job; the UI is convenience, the API is the contract.
  13. How do you scale Controller for HA? Three control nodes behind a load balancer, an external HA Postgres, multiple execution nodes per region/cloud, hop nodes at every network boundary.
  14. What’s a Decision Environment? The EDA equivalent of an Execution Environment — the container image carrying ansible-rulebook + ansible.eda + your event sources.

Quick check

Exercise

Stand up the AWX-on-kind lab and:

  1. Define one Project, one Inventory, one Job Template entirely from the awx CLI (no UI clicks).
  2. Add a Survey to the JT prompting for target_env (drop-down: dev/staging) and app_version (regex ^\d+\.\d+\.\d+$).
  3. Build a Workflow Template that runs a preflight JT, an approval node, then the deploy JT, with a Slack notification on success/failure.
  4. Mirror a community collection (e.g. community.general) into your local Hub if you’ve stood one up; otherwise document the steps.
  5. Write a rulebook that listens on a webhook and fires run_job_template against your demo JT; throttle to once per minute per source IP.
  6. Verify that an unsigned collection cannot install when signature_required is set; sign it; verify it does install.
  7. Document in your repo’s README.md the exact commands a new engineer types to recreate the AWX state from scratch (the IaC story).

Certification mapping

Glossary

Next steps

You now have the architecture model of AAP: how Controller, Hub, EDA, and the Mesh fit together; what every Controller object is for; how rulebooks turn events into runs; and how the platform’s supply-chain story (signed collections in Hub, Decision Environments for EDA, Execution Environments for Controller) anchors trust in production. The platform is now a substrate you can reason about — but the substrate only matters when fed by quality content. The natural sequels close that loop: ship tested, packaged content to Hub by combining the Molecule lesson (proof of correctness) with the collections & EE lesson (the build-and-publish path); feed the platform with live fleets by using the patterns from Dynamic Inventory & Secure Secrets inside Controller’s Inventory Sources; and operate at scale with Tuning Ansible for Speed & Scale and Delegation, Strategies & Rolling Updates — the levers that compose into Workflow Templates that complete in minutes rather than hours. AAP is where every other Advanced lesson cashes in.
