Ansible-on-AWS confuses people who have already met Terraform. The two tools overlap, but they answer different questions. Terraform asks “what does my AWS estate look like, as code?” — its job is to converge a desired state across hundreds of resources, with a state file that tracks every dependency. Ansible asks “what should this thing do, right now?” — its job is to make targeted, often imperative changes to live infrastructure: drain an instance, snapshot an RDS database, rotate a security group rule because an alert just fired, run a one-off migration, deploy an application into boxes that already exist. Both are correct; they live at different altitudes. The job of this lesson is to teach you the AWS modules deeply enough that you stop reaching for the AWS CLI in shell: tasks (the most common anti-pattern in Ansible-on-AWS) and start using the native modules that are idempotent, check-mode-aware, and --diff-friendly.
We start by drawing the Ansible vs Terraform line cleanly so you know when to reach for which tool, then walk the amazon.aws and community.aws collections module-by-module: the EC2 family (ec2_instance, ec2_vpc_net, ec2_vpc_subnet, ec2_vpc_route_table, ec2_security_group, elb_application_lb, autoscaling_group), the data-plane modules (s3_bucket, s3_object, rds_instance, rds_cluster), and the IAM/identity modules (iam_role, iam_user, iam_policy). We cover the AWS auth chain end to end — environment variables (AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY/AWS_SESSION_TOKEN), ~/.aws/credentials profiles, IAM Roles for EC2 when the control node is an EC2 instance, IRSA when it is a pod inside EKS, AWS SSO, and STS assume-role chains for multi-account access — and the cookbook patterns each one needs. We re-meet the amazon.aws.aws_ec2 dynamic inventory plugin from a deeper angle than the dynamic inventory lesson, focusing on the AWS-specific knobs (include_filters, iam_role_arn, regions: [aws-global], hostnames with tag:Name/private-ip-address/public-dns-name, keyed_groups per tags.Environment/placement.availability_zone/instance_type). We finish on multi-account patterns with assume_role_arn, tagging strategy that turns a 2,000-instance fleet into manageable groups, idempotency and check-mode behaviour for the awkward AWS modules (ec2_instance in particular), and packaging an AWS-aware Execution Environment for AAP. Everything targets current Ansible (ansible-core 2.17+, the amazon.aws 8+ and community.aws 8+ collections, 2026), uses FQCN throughout, and ends with a free hands-on lab that uses LocalStack so you can drive real amazon.aws.* modules without a real AWS bill.
Learning objectives
After this lesson you can:
- Articulate exactly when Ansible-on-AWS beats Terraform and vice versa, and stop the “should I use Ansible or Terraform here?” debate at the door.
- Drive the EC2/VPC/IAM/RDS/S3 module families with full option matrices and idempotent behaviour.
- Plumb AWS credentials safely with env vars, profiles, IAM Roles for EC2, IRSA, AWS SSO, and STS assume-role.
- Configure the aws_ec2 dynamic inventory plugin with the AWS-specific knobs, and explain when it caches and when it doesn’t.
- Operate a multi-account AWS estate with one playbook by passing assume_role_arn (or per-task aws_profile).
- Use a tagging strategy so your inventory fans out into clean cross-cutting groups (tag_Environment_prod, tag_Role_web, az_eu_west_1a).
- Ship an AWS-aware Execution Environment that includes boto3, botocore, the two collections, and the AWS CLI for the few command: cases that need it.
Prerequisites & where this fits
You should already be comfortable with playbooks and tasks, variables and the precedence rules, Jinja templating, roles and collections, and the dynamic inventory lesson (because every real AWS run uses dynamic inventory). The companion expert lessons that compound here are Ansible for Kubernetes — many AWS plays target EKS — Ansible for Containers — for ECR/ECS/Fargate adjacency — and Hybrid Orchestration — when AWS is one of three or four targets in a single workflow. In the Ansible Zero-to-Hero programme this is the Cloud expert (AWS) lesson and a textbook EX374-grade topic.
Core concepts
Five mental models carry the whole lesson.
1. Ansible-on-AWS is operations, not provisioning. Terraform builds the estate and tracks state; Ansible operates it. The ideal split is “Terraform builds VPC + subnets + EKS + RDS skeleton” then “Ansible runs every day to deploy apps, rotate keys, snapshot databases, scale ASGs, drain instances, react to events.” If you find yourself building a 200-resource VPC in pure Ansible, stop — you’ve crossed into Terraform’s lane and you’re losing state, drift detection, and dependency-graphing for nothing.
2. The AWS auth chain is shared with boto3. Every amazon.aws.* module ultimately constructs a boto3 client, so the credential resolution order is the standard AWS SDK chain: explicit module params → env vars → ~/.aws/credentials profile → instance metadata (IMDSv2) for EC2 → IRSA for EKS pods → SSO/SSO-OIDC. You almost never put credentials in module args. You configure the environment and let the chain resolve.
3. STS assume-role is the multi-account primitive. A single playbook running with one set of base credentials can hop into 20 AWS accounts by assuming a different role per task or per host: assume_role_arn on the inventory plugin, and for modules either a profile that carries the role_arn or temporary credentials from sts_assume_role. The pattern is “one centralised automation account, with an IAM role in every spoke account that trusts it.” Ansible’s job is to call sts:AssumeRole per target and use the temporary credentials.
4. Tags are the inventory. aws_ec2’s keyed_groups turns every tag and every cloud field into an Ansible group. A consistent tag schema (Environment, Role, Owner, CostCenter) is what turns “all hosts” into clean cross-cutting groups (tag_Environment_prod, tag_Role_web). The minute your tag schema is inconsistent, your dynamic inventory becomes useless.
5. EC2 is not a regular Ansible target — it is both a target and a thing-you-create. Most of this lesson lives in the second world: ec2_instance creates an instance. Once it exists, Ansible’s normal SSH model targets it (via aws_ec2 inventory). Don’t confuse the two phases — ec2_instance runs from localhost (with connection: local) against the AWS API; later plays run with connection: ssh against the running instance. The transition between the two is what wait_for_connection handles.
Keep these terms straight: amazon.aws (Red-Hat-supported AWS collection — the one you should default to), community.aws (community-maintained extras — aws_eks_cluster_info, etc.), boto3/botocore (the Python SDK every module uses — must be installed in the EE), the SDK auth chain (env → profile → instance role → IRSA → SSO), assume_role_arn (per-task multi-account hop), aws_ec2 plugin (dynamic inventory; lives in amazon.aws), IAM Role for EC2 (control-node identity for self-hosted), IRSA (control-node identity for EKS-hosted), connection: local (used for every AWS API task — the work runs on the control node, not on a target).
amazon.aws vs community.aws
Two collections cover ~all of AWS:
| Collection | Scope | Support | Default? |
|---|---|---|---|
| amazon.aws | Core AWS services (EC2, VPC, IAM, RDS, S3, Route53, ELB, ASG, KMS, Lambda, CloudFront) | Red-Hat-supported in AAP | Yes, install first |
| community.aws | Long-tail and newer services (EKS, ECS, MSK, MWAA, Glue, Athena, SES, SNS extras) | Community-maintained | Yes, install alongside |
Both come from the same upstream organisation; the split exists so the supported core remains stable while community modules iterate fast. Install both:
# requirements.yml
collections:
  - name: amazon.aws
    version: ">=8.0.0"
  - name: community.aws
    version: ">=8.0.0"
ansible-galaxy collection install -r requirements.yml
pip install boto3 botocore
The AWS auth chain
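This is the single most important table in the lesson. It lists the credential sources Ansible (via boto3) can resolve, ordered from most explicit to most ambient. boto3's exact precedence differs in places (for example, web-identity and SSO profiles are consulted before instance metadata), so treat the numbers as labels rather than a strict lookup order: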
| Order | Source | Where it shines |
|---|---|---|
| 1 | Explicit module params (aws_access_key, aws_secret_key, security_token) | Avoid; only for one-off scripts |
| 2 | Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION) | CI runners, lab boxes |
| 3 | Shared credentials file ~/.aws/credentials (profile via AWS_PROFILE or aws_profile: module param) | Engineers’ laptops, multi-account hopping |
| 4 | EC2 Instance Metadata (IMDSv2), when the control node is an EC2 instance with an instance role | Self-hosted AAP / Controller running on EC2 |
| 5 | IRSA (IAM Roles for Service Accounts), when the control node is a pod in EKS | Container Group execution in EKS-hosted AAP |
| 6 | AWS SSO (AWS_PROFILE pointing at an SSO-cached profile) | Engineering laptops in SSO-only orgs |
| 7 | STS AssumeRoleWithWebIdentity (OIDC), for GitHub Actions / GitLab CI federated identities | Cloud-native CI without long-lived keys |
The rule of thumb in production: use the lowest-numbered source that doesn’t require a static secret. Order 4 (instance role) and order 5 (IRSA) are the gold standards because there are no keys on disk.
Pattern A — control node has an instance role
Run AAP on EC2 with an instance role. Every amazon.aws.* task uses IMDSv2 transparently. Zero credentials in inventory or vault.
- name: Provision a security group (uses instance-role creds)
  amazon.aws.ec2_security_group:
    name: web-sg-prod
    description: "Web tier"
    vpc_id: vpc-0abc123
    region: eu-west-1
    rules:
      - proto: tcp
        ports: [443]
        cidr_ip: 0.0.0.0/0
    state: present
  delegate_to: localhost
  connection: local
Pattern B — IRSA for EKS-hosted Container Groups
AAP Container Groups can be configured so the EE pod’s ServiceAccount is annotated with an IAM Role ARN. The pod’s pod-identity webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, and boto3 picks them up. Same playbook, no code change.
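As a rough illustration of Pattern B, the ServiceAccount side might look like the manifest below. The namespace, ServiceAccount name, and role ARN are hypothetical placeholders; only the eks.amazonaws.com/role-arn annotation key comes from the IRSA mechanism itself.

```yaml
# Sketch only: names and ARN are placeholders, not values from a real cluster.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aap-ee-runner          # hypothetical ServiceAccount used by the Container Group pods
  namespace: aap-execution     # hypothetical namespace
  annotations:
    # Pods running under this ServiceAccount get AWS_ROLE_ARN and
    # AWS_WEB_IDENTITY_TOKEN_FILE injected, which boto3 then picks up.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/AnsibleAutomation
```

The Container Group pod spec would then reference this ServiceAccount by name, and the role's trust policy has to trust the cluster's OIDC provider for that ServiceAccount.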
Pattern C — multi-account assume-role
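The control node holds base credentials (instance role or IRSA). The amazon.aws modules accept credentials or a named profile rather than a role ARN, so a per-task hop goes through a profile whose role_arn points at the target account; boto3 then calls sts:AssumeRole for you:
# ~/.aws/config on the control node (profile name is illustrative)
# [profile dev-automation]
# role_arn = arn:aws:iam::111122223333:role/AnsibleAutomation
# credential_source = Ec2InstanceMetadata
# role_session_name = ansible-snap-dev
- name: Snapshot RDS in dev account
  amazon.aws.rds_cluster_snapshot:
    db_cluster_identifier: dev-app
    # ansible_date_time is only defined once facts have been gathered
    db_cluster_snapshot_identifier: dev-app-{{ ansible_date_time.iso8601_basic_short }}
    region: eu-west-1
    profile: dev-automation
  delegate_to: localhost
  connection: local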
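For the hop to succeed, the role in the spoke account has to trust the automation principal. A minimal sketch of creating that role with amazon.aws.iam_role follows; the central account ID 999999999999, the role name AnsibleControlNode, and the ReadOnlyAccess attachment are assumptions made for illustration, not values from this lesson.

```yaml
- name: Create the spoke-account role that the automation account may assume
  amazon.aws.iam_role:
    name: AnsibleAutomation
    # Trust policy: only the (hypothetical) central control-node role may assume this role.
    assume_role_policy_document:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            AWS: arn:aws:iam::999999999999:role/AnsibleControlNode
          Action: sts:AssumeRole
    managed_policies:
      - arn:aws:iam::aws:policy/ReadOnlyAccess   # placeholder; scope this to what the plays need
    state: present
  delegate_to: localhost
  connection: local
```

This task has to run with credentials that can already manage IAM in the spoke account, which in practice usually means it belongs to the account-bootstrap tooling rather than the daily operations playbook.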
For long plays in one account, set the role at the play level via env vars + community.aws.sts_assume_role:
- name: Hop into account 111122223333 once
  community.aws.sts_assume_role:
    role_arn: arn:aws:iam::111122223333:role/AnsibleAutomation
    role_session_name: ansible-{{ ansible_date_time.iso8601_basic }}
  register: assumed
- name: Run all subsequent tasks in that account
  amazon.aws.ec2_instance_info:
    region: eu-west-1
  environment:
    AWS_ACCESS_KEY_ID: "{{ assumed.sts_creds.access_key }}"
    AWS_SECRET_ACCESS_KEY: "{{ assumed.sts_creds.secret_key }}"
    AWS_SESSION_TOKEN: "{{ assumed.sts_creds.session_token }}"
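Repeating the environment: block on every task gets noisy in longer plays. One possible alternative, sketched here under the assumption that the group/aws action group from amazon.aws is available, is to set the assumed credentials once with module_defaults on a block. The session name and the instance filter are illustrative.

```yaml
- name: Operate inside account 111122223333 with one credential hop
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Assume the spoke-account role once
      community.aws.sts_assume_role:
        role_arn: arn:aws:iam::111122223333:role/AnsibleAutomation
        role_session_name: ansible-inventory-audit   # illustrative session name
      register: assumed
      no_log: true   # keep the temporary credentials out of the job output

    - name: Run every AWS task in this block with the assumed credentials
      module_defaults:
        group/aws:
          access_key: "{{ assumed.sts_creds.access_key }}"
          secret_key: "{{ assumed.sts_creds.secret_key }}"
          session_token: "{{ assumed.sts_creds.session_token }}"
          region: eu-west-1
      block:
        - name: List running instances in the spoke account
          amazon.aws.ec2_instance_info:
            filters:
              instance-state-name: ["running"]
          register: spoke_instances

        - name: Report the count
          ansible.builtin.debug:
            msg: "{{ spoke_instances.instances | length }} running instance(s)"
```

The trade-off is that module_defaults only affects modules in the action group, so lookups and other plugins inside the block still see the base credentials.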
EC2 family — the headline modules
| Module | Purpose | Idempotent? | Check-mode? |
|---|---|---|---|
| amazon.aws.ec2_instance | Create / modify / terminate EC2 instances | Yes (when you set name: or instance_ids:) | Yes |
| amazon.aws.ec2_vpc_net | VPCs | Yes | Yes |
| amazon.aws.ec2_vpc_subnet | Subnets | Yes | Yes |
| amazon.aws.ec2_vpc_route_table | Route tables | Yes | Yes |
| amazon.aws.ec2_security_group | SGs (with rule diffing) | Yes | Yes |
| amazon.aws.ec2_key | Key pairs | Yes | Yes |
| amazon.aws.elb_application_lb | Application Load Balancers | Yes | Yes |
| amazon.aws.autoscaling_group | ASGs (with replace_all_instances, replace_batch_size) | Yes | Partial |
| amazon.aws.ec2_ami | AMIs (create from instance, share, deregister) | Yes | Yes |
A canonical instance-creation task with wait_for_connection:
- name: Launch a web instance
  amazon.aws.ec2_instance:
    name: web-eu-1a-{{ deploy_id }}
    region: eu-west-1
    image_id: "{{ web_ami_id }}"
    instance_type: t3.medium
    vpc_subnet_id: "{{ subnet_eu_1a }}"
    security_groups: [web-sg-prod]
    key_name: ops
    tags:
      Environment: prod
      Role: web
      Owner: platform
      Deploy: "{{ deploy_id }}"
    state: running
    wait: true
    wait_timeout: 300
  delegate_to: localhost
  connection: local
  register: launched
- name: Add to the in-memory inventory and wait for SSH
  ansible.builtin.add_host:
    name: "{{ item.public_ip_address | default(item.private_ip_address) }}"
    groups: just_launched
    ansible_user: ec2-user
  loop: "{{ launched.instances }}"
  delegate_to: localhost
- name: Wait for SSH on each new box
  ansible.builtin.wait_for_connection:
    timeout: 300
  delegate_to: "{{ item }}"
  loop: "{{ groups['just_launched'] }}"
Notice the pattern: API tasks run delegate_to: localhost + connection: local; once a box exists, Ansible switches to its real connection.
IAM, S3, RDS
- name: IAM role for the web tier
  amazon.aws.iam_role:
    name: web-instance-role
    assume_role_policy_document: "{{ lookup('file', 'trust/web.json') }}"
    managed_policies:
      - AmazonSSMManagedInstanceCore
      - CloudWatchAgentServerPolicy
    state: present
  delegate_to: localhost
# s3_bucket does not manage lifecycle rules; this task covers versioning,
# public-access blocking, and default encryption only.
- name: S3 bucket for app logs (versioned, encrypted, public access blocked)
  amazon.aws.s3_bucket:
    name: prod-app-logs
    region: eu-west-1
    versioning: true
    public_access:
      block_public_acls: true
      block_public_policy: true
      ignore_public_acls: true
      restrict_public_buckets: true
    encryption: AES256
    tags:
      DataClass: logs
    state: present
  delegate_to: localhost
- name: Postgres RDS for staging
  amazon.aws.rds_instance:
    id: stg-app-db
    engine: postgres
    engine_version: "16.3"
    db_instance_class: db.t4g.medium
    allocated_storage: 50
    storage_type: gp3
    master_username: app
    master_user_password: "{{ vault_rds_pw }}"
    db_subnet_group_name: stg-db
    vpc_security_group_ids: ["{{ db_sg }}"]
    backup_retention_period: 7
    deletion_protection: true
    state: present
    region: eu-west-1
  delegate_to: localhost
aws_ec2 dynamic inventory — AWS-specific knobs
The cross-cutting plugin schema (hostnames, compose, keyed_groups, groups, strict) is covered in the dynamic inventory lesson. The AWS-specific knobs:
# inventory/prod.aws_ec2.yml
plugin: amazon.aws.aws_ec2
# Multi-region in one inventory file
regions:
  - eu-west-1
  - eu-west-2
  - us-east-1
# Multi-account via assume-role (per-source!)
iam_role_arn: arn:aws:iam::111122223333:role/AnsibleInventory
# Pre-filter on the AWS side; saves API calls.
# Keys inside one list entry are ANDed; separate list entries are ORed,
# so both conditions live in the same entry here.
include_filters:
  - tag:Environment: ["prod"]
    instance-state-name: ["running"]
exclude_filters:
  - tag:Decommission: ["true"]
# Hostnames priority (first that exists wins)
hostnames:
  - tag:Name
  - private-dns-name
# Useful per-host variables
compose:
  ansible_host: private_ip_address
  env: tags.Environment | default('unknown')
  role: tags.Role | default('unknown')
  cost_center: tags.CostCenter | default('unknown')
# Cross-cutting groups
keyed_groups:
  - key: tags.Role
    prefix: role
  - key: tags.Environment
    prefix: env
  - key: placement.availability_zone
    prefix: az
  - key: instance_type
    prefix: type
  - key: vpc_id
    prefix: vpc
# Named groups via expressions
groups:
  prod_eu: tags.Environment == 'prod' and placement.region.startswith('eu-')
  needs_patch: tags.Patched is not defined or tags.Patched != 'true'
# Caching: critical for big fleets
cache: true
cache_plugin: jsonfile
cache_connection: /var/cache/ansible_inventory
cache_timeout: 600
include_filters is the most performance-impactful knob: it’s a server-side filter, so a 10,000-instance account becomes a 200-result query. Without it, every play hits the AWS API for every instance.
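With the keyed_groups shown above, plays can target intersections of the generated groups. A small sketch, assuming instances carry Role=web and Environment=prod tags so the plugin produces the groups role_web and env_prod, and that the compose variables env and role are defined as in the inventory file:

```yaml
- name: Inspect the production web tier
  hosts: role_web:&env_prod      # intersection of two keyed_groups results
  gather_facts: false
  tasks:
    - name: Show what the inventory composed for each host
      ansible.builtin.debug:
        msg: >-
          {{ inventory_hostname }} env={{ env }} role={{ role }}
          az={{ placement.availability_zone }}
```

Running ansible-inventory with --graph against the same inventory file is a quick way to confirm the group names before pointing a real play at them.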
Multi-account inventory — one file per account
# inventory/account-prod.aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions: [eu-west-1, us-east-1]
iam_role_arn: arn:aws:iam::111111111111:role/AnsibleInventory
hostnames: [tag:Name]
keyed_groups:
- { key: "'prod'", prefix: account }
- { key: tags.Role, prefix: role }
# inventory/account-stg.aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions: [eu-west-1]
iam_role_arn: arn:aws:iam::222222222222:role/AnsibleInventory
hostnames: [tag:Name]
keyed_groups:
- { key: "'stg'", prefix: account }
- { key: tags.Role, prefix: role }
Then point ansible.cfg at the directory; both files are merged automatically:
[defaults]
inventory = ./inventory/
[inventory]
enable_plugins = amazon.aws.aws_ec2, amazon.aws.aws_rds, constructed
Tagging strategy that scales
A consistent tag schema is the single highest-leverage thing you can do for Ansible-on-AWS:
| Tag | Required | Purpose |
|---|---|---|
| Environment | Yes | prod/stg/dev; drives keyed_groups |
| Role | Yes | web/db/worker; what plays it gets |
| Owner | Yes | Team email or Slack channel |
| CostCenter | Yes | Finance attribution |
| Project | Recommended | Cross-cutting |
| PatchGroup | Recommended | Drives Systems Manager patch baselines |
| BackupPolicy | Recommended | Drives data lifecycle plays |
| Deploy | Conditional | The deploy ID that created this instance, for blue/green |
Enforce the schema with AWS Config rules (built-in required-tags) so an instance without Environment is non-compliant within minutes of launch. Ansible relies on the tags being present and correct; AWS Config makes that contract enforceable.
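AWS Config enforces the contract on the AWS side; a play can also refuse to run against hosts that break it. The sketch below is one way to do that, assuming the aws_ec2 plugin exposes each instance's tags as the tags host variable (as the compose expressions in this lesson do). The list of required tags mirrors the table above.

```yaml
- name: Refuse to operate on hosts that break the tag contract
  hosts: all
  gather_facts: false
  tasks:
    - name: Check the required tags exist and are non-empty
      ansible.builtin.assert:
        that:
          - tags is defined
          - tags[item] is defined
          - tags[item] | length > 0
        fail_msg: "{{ inventory_hostname }} is missing required tag {{ item }}"
        quiet: true
      loop:
        - Environment
        - Role
        - Owner
        - CostCenter
```

Placing this as the first play in a playbook means a badly tagged host fails early and visibly instead of silently dropping out of the groups later plays target.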
Idempotency & check-mode for the awkward modules
| Module | Idempotency mechanism | Check-mode behaviour |
|---|---|---|
| ec2_instance | Matches by name: (tag Name) or instance_ids: | --check: shows would-launch / would-modify |
| ec2_security_group | Diffs rule list per direction | --check: shows rule add/remove |
| s3_bucket | Idempotent on name | --check: skips most knobs |
| rds_instance | id: is the key | --check: limited |
| autoscaling_group | Idempotent on name: | --check: partial; replace_all_instances is a destructive runtime action |
The two known sharp edges:
- ec2_instance without name: is not idempotent: every run launches a new instance. Always set a unique name: (typically <role>-<az>-<deploy_id>).
- autoscaling_group with replace_all_instances: true is a real action; check-mode cannot fully simulate it. Always test in stg first.
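To make the second sharp edge concrete, here is a hedged sketch of a cautious rolling replacement. The ASG name, launch template name, sizes, and timeouts are hypothetical, and subnet_eu_1a is the same variable used in the instance example earlier; tune the batch size and waits to your own health checks.

```yaml
- name: Roll the staging web ASG one instance at a time
  amazon.aws.autoscaling_group:
    name: web-asg-stg                    # hypothetical ASG name
    launch_template:
      launch_template_name: web-stg      # hypothetical launch template
    min_size: 2
    max_size: 4
    desired_capacity: 2
    vpc_zone_identifier: ["{{ subnet_eu_1a }}"]
    replace_all_instances: true          # real terminations happen here
    replace_batch_size: 1                # one instance per batch
    health_check_period: 300             # seconds before health checks count
    wait_for_instances: true
    wait_timeout: 900
    region: eu-west-1
    state: present
  delegate_to: localhost
  connection: local
```

Because the replacement is a runtime action, a --check run of this task tells you about configuration drift but not about how the roll itself will behave.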
Hands-on free lab — LocalStack
LocalStack is a fake AWS that runs in Docker. The amazon.aws.* modules treat it as real AWS via an endpoint override.
docker run -d --name localstack -p 4566:4566 -e SERVICES=ec2,s3,iam,rds,sts localstack/localstack
# expose the override to boto3
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
export AWS_DEFAULT_REGION=eu-west-1
export AWS_ENDPOINT_URL=http://localhost:4566
mkdir aws-lab && cd aws-lab
ansible-galaxy collection install amazon.aws community.aws
pip install boto3 botocore
cat > play.yml <<'EOF'
- hosts: localhost
  gather_facts: false
  tasks:
    - name: VPC
      amazon.aws.ec2_vpc_net:
        name: lab-vpc
        cidr_block: 10.42.0.0/16
        state: present
      register: vpc
    - name: Subnet
      amazon.aws.ec2_vpc_subnet:
        vpc_id: "{{ vpc.vpc.id }}"
        cidr: 10.42.1.0/24
        az: eu-west-1a
        tags: { Name: lab-subnet }
        state: present
      register: subnet
    - name: SG
      amazon.aws.ec2_security_group:
        name: lab-sg
        description: lab
        vpc_id: "{{ vpc.vpc.id }}"
        rules:
          - proto: tcp
            ports: [22, 80]
            cidr_ip: 0.0.0.0/0
        state: present
    - name: S3 bucket
      amazon.aws.s3_bucket:
        # Seeded so the suffix is identical on every run; an unseeded random
        # would create a new bucket each time and break the changed=0 check.
        name: lab-bucket-{{ 9999 | random(seed=inventory_hostname) }}
        state: present
    - name: Show what we made
      ansible.builtin.debug:
        msg: "VPC {{ vpc.vpc.id }} / subnet {{ subnet.subnet.id }}"
EOF
ansible-playbook play.yml --diff
ansible-playbook play.yml --diff # second run — changed=0
Tear down:
docker rm -f localstack
Common mistakes & troubleshooting
ImportError: No module named boto3. The Execution Environment doesn’t have boto3/botocore installed. Bake them in via the EE’s python_requirements (or pip install in your venv for local dev).
Credentials work in aws s3 ls but not in Ansible. You’re using SSO and Ansible’s process didn’t inherit the profile. Run aws sso login --profile X then export AWS_PROFILE=X in the same shell you run Ansible from.
ec2_instance keeps creating new instances. You forgot the name: (or instance_ids:) parameter. Without an identity key, the module is not idempotent.
Inventory returns 0 hosts. Either: (a) enable_plugins doesn’t include amazon.aws.aws_ec2; (b) the file isn’t named *.aws_ec2.yml; (c) include_filters excludes everything; (d) the credentials are wrong or the principal lacks ec2:DescribeInstances (confirm which identity you are using with aws sts get-caller-identity).
assume_role_arn works in CLI but Ansible says AccessDenied. The role’s trust policy must allow sts:AssumeRole from the Ansible automation account’s role/user, not the user’s own ARN.
autoscaling_group with replace_all_instances: true triggered an outage in stg. It’s a real action; it terminates instances. Use replace_batch_size: 1 and health_check_period: 300 (the module's option for the ASG health-check grace period), or do blue/green by creating a new ASG and shifting the ALB target group over.
Slow inventory. No cache:, no include_filters. Add both. A 1000-instance account with caching disabled hits AWS APIs on every play.
shell: aws ec2 describe-instances … everywhere. This is the cardinal sin. Replace every one with amazon.aws.ec2_instance_info. The module is idempotent, returns structured data, supports check-mode, and is --diff-friendly.
Best practices
- Pick the line. Terraform builds the estate; Ansible operates it. If you’re writing Ansible to provision a 50-resource VPC, you’ve crossed the line.
- Use the SDK auth chain. Never put long-lived keys in module args. Prefer instance role / IRSA / assume-role.
- Always set name: on ec2_instance. No exceptions.
- Always delegate_to: localhost + connection: local on AWS API tasks. (Or set it once at the play level.)
- Tag schema first. Enforce with AWS Config. Inventory follows.
- Cache the inventory. cache: true with a 5-10 minute timeout on a directory cache plugin.
- One inventory file per account. Merge via inventory: directory.
- Pin collection versions. amazon.aws 8.x will not break your plays mid-flight.
- Build an AWS EE with boto3, botocore, awscli, amazon.aws, community.aws. Pin AAP job templates to it.
- Mesh execution nodes inside the VPC. Latency to the AWS regional endpoint matters; cross-region from on-prem will time out on big fleets.
Security notes
- No long-lived access keys. Instance role, IRSA, SSO, and STS assume-role are the supported patterns. Long-lived keys are a finding in any modern audit.
- Enable IMDSv2 on every instance you create (metadata_options: { http_tokens: required }); the legacy IMDSv1 is exploitable from in-pod processes.
- Tag-based authorisation in IAM is your friend: write the role policy so ec2:* is allowed only on resources tagged Owner == ${aws:PrincipalTag/Team}. Ansible’s tag schema becomes the security boundary.
- CloudTrail every assume-role call. Set role_session_name: to a meaningful per-job string; that string lands in CloudTrail and gives you “which AAP job hopped into this account” forensics.
- Vault any RDS / DocumentDB / ElastiCache passwords with Ansible Vault or AAP credential plugins backed by Secrets Manager.
- Never pass aws_secret_key as a module param: it ends up in process args and (worse) in Ansible’s verbose log output. Use the env / profile / instance-role chain.
- Block public-S3 by default. public_access: block on every s3_bucket task. Make exceptions explicit and code-reviewed.
- Air-gap-friendly EE. Build the AWS EE locally, push to Private Automation Hub, pin AAP by digest. EKS-hosted execution pods can pull through ECR.
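The IMDSv2 note above maps onto the metadata_options parameter of ec2_instance. A minimal sketch, reusing the web_ami_id and subnet_eu_1a variables from the earlier launch example; the instance name is a hypothetical placeholder.

```yaml
- name: Launch an instance that only answers IMDSv2 requests
  amazon.aws.ec2_instance:
    name: web-eu-1a-imdsv2-demo      # hypothetical name; keep it unique per instance
    region: eu-west-1
    image_id: "{{ web_ami_id }}"
    instance_type: t3.medium
    vpc_subnet_id: "{{ subnet_eu_1a }}"
    metadata_options:
      http_endpoint: enabled
      http_tokens: required          # token-less IMDSv1 calls are rejected
    tags:
      Environment: prod
      Role: web
    state: running
  delegate_to: localhost
  connection: local
```

Setting this in the launch task (or in the launch template an ASG uses) is more reliable than trying to fix metadata settings on instances after the fact.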
Interview & exam Q&A
Q1. When does Ansible-on-AWS beat Terraform? For operations: deployments, snapshots, key rotation, draining, event-driven response. Terraform converges desired state; Ansible operates the running estate. The two compose — Terraform builds, Ansible runs.
Q2. What’s the recommended way to authenticate Ansible to AWS in production? The lowest-friction credential-free path that fits the host: instance role (Controller on EC2), IRSA (Container Group on EKS), SSO (laptops), assume-role (multi-account hops). Long-lived access keys are an audit finding.
Q3. How do you run one playbook across 20 AWS accounts?
Each task (or each play) assumes a per-account AnsibleAutomation role whose trust policy allows the central automation account’s principal: assume_role_arn on the inventory plugin, and for modules a profile that carries the role_arn. Or use community.aws.sts_assume_role once and inject the credentials via environment: for subsequent tasks.
Q4. Why is ec2_instance not idempotent without name:?
The module identifies an existing instance by its name: tag (or explicit instance_ids:). Without either, every run launches a new instance — there is no way for the module to “find” the previous one.
Q5. Difference between amazon.aws and community.aws?
amazon.aws is Red-Hat-supported and covers the core services (EC2/VPC/IAM/RDS/S3/ELB/ASG/Lambda/Route53). community.aws is community-maintained and covers the long tail (EKS, ECS, MSK, Glue, Athena, MWAA). You install both.
Q6. How does the aws_ec2 inventory plugin save API calls on big fleets?
Two levers: include_filters does a server-side filter (e.g. only tag:Environment=prod), so a 10,000-instance account becomes a 200-result query; cache: true with a jsonfile cache plugin and a 5-minute timeout means subsequent plays don’t hit the API at all until the cache expires.
Q7. Why must AWS API tasks delegate_to: localhost + connection: local?
Because there is no “host” to connect to — you’re calling an HTTPS API. The work runs on the control node. Set connection: local at the play level if every task in the play is an API task; per-task delegation is for mixed plays.
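A play-level version of that answer might look like the sketch below. It assumes the lab-vpc name from the LocalStack lab; any VPC tag filter would do.

```yaml
- name: API-only play, so the connection is set once
  hosts: localhost
  connection: local        # every task below is an HTTPS call from the control node
  gather_facts: false
  tasks:
    - name: Look up the VPC by its Name tag
      amazon.aws.ec2_vpc_net_info:
        region: eu-west-1
        filters:
          "tag:Name": lab-vpc
      register: vpcs

    - name: Report how many VPCs matched
      ansible.builtin.debug:
        msg: "{{ vpcs.vpcs | length }} VPC(s) matched"
```

No task in this play needs delegate_to, because the play itself never leaves the control node.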
Q8. What’s the difference between IAM Roles for EC2 and IRSA? IAM Roles for EC2 attach an IAM role to an EC2 instance; processes on the instance get temporary creds via IMDSv2. IRSA (IAM Roles for Service Accounts) attaches an IAM role to a Kubernetes ServiceAccount in EKS; pods using that ServiceAccount get temporary creds via OIDC + STS. Both produce credential-free identity.
Q9. How do you write Ansible against multiple AWS regions safely?
Either region: per task (clean, explicit), or set AWS_DEFAULT_REGION per play via environment:. Avoid setting it globally in ansible.cfg; per-play scoping is what lets one playbook hit multiple regions.
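As a sketch of the explicit per-task approach, the play below loops one info module over a list of regions. The target_regions variable is a name chosen for this example.

```yaml
- name: Count running instances per region
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    target_regions: [eu-west-1, eu-west-2, us-east-1]   # illustrative list
  tasks:
    - name: Query each region explicitly
      amazon.aws.ec2_instance_info:
        region: "{{ item }}"
        filters:
          instance-state-name: ["running"]
      loop: "{{ target_regions }}"
      register: per_region

    - name: Summarise the results
      ansible.builtin.debug:
        msg: "{{ item.item }}: {{ item.instances | length }} running"
      loop: "{{ per_region.results }}"
      loop_control:
        label: "{{ item.item }}"
```

Keeping the region in the task arguments makes every API call self-describing in the job output, which helps when one region misbehaves.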
Q10. What’s a sane tag schema?
Required: Environment, Role, Owner, CostCenter. Recommended: Project, PatchGroup, BackupPolicy. Enforce with AWS Config rules. Make untagged instances literally non-compliant within minutes of launch.
Q11. How does check-mode behave for autoscaling_group?
Partial — it can show the would-be config diff, but replace_all_instances is a real runtime action and check-mode cannot fully simulate it. Always test in stg, never replace_all in prod from a fresh playbook.
Q12. How do you handle EKS pod-level credentials when AAP runs Container Groups in EKS?
The Container Group config sets a ServiceAccount that’s annotated with eks.amazonaws.com/role-arn. The pod-identity webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE; boto3 picks them up. No code change needed.
Q13. What’s the safest way to manage RDS passwords?
Don’t store them in playbooks. Either Vault-encrypt them, or (better) generate them at instance creation, write to AWS Secrets Manager, and have rds_instance reference the Secrets Manager-backed variable via a lookup plugin or AAP credential plugin.
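One way to wire the Secrets Manager option into a task is the amazon.aws.secretsmanager_secret lookup. In this sketch the secret name stg/app/db-master-password is a hypothetical placeholder, and the secret is assumed to already exist and hold the plain password string.

```yaml
- name: Postgres RDS with the password pulled from Secrets Manager at run time
  amazon.aws.rds_instance:
    id: stg-app-db
    engine: postgres
    db_instance_class: db.t4g.medium
    allocated_storage: 50
    master_username: app
    # The lookup runs on the control node with the same credential chain as the module.
    master_user_password: "{{ lookup('amazon.aws.secretsmanager_secret', 'stg/app/db-master-password', region='eu-west-1') }}"
    region: eu-west-1
    state: present
  no_log: true               # keep the resolved password out of job output
  delegate_to: localhost
  connection: local
```

The playbook and inventory then contain only the secret's name, and rotating the password becomes a Secrets Manager operation rather than a Vault re-encryption.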
Q14. How do you build a production AWS EE?
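ansible-builder with an execution-environment.yml whose dependencies section pulls in the collections (amazon.aws, community.aws, typically via requirements.yml), the Python packages (boto3, botocore, awscli, typically via requirements.txt), and system deps (gcc, python3-devel for any C extensions, typically via bindep.txt). Push to Private Automation Hub, sign, pin AAP job templates by digest.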
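A definition file along those lines could look like the sketch below, written against the ansible-builder version 3 schema with the dependencies inlined. The base image name is a hypothetical placeholder and is assumed to ship Python and pip; swap in whatever supported base image your organisation uses.

```yaml
# execution-environment.yml (sketch; base image is a placeholder)
version: 3
images:
  base_image:
    name: registry.example.com/ee-base:latest
dependencies:
  ansible_core:
    package_pip: ansible-core>=2.17
  ansible_runner:
    package_pip: ansible-runner
  galaxy:
    collections:
      - name: amazon.aws
        version: ">=8.0.0"
      - name: community.aws
        version: ">=8.0.0"
  python:
    - boto3
    - botocore
    - awscli
  system:
    - gcc [platform:rpm]
    - python3-devel [platform:rpm]
```

Building it is then a matter of running ansible-builder build with a tag of your choice, after which the image can be pushed to Private Automation Hub and referenced by digest.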
Quick check
- Which two collections cover ~all of AWS?
- What is the boto3 credential resolution order Ansible inherits?
- What single tag schema lever has the biggest impact on inventory clarity?
- Why do all AWS API tasks need connection: local?
- How do you make ec2_instance idempotent?
(Answers: amazon.aws + community.aws; explicit args → env → profile → IMDSv2 → IRSA → SSO → AssumeRoleWithWebIdentity; Environment (or any single, consistently-applied tag); because there’s no SSH target — the work is an HTTPS API call running on the control node; set name: (or instance_ids:) on every call.)
Exercise
Stand up the LocalStack lab. Then:
- Build a prod.aws_ec2.yml inventory with regions, include_filters (tag Environment=prod, instance-state running), full keyed_groups for Role/Environment/AZ/instance_type, and caching.
- Write a play that creates a VPC, subnet, SG, and 2 instances tagged Role=web, Environment=prod. Use delegate_to: localhost properly.
- Add a follow-up play targeting tag_Role_web from the dynamic inventory (the group is named role_web instead if your keyed_groups entry uses prefix: role); have it run wait_for_connection then ansible.builtin.debug the inventory_hostname.
- (Stretch) Add community.aws.sts_assume_role at the start of the play (against your own account for the lab; the API call still works locally) and run subsequent tasks with the assumed creds.
- Run with --check --diff. Then for real. Then again: changed=0.
Certification mapping
| Cert | Coverage |
|---|---|
| EX374 (Red Hat Certified Specialist in Developing Automation with Ansible Automation Platform) | Direct: cloud collections, dynamic inventory, EE, AAP integration. |
| AWS Certified Solutions Architect — Associate | Indirect: VPC/IAM/RDS/S3 mental model. |
| AWS Certified DevOps Engineer — Professional | Direct: deployment automation, AMI baking, ASG operations. |
| HashiCorp Certified: Terraform Associate | Indirect (the line you must internalise: Terraform builds, Ansible runs). |
Glossary
- amazon.aws - Red-Hat-supported core AWS collection.
- community.aws - community-maintained AWS extras.
- boto3/botocore - the Python SDK every AWS module uses; must be in the EE.
- SDK auth chain - explicit args → env vars → profile → IMDSv2 → IRSA → SSO → AssumeRoleWithWebIdentity.
- IAM Role for EC2 - instance identity via IMDSv2.
- IRSA - IAM Roles for Service Accounts; pod identity in EKS via OIDC.
- assume_role_arn - multi-account hop primitive; per-task or per-play.
- aws_ec2 plugin - dynamic inventory plugin in amazon.aws.
- include_filters - server-side filter on the aws_ec2 plugin (the most performance-impactful knob).
- connection: local - used for AWS API tasks; the work runs on the control node.
Next steps
You can now drive AWS from Ansible. The same shape — collection + auth chain + dynamic inventory + tagging strategy — repeats for the other clouds. Continue with Ansible for Azure and Ansible for GCP, then Ansible for Kubernetes for EKS-native ops, and finally Hybrid Multi-Cloud Orchestration to compose all three in a single workflow.