Most teams treat KMS as a checkbox: tick “encryption at rest,” pick the AWS-managed key, move on. That works until you need cross-Region DR, cross-account data sharing, a key you can prove only one role can use, or 100k decrypts a second on a hot path. At that point KMS stops being a checkbox and becomes an authorization system with a latency budget and a request quota. This guide treats it that way — the key types, how envelope encryption actually moves bytes, multi-Region keys, the three-layer authorization model, and the operational edges (rotation, quotas, audit) that bite at scale.
The one mental model: KMS never encrypts your data
The single most important fact about KMS: your plaintext almost never goes to KMS, and KMS keys never leave KMS. A KMS key (historically called a customer master key) is a logical reference to key material that lives inside FIPS 140-3 validated HSMs. You cannot export it. What you can do is ask KMS to wrap and unwrap small blobs, and the standard pattern is to have KMS wrap a data key that you use locally to encrypt the actual payload. That is envelope encryption, and nearly everything below is a consequence of it.
Two API verbs anchor the whole service:
- Encrypt/Decrypt: send up to 4 KB of plaintext/ciphertext; KMS does the crypto. Fine for small secrets, wrong for large objects.
- GenerateDataKey: KMS mints a fresh symmetric data key and returns it to you both in plaintext and wrapped under your KMS key. You encrypt your gigabytes locally with the plaintext key, then throw the plaintext copy away and store only the ciphertext blob.
Internalize this: KMS is a wrapping and authorization service, not a bulk cipher. Every design decision — quotas, caching, multi-Region portability — falls out of “KMS protects the key that protects the data.”
1. Key types: pick the right primitive
KMS keys are not interchangeable. The KeySpec and KeyUsage are immutable at creation, so this is a one-way door.
| Type | KeySpec | Use it for | Notes |
|---|---|---|---|
| Symmetric | SYMMETRIC_DEFAULT (AES-256-GCM) | Envelope encryption, the default for S3/EBS/RDS/Secrets Manager | Single key, never leaves KMS, supports GenerateDataKey |
| Asymmetric (encrypt) | RSA_2048/3072/4096 | Encrypt where the encryptor has no AWS creds | Public key downloadable; you encrypt offline, KMS decrypts |
| Asymmetric (sign) | ECC_NIST_P256, RSA_* | Code/document signing, verification by external parties | Sign / Verify |
| HMAC | HMAC_256/384/512 | MACs, signed tokens, deterministic integrity checks | GenerateMac / VerifyMac |
| Multi-Region | any of the above + MultiRegion: true | DR, global tables, cross-Region ciphertext portability | Replicable; shares key material across Regions |
Orthogonal to spec is who manages the key:
- AWS-owned keys: invisible, free, shared across accounts, no key policy you can see. Acceptable only when you have zero audit or access-control requirements.
- AWS-managed keys (aws/s3, aws/ebs, …): visible in your account, auto-rotated yearly, but you cannot edit their key policy and they cannot be used cross-account. The moment you need either, you are forced to a customer-managed key.
- Customer-managed keys (CMKs): you own the policy, rotation cadence, grants, tags, and deletion. This is the only type worth architecting around. Everything below assumes CMKs.
# A customer-managed symmetric key
aws kms create-key \
  --description "app-prod data-at-rest" \
  --key-spec SYMMETRIC_DEFAULT \
  --key-usage ENCRYPT_DECRYPT \
  --tags TagKey=env,TagValue=prod TagKey=app,TagValue=payments
# Turn rotation on from day one (create-key does not enable it by itself)
aws kms enable-key-rotation --key-id <key-id>
# Give it a human-friendly alias (aliases are Region-scoped, mutable pointers)
aws kms create-alias \
  --alias-name alias/payments-prod \
  --target-key-id <key-id>
2. Envelope encryption: data keys and the SDK
For anything larger than 4 KB, you encrypt locally with a data key. The raw flow:
# 1. Mint a data key: plaintext + ciphertext (wrapped) come back together
aws kms generate-data-key \
--key-id alias/payments-prod \
--key-spec AES_256 \
--query '{plaintext:Plaintext, wrapped:CiphertextBlob}' \
--output json
# 2. Encrypt the payload locally with `plaintext` (AES-256-GCM in your app)
# 3. Persist the ciphertext payload + `wrapped` blob; ZERO the plaintext key in memory
# 4. To read: Decrypt(wrapped) -> plaintext key -> decrypt payload locally
Rolling your own framing (IV, AAD, key blob, algorithm tags) is where teams introduce vulnerabilities. Use the AWS Encryption SDK — it produces a portable, self-describing message format that bundles the wrapped data key with the ciphertext, handles authenticated encryption, and supports multiple wrapping keys. A Python sketch:
import aws_encryption_sdk
from aws_encryption_sdk import CommitmentPolicy
plaintext_bytes = b"example invoice payload"  # stand-in payload for this sketch
client = aws_encryption_sdk.EncryptionSDKClient(
commitment_policy=CommitmentPolicy.REQUIRE_ENCRYPT_REQUIRE_DECRYPT
)
key_provider = aws_encryption_sdk.StrictAwsKmsMasterKeyProvider(
key_ids=["arn:aws:kms:eu-west-1:111122223333:key/<key-id>"]
)
ciphertext, header = client.encrypt(
source=plaintext_bytes,
key_provider=key_provider,
# Encryption context is AAD: authenticated, logged in CloudTrail, NOT secret
encryption_context={"tenant": "acme", "purpose": "invoice"},
)
Two things that matter at principal level:
- Key commitment. REQUIRE_ENCRYPT_REQUIRE_DECRYPT (the 2.x+ default) prevents a class of attacks where one ciphertext decrypts to different plaintexts under different keys. Do not lower it to interop with ancient clients unless you understand exactly what you are giving up.
- Encryption context is your cheapest authorization and audit tool. It is additional authenticated data (AAD): not encrypted, but bound to the ciphertext and required, byte-for-byte, at decrypt time. KMS logs it in CloudTrail, and you can constrain it in policy with kms:EncryptionContext: condition keys.
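To make the policy side of encryption context concrete, here is a minimal sketch that turns a context dict into the matching Condition block. The helper name encryption_context_condition is an assumption for illustration, not part of any AWS library; the kms:EncryptionContext:<key> condition-key form is the one used in the key policy later in this guide.

```python
import json


def encryption_context_condition(context: dict) -> dict:
    """Build a policy Condition block that pins each encryption-context pair.

    Hypothetical helper: it only assembles JSON, it does not call AWS.
    """
    if not context:
        # An empty condition would not constrain anything, so refuse it.
        raise ValueError("encryption context must contain at least one pair")
    return {
        "StringEquals": {
            f"kms:EncryptionContext:{key}": value
            for key, value in sorted(context.items())
        }
    }


condition = encryption_context_condition({"tenant": "acme", "purpose": "invoice"})
print(json.dumps(condition, indent=2))
```

Generating the condition from the same dict the application passes to encrypt keeps the two from drifting apart, which is the usual way a context constraint ends up denying legitimate traffic.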
Caching: trading blast radius for throughput
Calling GenerateDataKey per object is correct but expensive — every write becomes a KMS request against your quota. The data key caching layer in the Encryption SDK lets you reuse a data key across many messages, bounded by max_age, max_messages_encrypted, and max_bytes_encrypted:
from aws_encryption_sdk.caches.local import LocalCryptoMaterialsCache
from aws_encryption_sdk.materials_managers.caching import CachingCryptoMaterialsManager
cache = LocalCryptoMaterialsCache(capacity=1000)
cmm = CachingCryptoMaterialsManager(
master_key_provider=key_provider,
cache=cache,
max_age=300.0, # rotate the cached data key every 5 minutes
max_messages_encrypted=1000 # ...or after 1000 messages, whichever first
)
The tradeoff is explicit: a larger cache and longer max_age mean fewer KMS calls (cheaper, faster, quota-friendly) but a wider blast radius per data key and weaker cryptographic isolation between messages. Tune it; do not leave it unbounded. Caching is also the standard answer to the request-quota problem in section 8.
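A back-of-the-envelope model helps when picking those bounds. The sketch below is an estimate under loud assumptions: one process, one encryption context, a steady message rate, and no byte limit. The function name kms_calls_per_second is hypothetical; real traffic is burstier and will not match this exactly.

```python
def kms_calls_per_second(
    messages_per_second: float,
    max_age: float,
    max_messages_encrypted: int,
) -> float:
    """Estimate steady-state GenerateDataKey calls for one cached data key stream.

    A cached key is retired after max_age seconds or max_messages_encrypted
    messages, whichever comes first, so each key serves at most that many messages.
    """
    if messages_per_second <= 0:
        return 0.0
    messages_per_key = max(
        1.0,
        min(float(max_messages_encrypted), messages_per_second * max_age),
    )
    return messages_per_second / messages_per_key


# Busy writer: the message bound retires the key long before max_age does.
busy = kms_calls_per_second(2000, max_age=300.0, max_messages_encrypted=1000)
# Quiet writer: max_age is the binding limit instead.
quiet = kms_calls_per_second(1, max_age=300.0, max_messages_encrypted=1000)
# No reuse at all: one KMS call per message.
uncached = kms_calls_per_second(2000, max_age=300.0, max_messages_encrypted=1)
print(busy, quiet, uncached)
```

The useful part is seeing which bound actually binds for your workload: on a hot path it is almost always the message count, so that is the knob that sets both your KMS call rate and your per-key blast radius.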
3. Multi-Region keys: portability for DR
A normal KMS key is Region-locked: ciphertext produced in eu-west-1 can only be decrypted by calling KMS in eu-west-1. If that Region is down, your data is unreadable even though the bytes are safely replicated elsewhere. Multi-Region keys (MRKs) fix this. A primary and its replicas share the same key material and the same key ID (mrk-...), with ARNs that differ only in the Region field, so ciphertext encrypted under the primary decrypts under any replica with no re-encryption.
# Create the primary as multi-region
aws kms create-key --multi-region \
--description "global-table encryption primary" \
--region eu-west-1
# Replicate it into a DR Region (same material, independent policy)
aws kms replicate-key \
--key-id mrk-1234567890abcdef0 \
--replica-region us-east-1 \
--description "global-table encryption replica"
The nuances that catch people:
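- Replicas are independently managed. Same key material, but each has its own key policy, grants, aliases, tags, and enabled/disabled state. You can grant the DR-Region team narrower access than the primary. Automatic rotation is the exception: it is a shared property that you set on the primary.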
- They are not free-floating duplicates. Deleting the primary requires that you first promote or delete replicas; KMS protects you from orphaning material.
- MRKs weaken isolation by design. Identical material in two Regions is a deliberate availability/security tradeoff. Use them where ciphertext must cross Regions (DynamoDB global tables, cross-Region DR of encrypted snapshots, active-active apps). Do not reach for them by default — a Region-scoped key is the stronger posture when you do not need portability.
- Rotation stays in sync. When automatic rotation is enabled on the primary, rotated material propagates to replicas, so old ciphertext stays decryptable everywhere.
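Because related multi-Region keys keep the same mrk- key ID in every Region, a failover path can compute the replica ARN instead of looking it up. A minimal sketch, assuming a replica really exists in the target Region; the helper name replica_arn is hypothetical.

```python
def replica_arn(primary_arn: str, replica_region: str) -> str:
    """Return the ARN a multi-Region key would have in another Region.

    Only the Region field changes; account and key ID stay the same.
    This does not check that a replica has actually been created there.
    """
    parts = primary_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS key ARN: {primary_arn!r}")
    if not parts[5].startswith("key/mrk-"):
        raise ValueError("not a multi-Region key: the key ID lacks the mrk- prefix")
    parts[3] = replica_region
    return ":".join(parts)


primary = "arn:aws:kms:eu-west-1:111122223333:key/mrk-1234567890abcdef0"
print(replica_arn(primary, "us-east-1"))
```

Rejecting non-mrk key IDs is the point of the check: swapping the Region on a single-Region key's ARN yields an ARN that looks plausible and points at nothing.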
4. Key policies vs IAM vs grants: the authorization model
This is where KMS differs sharply from most AWS services and where the dangerous mistakes live. Three layers decide whether a Decrypt succeeds:
- The key policy: the resource policy on the key. It is the root of trust. Unlike S3, where IAM alone can grant access, a KMS key policy that does not delegate to IAM means IAM policies are ignored. The key policy is authoritative.
- IAM policies: only effective if the key policy enables IAM (the canonical kms:* to the account root statement). With that statement present, IAM policies behave normally.
- Grants: programmatic, temporary, fine-grained delegations, ideal for AWS services and short-lived workloads.
The minimum sane key policy delegates administration and usage to IAM rather than hard-coding every principal:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "EnableIAMRoot",
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::111122223333:root" },
"Action": "kms:*",
"Resource": "*"
},
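{
"Sid": "KeyUsers",
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::111122223333:role/payments-app" },
"Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*"],
"Resource": "*",
"Condition": {
"StringEquals": { "kms:EncryptionContext:tenant": "acme" }
}
},
{
"Sid": "DescribeKeyWithoutContext",
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::111122223333:role/payments-app" },
"Action": "kms:DescribeKey",
"Resource": "*"
}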
]
}
The EnableIAMRoot statement is not a backdoor to root credentials; it delegates the decision to IAM in this account. Omit it and you must enumerate every principal in the key policy forever, including the admins who could otherwise fix a lockout. That is the classic way teams brick a key.
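The three layers are easier to reason about as a toy evaluator. The sketch below is a deliberately simplified model, not how KMS is implemented: it covers same-account principals only and ignores explicit denies, conditions, SCPs, and VPC endpoint policies. KeyAuthz, is_allowed, and the key-admin role are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class KeyAuthz:
    """Toy model of one key's authorization inputs (assumption, not an AWS type)."""
    delegates_to_iam: bool                                  # account-root kms:* statement present?
    key_policy_allows: dict = field(default_factory=dict)   # principal ARN -> set of actions
    grants: dict = field(default_factory=dict)              # grantee ARN -> set of actions


def is_allowed(key: KeyAuthz, principal: str, action: str, iam_allows: dict) -> tuple:
    """Return (allowed, layer) for a same-account request under the toy model."""
    if action in key.key_policy_allows.get(principal, set()):
        return True, "key policy"
    # IAM only counts when the key policy has delegated to it.
    if key.delegates_to_iam and action in iam_allows.get(principal, set()):
        return True, "IAM, delegated by the key policy"
    if action in key.grants.get(principal, set()):
        return True, "grant"
    return False, "no layer allows it"


APP = "arn:aws:iam::111122223333:role/payments-app"
ADMIN = "arn:aws:iam::111122223333:role/key-admin"  # hypothetical admin role
iam = {ADMIN: {"kms:PutKeyPolicy"}}

delegating = KeyAuthz(delegates_to_iam=True, key_policy_allows={APP: {"kms:Decrypt"}})
locked = KeyAuthz(delegates_to_iam=False, key_policy_allows={APP: {"kms:Decrypt"}})

print(is_allowed(delegating, ADMIN, "kms:PutKeyPolicy", iam))
print(is_allowed(locked, ADMIN, "kms:PutKeyPolicy", iam))
```

The second call is the lockout scenario in miniature: the admin's IAM policy is unchanged, but without the delegation statement it no longer counts, and nobody listed in the key policy can repair it.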
When to reach for grants
Grants shine where a static policy is wrong. They are issued via API, carry their own constraints, and can be retired the instant the work is done:
aws kms create-grant \
--key-id <key-id> \
--grantee-principal arn:aws:iam::111122223333:role/batch-worker \
--operations Decrypt GenerateDataKey \
--constraints EncryptionContextSubset={tenant=acme} \
--retiring-principal arn:aws:iam::111122223333:role/grant-manager
This is exactly the mechanism AWS services use on your behalf: when you attach a CMK to an Auto Scaling group or an encrypted EBS volume, the service creates a grant so it can mint data keys without you widening the key policy. Prefer grants over key-policy edits for service integrations and transient access — they are revocable and don’t bloat the resource policy.
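The constraint on that grant is worth spelling out, because subset and equals behave differently. Here is a minimal sketch of the matching rules as plain Python; it mirrors the documented meaning of EncryptionContextSubset and EncryptionContextEquals but is a simplified model, and the function name grant_permits is an assumption. The grant dict uses the field names that list-grants returns.

```python
def grant_permits(grant: dict, operation: str, encryption_context: dict) -> bool:
    """Check a request against a grant's operations and context constraints.

    Simplified model: exact string comparison, no grant tokens, no retirement.
    """
    if operation not in grant["Operations"]:
        return False
    constraints = grant.get("Constraints", {})
    subset = constraints.get("EncryptionContextSubset")
    if subset is not None:
        # Every constrained pair must be present; extra pairs are fine.
        if not all(encryption_context.get(k) == v for k, v in subset.items()):
            return False
    exact = constraints.get("EncryptionContextEquals")
    if exact is not None and encryption_context != exact:
        # Equals tolerates no extra pairs at all.
        return False
    return True


batch_grant = {
    "GranteePrincipal": "arn:aws:iam::111122223333:role/batch-worker",
    "Operations": ["Decrypt", "GenerateDataKey"],
    "Constraints": {"EncryptionContextSubset": {"tenant": "acme"}},
}

print(grant_permits(batch_grant, "Decrypt", {"tenant": "acme", "purpose": "invoice"}))
print(grant_permits(batch_grant, "Decrypt", {"tenant": "other"}))
print(grant_permits(batch_grant, "Encrypt", {"tenant": "acme"}))
```

Subset is usually what you want for multi-field contexts: it pins the tenant while letting the application add fields such as purpose without reissuing every grant.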
5. Cross-account encryption: S3, EBS, and snapshots
Sharing encrypted data across accounts is a two-sided handshake: the key policy in the owning account must allow the foreign principal, and that principal’s IAM policy in their own account must allow the KMS actions. Missing either side fails closed.
Key policy in the owning account (111122223333):
{
"Sid": "AllowConsumerAccount",
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::444455556666:root" },
"Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey"],
"Resource": "*"
}
IAM policy in the consumer account (444455556666) — note you must name the full key ARN, because the key lives in another account:
{
"Effect": "Allow",
"Action": ["kms:Decrypt", "kms:DescribeKey"],
"Resource": "arn:aws:kms:eu-west-1:111122223333:key/<key-id>"
}
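Since both sides have to agree, it can help to generate the pair from one place. The following is a sketch under assumptions: the function name cross_account_statements is hypothetical, the key ID shown is a placeholder, the aws partition is hard-coded, and action names are compared literally (no wildcard expansion).

```python
import json


def cross_account_statements(owner_account, consumer_account, region, key_id,
                             key_policy_actions, consumer_actions):
    """Build the key-policy statement and the consumer-side IAM statement together."""
    missing = set(consumer_actions) - set(key_policy_actions)
    if missing:
        # The consumer would be granted something the key policy never allows.
        raise ValueError(f"key policy does not allow: {sorted(missing)}")
    key_policy_statement = {
        "Sid": "AllowConsumerAccount",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{consumer_account}:root"},
        "Action": list(key_policy_actions),
        "Resource": "*",
    }
    consumer_iam_statement = {
        "Effect": "Allow",
        "Action": list(consumer_actions),
        # Full ARN required: the key lives in the owner's account.
        "Resource": f"arn:aws:kms:{region}:{owner_account}:key/{key_id}",
    }
    return key_policy_statement, consumer_iam_statement


key_stmt, iam_stmt = cross_account_statements(
    owner_account="111122223333",
    consumer_account="444455556666",
    region="eu-west-1",
    key_id="1234abcd-12ab-34cd-56ef-1234567890ab",  # placeholder key ID
    key_policy_actions=["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey"],
    consumer_actions=["kms:Decrypt", "kms:DescribeKey"],
)
print(json.dumps([key_stmt, iam_stmt], indent=2))
```

The subset check catches the quiet failure mode early: an IAM policy that names an action the key policy never delegated looks fine in review and fails closed at runtime.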
Service-specific edges that matter:
- S3 with SSE-KMS: for cross-account reads, the bucket policy, the object’s KMS key policy, and the reader’s IAM all must align. Set the bucket’s default encryption to your CMK and enable S3 Bucket Keys — it caches a bucket-level data key and collapses thousands of per-object KMS calls into a handful, cutting both cost and quota pressure dramatically.
- EBS / snapshots: you cannot share a snapshot encrypted with an AWS-managed key cross-account, full stop. Use a CMK, share the snapshot, grant the target account Decrypt + CreateGrant on the key, and the target re-encrypts to their key on copy. This is why production fleets standardize on CMKs for EBS even when defaults would “work.”
6. Key rotation: automatic vs manual
Automatic rotation is the default answer. Enable it and KMS generates new cryptographic material on a schedule (the default is yearly; the rotation period is configurable from 90 to 2560 days), retaining all prior material so old ciphertext stays decryptable. The key ID, ARN, and policy never change, so rotation is invisible to applications.
aws kms enable-key-rotation \
--key-id <key-id> \
--rotation-period-in-days 180
aws kms get-key-rotation-status --key-id <key-id>
Crucial subtlety: automatic rotation rotates the KMS key material, but it does not re-encrypt your existing data or your stored data keys. Old envelopes are unwrapped with retained old material; new writes use new material. If your compliance regime demands that data actually be re-wrapped under fresh material, that is a separate, application-driven re-encryption job, not something rotation does for you:
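# Conceptual re-encrypt: ReEncrypt swaps the wrapping key without exposing plaintext.
# Plaintext never returns to your process; KMS decrypts and re-wraps internally.
import boto3

kms = boto3.client("kms")
new_blob = kms.re_encrypt(
    CiphertextBlob=old_wrapped_blob,  # bytes of the stored wrapped data key
    SourceKeyId="alias/payments-prod-old",
    # Required if the blob was originally wrapped with an encryption context
    SourceEncryptionContext={"tenant": "acme"},
    DestinationKeyId="alias/payments-prod-new",
    DestinationEncryptionContext={"tenant": "acme"},
)["CiphertextBlob"]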
Manual rotation (a brand-new key behind the same alias) is for cases automatic rotation can’t cover: changing key spec, moving to a new region/account, or responding to suspected compromise where you must invalidate old material. You repoint the alias and run a re-encryption backfill. Asymmetric and HMAC keys historically required this; symmetric keys rarely do.
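The backfill itself is a loop over stored wrapped data keys. The sketch below takes the KMS client as a parameter so it runs here against a stand-in; in real use you would pass boto3.client("kms"), whose re_encrypt accepts these keyword arguments. The names rewrap_data_keys and FakeKms are hypothetical, and the loop assumes blobs wrapped directly by KMS, not Encryption SDK messages, which embed the wrapped key in their own format.

```python
def rewrap_data_keys(kms, wrapped_blobs, destination_key_id, encryption_context):
    """Re-wrap each stored data key under a new KMS key via ReEncrypt.

    The payload ciphertext is untouched; only the wrapped key blob changes.
    """
    rewrapped = []
    for blob in wrapped_blobs:
        response = kms.re_encrypt(
            CiphertextBlob=blob,
            SourceEncryptionContext=encryption_context,
            DestinationKeyId=destination_key_id,
            DestinationEncryptionContext=encryption_context,
        )
        rewrapped.append(response["CiphertextBlob"])
    return rewrapped


class FakeKms:
    """Stand-in for a KMS client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def re_encrypt(self, **kwargs):
        self.calls.append(kwargs)
        return {"CiphertextBlob": b"new:" + kwargs["CiphertextBlob"],
                "KeyId": kwargs["DestinationKeyId"]}


fake = FakeKms()
new_blobs = rewrap_data_keys(
    fake,
    wrapped_blobs=[b"a", b"b"],
    destination_key_id="alias/payments-prod-new",
    encryption_context={"tenant": "acme"},
)
print(new_blobs)
```

A production version would also persist each new blob atomically next to its payload and rate-limit itself, since every iteration is a cryptographic request against the same quota your live traffic uses.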
7. Auditing and controls
Every KMS API call lands in CloudTrail — including Decrypt, with the encryptionContext, the calling principal, and the source IP. This is the highest-signal security telemetry in the account; treat unexpected Decrypt calls as you would unexpected AssumeRole.
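As a starting point for that kind of triage, here is a minimal sketch that scans CloudTrail records for Decrypt calls from principals you did not expect. It relies on standard CloudTrail record fields (eventSource, eventName, userIdentity, sourceIPAddress, requestParameters); the function name unexpected_decrypts, the sample events, and the mallory user are hypothetical.

```python
def unexpected_decrypts(events, expected_principals):
    """Return Decrypt events whose caller is not in the expected set.

    For assumed roles, the role ARN is taken from sessionContext.sessionIssuer,
    because userIdentity.arn holds the per-session STS ARN instead.
    """
    findings = []
    for event in events:
        if event.get("eventSource") != "kms.amazonaws.com":
            continue
        if event.get("eventName") != "Decrypt":
            continue
        identity = event.get("userIdentity", {})
        issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
        principal = issuer.get("arn") or identity.get("arn", "<unknown>")
        if principal in expected_principals:
            continue
        findings.append({
            "principal": principal,
            "sourceIPAddress": event.get("sourceIPAddress"),
            "encryptionContext": (event.get("requestParameters") or {}).get("encryptionContext", {}),
        })
    return findings


APP_ROLE = "arn:aws:iam::111122223333:role/payments-app"
sample_events = [
    {   # Expected: the application role, seen through an assumed-role session
        "eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
        "userIdentity": {
            "arn": "arn:aws:sts::111122223333:assumed-role/payments-app/i-0abc",
            "sessionContext": {"sessionIssuer": {"arn": APP_ROLE}},
        },
        "sourceIPAddress": "10.0.0.5",
        "requestParameters": {"encryptionContext": {"tenant": "acme"}},
    },
    {   # Unexpected: a human user decrypting directly
        "eventSource": "kms.amazonaws.com", "eventName": "Decrypt",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/mallory"},
        "sourceIPAddress": "203.0.113.7",
        "requestParameters": {"encryptionContext": {"tenant": "acme"}},
    },
]
findings = unexpected_decrypts(sample_events, {APP_ROLE})
print(findings)
```

Keeping the encryption context in each finding is deliberate: it tells you which tenant's data was touched, which is usually the first question in an incident.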
Tighten policies with condition keys so a key is usable only in the intended context:
{
"Sid": "OnlyViaS3FromOrg",
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::111122223333:role/payments-app" },
"Action": ["kms:Decrypt", "kms:GenerateDataKey"],
"Resource": "*",
"Condition": {
"StringEquals": {
"kms:ViaService": "s3.eu-west-1.amazonaws.com",
"aws:PrincipalOrgID": "o-exampleorgid"
}
}
}
kms:ViaService pins usage to a specific AWS service (the key can only be used through S3, not by a human running aws kms decrypt). aws:PrincipalOrgID confines cross-account use to your organization. For ABAC, gate Decrypt on tag parity between the principal and the key with aws:ResourceTag / aws:PrincipalTag so access scales with tags instead of hand-written ARNs:
"Condition": {
"StringEquals": {
"aws:ResourceTag/project": "${aws:PrincipalTag/project}"
}
}
8. Cost and request-quota management
KMS pricing has two parts: roughly $1 per CMK per month (replicas billed separately), and per-request charges with a shared, Region-level request rate quota for cryptographic operations. Symmetric Decrypt/GenerateDataKey/Encrypt share a quota (a few tens of thousands of requests/second depending on Region and key type). A hot path that calls KMS per object will hit ThrottlingException long before you expect.
The levers, in order of impact:
- S3 Bucket Keys: one toggle, eliminates the vast majority of per-object KMS calls. Use it everywhere you use SSE-KMS.
- Data key caching (section 2): reuse data keys within a bounded window for high-volume application encryption.
- Quota increases: the request-rate quota is adjustable via Service Quotas; request headroom before a launch, not during an incident.
- Back off correctly: handle ThrottlingException with exponential backoff and jitter; the SDKs do this, but verify your retry config rather than assuming it.
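If you do end up owning the retry loop, the shape to aim for is capped exponential backoff with full jitter. A minimal sketch, assuming you supply the throttle test yourself; with boto3 that would typically inspect the error code on a botocore ClientError, which is left out here so the example runs without AWS. Throttled, flaky_decrypt, and the helper names are hypothetical.

```python
import random
import time


def backoff_delays(count, base=0.1, cap=5.0, rng=random.random):
    """Yield `count` full-jitter delays: rng() * min(cap, base * 2**attempt)."""
    for attempt in range(count):
        yield rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(operation, is_throttle, max_attempts=5,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying only on throttling errors, up to max_attempts."""
    delays = backoff_delays(max_attempts - 1, rng=rng)
    while True:
        try:
            return operation()
        except Exception as error:
            if not is_throttle(error):
                raise  # not a throttle: surface it immediately
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted
            sleep(delay)


class Throttled(Exception):
    """Hypothetical stand-in for a KMS ThrottlingException."""


attempts = {"count": 0}


def flaky_decrypt():
    # Fails twice with a throttle, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise Throttled()
    return "ok"


sleeps = []
result = call_with_backoff(
    flaky_decrypt,
    is_throttle=lambda error: isinstance(error, Throttled),
    sleep=sleeps.append,   # record delays instead of sleeping
    rng=lambda: 1.0,       # deterministic for the demo; keep real jitter in production
)
print(result, sleeps)
```

Jitter matters more than the exponent: a fleet that backs off in lockstep simply re-creates the throttling spike one interval later.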
The cost trap is never the $1/month per key. It is millions of unbatched Decrypt calls from a service that should have been using Bucket Keys or a data-key cache. Architect the call volume down first; optimize the bill second.
Enterprise scenario
A payments platform ran active-active in eu-west-1 and eu-central-1 behind a DynamoDB global table, with the application performing client-side field-level encryption on PAN data before writing. They had used a standard, Region-scoped CMK in eu-west-1. The global table dutifully replicated the ciphertext to eu-central-1 — but when they ran a regional failover game-day, the eu-central-1 app could not decrypt a single replicated record. The ciphertext was a KMS envelope bound to a key that only existed in eu-west-1, and kms:Decrypt in eu-central-1 had no key to call. Replicated-but-unreadable data is the worst kind of DR failure: it looks healthy until you cut over.
The fix was a multi-Region key, and critically, a backfill — rotating to an MRK does nothing for the millions of records already wrapped under the old single-Region key. They created an MRK primary, replicated it, repointed the application, and ran a controlled ReEncrypt migration over the existing items so every envelope was rewrapped under the portable key. They also tightened each replica’s policy independently: the eu-central-1 replica granted the app Decrypt only, while writes (and thus GenerateDataKey) stayed pinned to the active primary, so a failover couldn’t silently start minting keys in the standby Region.
# Primary in the active Region
aws kms create-key --multi-region --region eu-west-1 \
--description "pan-field-encryption primary"
# Replica in the standby Region (independent, decrypt-only policy attached after)
aws kms replicate-key \
--key-id mrk-0a1b2c3d4e5f6a7b8 \
--replica-region eu-central-1
The lesson the team wrote into their standards: if a ciphertext can travel between Regions, the key that protects it must travel too — and you must re-encrypt the data that predates that decision. Replication moves bytes; it does not move the ability to read them.
Verify
Confirm the pieces before you trust them in production.
# 1. Rotation is actually on, and on the cadence you set
aws kms get-key-rotation-status --key-id alias/payments-prod
# -> "KeyRotationEnabled": true, expected RotationPeriodInDays
# 2. The key is multi-region with the expected replicas
aws kms describe-key --key-id alias/payments-prod \
--query 'KeyMetadata.{MR:MultiRegion, Cfg:MultiRegionConfiguration}'
# 3. Round-trip a data key end to end
BLOB=$(aws kms generate-data-key --key-id alias/payments-prod \
--key-spec AES_256 --query CiphertextBlob --output text)
aws kms decrypt --ciphertext-blob fileb://<(echo "$BLOB" | base64 --decode) \
--query KeyId --output text
# -> returns the key ARN, proving decrypt authorization works
# 4. Encryption-context enforcement: wrong context must FAIL
aws kms decrypt --ciphertext-blob fileb://./wrapped.bin \
--encryption-context tenant=wrong # expect AccessDenied / InvalidCiphertext
# 5. Cross-Region portability (the DR claim): decrypt eu-west-1 ciphertext in us-east-1
aws kms decrypt --region us-east-1 \
--ciphertext-blob fileb://./eu-west-1-wrapped.bin \
--query KeyId --output text
Then read the audit trail: every one of those Decrypt calls should appear in CloudTrail with the principal, encryption context, and kms:ViaService (or lack of it) you expect. If a call you didn’t make shows up, you have an authorization gap.