A docker run on a laptop and an ECS service on Fargate share almost no operational concerns. Fargate removes the EC2 layer, but it does not remove the choices that determine whether a deploy is safe at 3am: how each task gets an IP and a security group, when a deployment rolls back on its own, and what happens to in-flight requests when a task is told to stop. This guide walks through the pieces I actually wire up for a production Fargate service: the task definitions, scaling policies, and lifecycle settings that separate a service that drains cleanly from one that drops connections on every release.
Assume the AWS provider and region are configured, an ALB already exists, and you are on AWS CLI v2 (aws --version reports 2.x). I use the Linux platform version LATEST throughout, which currently resolves to platform version 1.4.0.
1. Task definition: sizing and platform version
A Fargate task definition declares the container(s), the CPU/memory envelope, the network mode, and two distinct IAM roles. The CPU/memory pair is not free-form: Fargate only accepts specific combinations, and memory is constrained by the CPU value you pick.
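| cpu (vCPU) | Valid memory range |
|---|---|
| 256 (.25) | 512, 1024, 2048 MiB |
| 512 (.5) | 1024 - 4096 MiB (1 GiB steps) |
| 1024 (1) | 2048 - 8192 MiB (1 GiB steps) |
| 2048 (2) | 4096 - 16384 MiB (1 GiB steps) |
| 4096 (4) | 8192 - 30720 MiB (1 GiB steps) |
| 8192 (8) | 16384 - 61440 MiB (4 GiB steps; Linux, platform 1.4.0+) |
| 16384 (16) | 32768 - 122880 MiB (8 GiB steps; Linux, platform 1.4.0+) |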
The whole task shares this budget. If you run a sidecar (log router, proxy), its CPU and memory come out of the same pool, so size the task for the sum, then optionally cap individual containers with container-level cpu/memory.
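To catch an invalid size before register-task-definition rejects it, a small check like the following could run in CI. It is a sketch that encodes only the 0.25 to 4 vCPU rows of the table above; the function names (isValidFargateSize, containersFit) are hypothetical and not part of any AWS SDK.

```javascript
// Sketch: validate a Fargate task-level cpu/memory pair against the
// 0.25-4 vCPU rows of the sizing table. cpu is in CPU units (1024 = 1 vCPU),
// memory in MiB; strings such as "512" from a task definition are accepted.
function range(start, end, step) {
  const out = [];
  for (let v = start; v <= end; v += step) out.push(v);
  return out;
}

const FARGATE_SIZES = {
  256: [512, 1024, 2048],
  512: range(1024, 4096, 1024),
  1024: range(2048, 8192, 1024),
  2048: range(4096, 16384, 1024),
  4096: range(8192, 30720, 1024),
};

function isValidFargateSize(cpu, memoryMiB) {
  const allowed = FARGATE_SIZES[Number(cpu)];
  return Array.isArray(allowed) && allowed.includes(Number(memoryMiB));
}

// The task budget is shared: count each container's hard limit ("memory"),
// falling back to its soft reservation ("memoryReservation"), and compare
// the total with the task-level memory.
function containersFit(taskMemoryMiB, containers) {
  const total = containers.reduce(
    (sum, c) => sum + (c.memory ?? c.memoryReservation ?? 0),
    0,
  );
  return total <= Number(taskMemoryMiB);
}

console.log(isValidFargateSize("512", "1024")); // true
console.log(isValidFargateSize(512, 512)); // false
console.log(containersFit("1024", [{ name: "app", memory: 900 }, { name: "log_router", memoryReservation: 50 }])); // true
```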
{
  "family": "checkout-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  },
  "executionRoleArn": "arn:aws:iam::111122223333:role/checkout-execution",
  "taskRoleArn": "arn:aws:iam::111122223333:role/checkout-task",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/checkout:1.42.0",
      "essential": true,
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
      "stopTimeout": 30,
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/healthz || exit 1"],
        "interval": 15,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 30
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/checkout-api",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "app",
          "mode": "non-blocking",
          "max-buffer-size": "25m"
        }
      }
    }
  ]
}
Two choices worth calling out. ARM64 (Graviton) Fargate is typically ~20% cheaper per vCPU-hour than X86_64 for the same size and usually performs as well or better for typical web workloads; the catch is that your image must be built for arm64 or be multi-arch. And pin the image to an immutable tag or digest, never :latest. ECS resolves the tag to an image digest when it launches a deployment's tasks, so with a moving tag the same task definition revision can run different code from one deployment to the next, and the task definition no longer tells you what is actually running.
Register it:
aws ecs register-task-definition --cli-input-json file://checkout-api.task.json
2. awsvpc networking: one ENI and security group per task
On Fargate the network mode is always awsvpc. Each task gets its own elastic network interface (ENI) with a private IP from the subnet you place it in, and its own security group(s). This is the single most important networking fact about Fargate: a task is a first-class network citizen, not a process sharing the host’s IP. You get per-task security groups, per-task VPC Flow Logs, and clean blast-radius isolation – at the cost of consuming a subnet IP per running task.
That IP consumption is the planning trap. During a rolling deploy you briefly run more tasks than steady state (new tasks come up before old ones drain), and each consumes an IP. Size subnets accordingly:
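Peak task IPs ~= desired_count x maximumPercent / 100, plus headroom. For a 40-task service at maximumPercent: 200, plan for up to ~80 task IPs across your subnets during a deploy, on top of anything else in those subnets. If autoscaling can raise desired count mid-deploy, size for the scalable target's max-capacity instead of today's desired count.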
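The same arithmetic as a sketch for checking subnet capacity before a deploy. The 20% default headroom and the function names are assumptions for illustration; the AWS-specific facts it relies on are that every subnet reserves five addresses and that ECS rounds the maximumPercent ceiling down.

```javascript
// Sketch: peak task IPs during a rolling deploy vs. what the subnets can hold.
// Assumptions: peak tasks = desired count x maximumPercent / 100 (rounded
// down, as ECS does), 20% headroom by default, and other IP consumers in the
// subnets passed in as otherIpsInUse.
function peakTaskIps(desiredCount, maximumPercent) {
  return Math.floor((desiredCount * maximumPercent) / 100);
}

// AWS reserves the first four addresses and the last address of every subnet.
function usableIps(prefixLength) {
  return 2 ** (32 - prefixLength) - 5;
}

function subnetsFit(desiredCount, maximumPercent, prefixLengths, otherIpsInUse = 0, headroomPct = 20) {
  const needed = Math.ceil((peakTaskIps(desiredCount, maximumPercent) * (100 + headroomPct)) / 100);
  const available = prefixLengths.reduce((sum, p) => sum + usableIps(p), 0) - otherIpsInUse;
  return { needed, available, fits: needed <= available };
}

console.log(peakTaskIps(40, 200)); // 80
console.log(subnetsFit(40, 200, [24, 24])); // { needed: 96, available: 502, fits: true }
```

Pass the scalable target's max-capacity as desiredCount if autoscaling can grow the service while a deploy is in flight.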
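Spread tasks across at least two private subnets in different AZs, and give each a /24 or larger if the service is sizeable (AWS reserves five addresses in every subnet, so a /24 yields 251 usable IPs). The service's network configuration, passed as --network-configuration to create-service or update-service: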
{
  "awsvpcConfiguration": {
    "subnets": ["subnet-0aaa1111", "subnet-0bbb2222"],
    "securityGroups": ["sg-0task55555"],
    "assignPublicIp": "DISABLED"
  }
}
Set assignPublicIp to DISABLED for tasks in private subnets; a public IP does nothing there without a route to an internet gateway. Those tasks reach ECR, Secrets Manager, and CloudWatch Logs through a NAT gateway or, better, VPC endpoints: interface endpoints for com.amazonaws.<region>.ecr.api, ecr.dkr, secretsmanager, and logs, plus an S3 gateway endpoint, because ECR serves image layers from S3. Endpoints cut NAT data-processing cost and keep image pulls on the AWS network. The task security group should allow inbound traffic only from the ALB's security group on the container port, and the ALB's security group should allow egress to the task security group on that port. Reference security groups by ID, never by CIDR.
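A sketch of the pieces that paragraph names, expressed as plain data: the endpoint service names a private task needs, and a security-group ingress rule that references the ALB's group by ID. The rule object follows the EC2 API's IpPermissions shape; the function names and the sg-0alb44444 ID are hypothetical, and you would still pass the results to your SDK, CLI, or Terraform.

```javascript
// Sketch: VPC endpoints a private Fargate task needs to pull from ECR,
// read Secrets Manager, and write CloudWatch Logs without a NAT gateway.
function requiredEndpoints(region) {
  const interfaceServices = ["ecr.api", "ecr.dkr", "secretsmanager", "logs"];
  return [
    ...interfaceServices.map((svc) => ({
      serviceName: `com.amazonaws.${region}.${svc}`,
      type: "Interface",
    })),
    // ECR serves image layers from S3, so layer pulls need S3 access too.
    { serviceName: `com.amazonaws.${region}.s3`, type: "Gateway" },
  ];
}

// Ingress rule for the task security group: allow the container port only
// from another security group (the ALB's), referenced by ID, not by CIDR.
function ingressFromSecurityGroup(sourceGroupId, port) {
  return {
    IpProtocol: "tcp",
    FromPort: port,
    ToPort: port,
    UserIdGroupPairs: [{ GroupId: sourceGroupId }],
  };
}

console.log(requiredEndpoints("us-east-1").map((e) => e.serviceName));
console.log(ingressFromSecurityGroup("sg-0alb44444", 8080)); // hypothetical ALB SG ID
```

The interface endpoints need their own security group that allows HTTPS (443) from the task security group; the same helper describes that rule if you pass the task group's ID and port 443.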
3. Service Auto Scaling: target tracking vs step scaling
ECS services scale through Application Auto Scaling, registered against a scalable target. The hard part is picking the right metric. For a request-driven service behind an ALB, ALBRequestCountPerTarget is the cleanest signal: it scales on actual load per task, independent of how CPU-bound the work is, and it reacts before CPU saturates.
# Register the service as a scalable target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/prod-cluster/checkout-api \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 4 \
--max-capacity 40
The target-tracking configuration (1000 requests per target per minute), saved as reqcount-tt.json:
{
  "TargetValue": 1000.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ALBRequestCountPerTarget",
    "ResourceLabel": "app/checkout-alb/50dc6c495c0c9188/targetgroup/checkout-tg/6d0ecf831eec9f09"
  },
  "ScaleInCooldown": 300,
  "ScaleOutCooldown": 60
}
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/prod-cluster/checkout-api \
--scalable-dimension ecs:service:DesiredCount \
--policy-name reqcount-tt \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration file://reqcount-tt.json
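The ResourceLabel is the ALB's ARN suffix (everything after loadbalancer/, i.e. app/<name>/<id>) joined by a slash to the target group's ARN suffix (starting at targetgroup/, i.e. targetgroup/<name>/<id>). Get it wrong and the policy's alarms watch a metric that never reports data, so the policy silently does nothing.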
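Deriving the label from the two ARNs, rather than hand-copying it, removes that failure mode. A sketch assuming standard Application Load Balancer and target group ARNs; the example ARNs are reconstructed from the IDs in the label above, and resourceLabel is a hypothetical helper.

```javascript
// Sketch: build the ALBRequestCountPerTarget ResourceLabel from two ARNs.
// Result format: app/<alb-name>/<alb-id>/targetgroup/<tg-name>/<tg-id>
function resourceLabel(albArn, targetGroupArn) {
  const albMarker = ":loadbalancer/";
  const i = albArn.indexOf(albMarker);
  const j = targetGroupArn.indexOf(":targetgroup/");
  if (i === -1 || j === -1) throw new Error("unexpected ARN format");
  const albPart = albArn.slice(i + albMarker.length); // app/<name>/<id>
  const tgPart = targetGroupArn.slice(j + 1); // targetgroup/<name>/<id>
  if (!albPart.startsWith("app/")) {
    throw new Error("ALBRequestCountPerTarget needs an Application Load Balancer");
  }
  return `${albPart}/${tgPart}`;
}

console.log(
  resourceLabel(
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/checkout-alb/50dc6c495c0c9188",
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/checkout-tg/6d0ecf831eec9f09",
  ),
); // app/checkout-alb/50dc6c495c0c9188/targetgroup/checkout-tg/6d0ecf831eec9f09
```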
Target tracking is the default tool: pick a target value, AWS provisions a managed pair of CloudWatch alarms and keeps the metric near it. Use CPU/memory target tracking only when load does not map cleanly to request count (batch workers, gRPC streaming). Reach for step scaling when you need asymmetric or aggressive reactions – for example, add capacity hard when a queue depth alarm crosses a threshold:
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/prod-cluster/worker \
--scalable-dimension ecs:service:DesiredCount \
--policy-name queue-step \
--policy-type StepScaling \
--step-scaling-policy-configuration '{
"AdjustmentType": "ChangeInCapacity",
"MetricAggregationType": "Maximum",
"StepAdjustments": [
{ "MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 1000, "ScalingAdjustment": 2 },
{ "MetricIntervalLowerBound": 1000, "ScalingAdjustment": 5 }
]
}'
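Two details of step scaling are easy to miss. The step bounds are offsets from the alarm threshold, not absolute metric values, and the policy does nothing on its own: it runs only when a CloudWatch alarm lists the policy ARN (returned by put-scaling-policy) in its alarm actions, for example via aws cloudwatch put-metric-alarm --alarm-actions. The sketch below evaluates the two steps above against a hypothetical queue-depth threshold of 500 and hypothetical 4/40 capacity bounds; it models only the scale-out case, where a step's lower bound is inclusive and its upper bound exclusive.

```javascript
// Sketch: which ScalingAdjustment the queue-step policy applies.
// Bounds are relative to the alarm threshold; Infinity stands in for
// the missing upper bound on the last step.
const STEPS = [
  { lower: 0, upper: 1000, adjustment: 2 },
  { lower: 1000, upper: Infinity, adjustment: 5 },
];

function stepAdjustment(metricValue, alarmThreshold, steps = STEPS) {
  const breach = metricValue - alarmThreshold;
  if (breach < 0) return 0; // below the threshold the alarm is not firing
  const step = steps.find((s) => breach >= s.lower && breach < s.upper);
  return step ? step.adjustment : 0;
}

// ChangeInCapacity adds to the current desired count, bounded by the
// scalable target's min/max capacity.
function applyChange(current, adjustment, minCapacity, maxCapacity) {
  return Math.min(maxCapacity, Math.max(minCapacity, current + adjustment));
}

const threshold = 500; // hypothetical alarm threshold (messages in the queue)
console.log(stepAdjustment(700, threshold)); // 2
console.log(stepAdjustment(1500, threshold)); // 5
console.log(applyChange(38, stepAdjustment(1500, threshold), 4, 40)); // 40 (hypothetical bounds)
```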
You can attach multiple policies to one service. A common pattern: request-count target tracking for the steady state, plus a CPU target-tracking policy as a safety net so a CPU-heavy code path cannot starve before request count reacts. When policies disagree, Application Auto Scaling takes the largest desired count, so layering “scale out” policies is safe; combining aggressive scale-in policies is what gets you into flapping.
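A rough model of that layering, under stated assumptions: each target-tracking policy is approximated as asking for capacity in proportion to metric / target (Application Auto Scaling's real algorithm is internal and smoother than this), and the combined result is the largest proposal clamped to the scalable target's min/max. The 60% CPU target and the metric snapshot are hypothetical.

```javascript
// Sketch: approximate target-tracking proposals and combine them.
// Assumption: each policy asks for capacity proportional to metric / target.
function targetTrackingDesired(currentTasks, metricValue, targetValue) {
  return Math.ceil((currentTasks * metricValue) / targetValue);
}

// When policies disagree, the largest proposal wins, bounded by the
// scalable target's min/max capacity.
function combinedDesired(proposals, minCapacity, maxCapacity) {
  const wanted = Math.max(...proposals);
  return Math.min(maxCapacity, Math.max(minCapacity, wanted));
}

// Hypothetical snapshot: 10 tasks at 1500 requests/target/minute (target 1000)
// and 35% average CPU against a hypothetical 60% CPU backstop policy.
const byRequests = targetTrackingDesired(10, 1500, 1000); // 15
const byCpu = targetTrackingDesired(10, 35, 60); // 6
console.log(combinedDesired([byRequests, byCpu], 4, 40)); // 15
```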
4. Deployments: rolling updates and the circuit breaker
ECS rolling deployments are governed by two knobs on the service. minimumHealthyPercent is the floor: the share of desired count that must stay running and healthy during a deploy. maximumPercent is the ceiling: how far above desired count ECS may go while it starts replacements. For a zero-downtime rolling deploy, 100/200 (the ECS defaults for rolling updates) is the safe choice: never drop below desired count, and allow a full extra set of tasks while rolling.
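ECS rounds the minimumHealthyPercent floor up and the maximumPercent ceiling down, which matters for small services. A sketch of the resulting bounds; rollingBounds is a hypothetical helper, and canProgress flags the case where ECS can neither start a replacement above desired count nor stop an old task below it.

```javascript
// Sketch: task-count bounds ECS enforces during a rolling deploy.
// minimumHealthyPercent is rounded up, maximumPercent rounded down.
function rollingBounds(desiredCount, minimumHealthyPercent, maximumPercent) {
  const minHealthy = Math.ceil((desiredCount * minimumHealthyPercent) / 100);
  const maxRunning = Math.floor((desiredCount * maximumPercent) / 100);
  // Progress needs room to start a replacement above desired count
  // or permission to stop an old task below it.
  const canProgress = maxRunning > desiredCount || minHealthy < desiredCount;
  return { minHealthy, maxRunning, canProgress };
}

console.log(rollingBounds(40, 100, 200)); // { minHealthy: 40, maxRunning: 80, canProgress: true }
console.log(rollingBounds(1, 100, 150)); // { minHealthy: 1, maxRunning: 1, canProgress: false }
```

The second case is why a one-task service needs maximumPercent of at least 200 (or minimumHealthyPercent of 0) to deploy at all.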
The piece people skip is the deployment circuit breaker. Without it, a bad image that never passes health checks leaves the service stuck replacing failing tasks indefinitely, draining your IP pool and paging you. With it, ECS watches for a run of failed task launches and, if rollback is on, automatically reverts to the last known-good task definition.
aws ecs update-service \
--cluster prod-cluster \
--service checkout-api \
--task-definition checkout-api:87 \
--deployment-configuration '{
"minimumHealthyPercent": 100,
"maximumPercent": 200,
"deploymentCircuitBreaker": { "enable": true, "rollback": true }
}' \
--health-check-grace-period-seconds 60
--health-check-grace-period-seconds tells ECS to ignore load balancer health check failures for the first N seconds after a task starts, so a slow-booting app is not killed before it is ready. Set it slightly above your real cold-start time. The circuit breaker's failure threshold scales with desired count (half of it, with a floor of 3 and a ceiling of 200 failed tasks per the current docs), so it behaves sensibly for both a 3-task and a 300-task service.
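As I read the current ECS documentation, the failure threshold is half the desired count, clamped between 3 and 200 failed tasks. Rounding odd counts up is my assumption, so treat this as a sketch and check the docs before relying on exact numbers; circuitBreakerThreshold is a hypothetical helper.

```javascript
// Sketch: approximate deployment circuit breaker failure threshold.
// Assumptions: 0.5 x desired count, rounded up, clamped to [3, 200].
function circuitBreakerThreshold(desiredCount) {
  return Math.min(200, Math.max(3, Math.ceil(desiredCount * 0.5)));
}

for (const desired of [3, 40, 300, 1000]) {
  console.log(desired, circuitBreakerThreshold(desired));
}
```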
For higher-stakes changes, ECS now also supports blue/green deployments natively (in addition to the classic rolling and the older CodeDeploy-driven blue/green). Rolling with a circuit breaker is the right default for most services; reach for blue/green when you need a full parallel environment and instant cutover/rollback.
5. Graceful shutdown: SIGTERM, stopTimeout, and draining
When ECS stops a task – a deploy, a scale-in, or a Spot interruption – it sends SIGTERM to each container’s entrypoint process (PID 1), waits up to stopTimeout (default 30s on Fargate, max 120s), then sends SIGKILL. Two failure modes hide here.
First: PID 1 must actually receive and handle SIGTERM. If your container starts the app via a shell (sh -c "node server.js"), the shell is PID 1 and may not forward the signal – your app gets SIGKILLed with in-flight requests. Either run the app as PID 1 directly (exec form CMD, or ENTRYPOINT ["node", "server.js"]) or set "initProcessEnabled": true in linuxParameters to get a tini-style init that reaps and forwards signals.
Second: drain before you exit. On SIGTERM the app should stop accepting new work, finish in-flight requests, then exit:
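const express = require('express'); // assumes an Express app; a plain http.Server works the same way
const app = express();
const server = app.listen(8080);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, draining');
  // Stop accepting new connections; the callback runs once in-flight requests finish.
  server.close(() => {
    console.log('drained, exiting');
    process.exit(0);
  });
  // Drop idle keep-alive connections (e.g. from the ALB) so close() can complete (Node 18.2+).
  server.closeIdleConnections();
  // Safety net well under stopTimeout (30s in the task definition above).
  setTimeout(() => process.exit(1), 25_000).unref();
});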
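Coordinate three timers: the ALB target group deregistration delay (deregistration_delay.timeout_seconds, default 300s; drop it to ~30s for services with short requests), your app's in-flight grace period, and the task stopTimeout. When the service scheduler stops a task, ECS first deregisters it from the target group while the task is in the DEACTIVATING state: the ALB stops sending it new requests and waits up to the deregistration delay for in-flight ones. ECS then sends SIGTERM and starts the stopTimeout clock. So the deregistration delay should cover your slowest request, the app's drain window should cover whatever work is still running at SIGTERM, and stopTimeout should exceed that drain window so SIGKILL never lands mid-drain. An oversized deregistration delay does not drop requests, but it can make every deploy and scale-in wait up to that long.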
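Those constraints are easy to encode as a pre-deploy check. A sketch with hypothetical field names: longestRequestSec is your slowest expected request, and appSafetyExitSec is the app's forced-exit timer (25s in the handler above). The 120s cap is the Fargate stopTimeout maximum.

```javascript
// Sketch: sanity-check the shutdown timers (all values in seconds).
const FARGATE_MAX_STOP_TIMEOUT_SEC = 120;

function checkShutdownTimers({ deregistrationDelaySec, longestRequestSec, appSafetyExitSec, stopTimeoutSec }) {
  const problems = [];
  if (stopTimeoutSec > FARGATE_MAX_STOP_TIMEOUT_SEC) {
    problems.push("stopTimeout exceeds the 120s Fargate maximum");
  }
  if (appSafetyExitSec >= stopTimeoutSec) {
    problems.push("app safety exit fires at or after stopTimeout, so SIGKILL can land mid-drain");
  }
  if (longestRequestSec > appSafetyExitSec) {
    problems.push("the app's drain window is shorter than the slowest request");
  }
  if (deregistrationDelaySec < longestRequestSec) {
    problems.push("ALB deregistration delay is shorter than the slowest request");
  }
  return problems;
}

// Values from the handler above (25s safety exit, 30s stopTimeout) with a
// 30s deregistration delay; the 10s slowest request is hypothetical.
console.log(checkShutdownTimers({
  deregistrationDelaySec: 30,
  longestRequestSec: 10,
  appSafetyExitSec: 25,
  stopTimeoutSec: 30,
})); // []
```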
6. Secrets, config, and least-privilege roles
Fargate tasks have two roles and conflating them is the most common IAM mistake on ECS:
- Execution role – used by ECS (on Fargate, the Fargate agent) before the container starts. It pulls the image from ECR, writes to the log group, and resolves any Secrets Manager or SSM Parameter Store references in the secrets block, which are injected as environment variables.
- Task role – assumed by your application code at runtime to call AWS APIs (S3, DynamoDB, SQS). This is the credential your app sees.
Keep them separate and minimal. Inject secrets via the secrets block so plaintext never lands in the task definition or in describe-tasks output:
"secrets": [
  { "name": "DB_PASSWORD", "valueFrom": "arn:aws:secretsmanager:us-east-1:111122223333:secret:prod/checkout/db-AbCdEf" }
],
"environment": [
  { "name": "LOG_LEVEL", "value": "info" }
]
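The execution role needs secretsmanager:GetSecretValue (plus kms:Decrypt if the secret is encrypted with a customer managed KMS key), scoped to the secrets the task actually uses: exact ARNs, or at most a path prefix like the one below, never *: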
{
  "Effect": "Allow",
  "Action": "secretsmanager:GetSecretValue",
  "Resource": "arn:aws:secretsmanager:us-east-1:111122223333:secret:prod/checkout/*"
}
The task role carries only the runtime permissions your code uses. If your app writes to one bucket, grant s3:PutObject on that bucket's objects (arn:aws:s3:::<bucket>/*, since PutObject acts on objects rather than on the bucket ARN) and nothing broader. Static environment entries are visible in plaintext through the API (describe-task-definition, for one); never put credentials there, since that is what secrets is for.
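Generating both policies from the values you actually use keeps them from drifting toward *. A sketch: the function names and the checkout-receipts bucket are hypothetical, while the statement shapes are standard IAM policy JSON. Note the object-level Resource for s3:PutObject.

```javascript
// Sketch: least-privilege IAM policy documents for the two roles.
// Execution role: read only the listed secrets, plus kms:Decrypt when a
// customer managed key encrypts them.
function executionRolePolicy(secretArns, kmsKeyArn) {
  const statements = [
    { Effect: "Allow", Action: "secretsmanager:GetSecretValue", Resource: secretArns },
  ];
  if (kmsKeyArn) {
    statements.push({ Effect: "Allow", Action: "kms:Decrypt", Resource: kmsKeyArn });
  }
  return { Version: "2012-10-17", Statement: statements };
}

// Task role: write objects into one bucket and nothing else.
// s3:PutObject acts on objects, so the Resource is <bucket-arn>/*.
function taskRolePutObjectPolicy(bucketName) {
  return {
    Version: "2012-10-17",
    Statement: [
      { Effect: "Allow", Action: "s3:PutObject", Resource: `arn:aws:s3:::${bucketName}/*` },
    ],
  };
}

const execPolicy = executionRolePolicy([
  "arn:aws:secretsmanager:us-east-1:111122223333:secret:prod/checkout/db-AbCdEf",
]);
console.log(JSON.stringify(execPolicy, null, 2));
console.log(JSON.stringify(taskRolePutObjectPolicy("checkout-receipts"), null, 2)); // hypothetical bucket
```

These cover only the secrets part of the execution role; it also needs ECR pull and CloudWatch Logs permissions, which the AWS managed AmazonECSTaskExecutionRolePolicy provides.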
7. Observability: Container Insights, structured logs, tracing
Turn on Container Insights at the cluster level for per-task/service CPU, memory, and network metrics plus the curated dashboards. Enable the enhanced observability tier for container-level granularity:
aws ecs update-cluster-settings \
--cluster prod-cluster \
--settings name=containerInsights,value=enhanced
For logs, the awslogs driver (Section 1) is the simplest path; set mode=non-blocking with a bounded max-buffer-size so a slow log backend cannot block your application threads, accepting that log lines are dropped if the buffer fills. When you need routing (duplicating to S3 and a SIEM, parsing, or sampling), use FireLens with a Fluent Bit sidecar:
{
  "name": "log_router",
  "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
  "essential": true,
  "firelensConfiguration": { "type": "fluentbit" },
  "memoryReservation": 50
}
Then the app container’s logConfiguration uses "logDriver": "awsfirelens" with output options (e.g. to CloudWatch and an S3 backup). Emit logs as JSON from the app so they are queryable in CloudWatch Logs Insights:
fields @timestamp, level, msg, latency_ms
| filter level = "error"
| sort @timestamp desc
| limit 50
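A minimal structured logger that emits the fields that query reads. It is a sketch without a logging library; formatLog and log are hypothetical names, the field names simply match the query above, and Logs Insights discovers fields in JSON log events automatically.

```javascript
// Sketch: one JSON object per line on stdout; the awslogs or awsfirelens
// driver ships each line as a separate log event.
function formatLog(level, msg, fields = {}) {
  return JSON.stringify({ time: new Date().toISOString(), level, msg, ...fields });
}

function log(level, msg, fields) {
  process.stdout.write(formatLog(level, msg, fields) + "\n");
}

log("info", "authorization complete", { latency_ms: 42, route: "/authorize" });
log("error", "downstream timeout", { latency_ms: 5003 });
```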
For distributed tracing, add the AWS Distro for OpenTelemetry collector as a sidecar and grant the task role AWSXRayDaemonWriteAccess; instrument the app with OTel and export to X-Ray (or your APM) for end-to-end spans across services.
8. Cost levers: Fargate Spot, capacity providers, right-sizing
Three levers move the bill, in order of impact.
Capacity providers + Fargate Spot. Fargate Spot runs the same tasks at a steep discount, but AWS can reclaim them with a two-minute warning that the task receives as SIGTERM. Run a mixed strategy via a capacity provider strategy: a base of on-demand FARGATE for a guaranteed floor, then FARGATE_SPOT for the elastic, interruption-tolerant remainder.
aws ecs put-cluster-capacity-providers \
--cluster prod-cluster \
--capacity-providers FARGATE FARGATE_SPOT \
--default-capacity-provider-strategy \
capacityProvider=FARGATE,base=2,weight=1 \
capacityProvider=FARGATE_SPOT,weight=4
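This keeps 2 tasks always on-demand, then splits additional tasks 1:4 on-demand:Spot. It is the cluster default, so it applies to services created without a launch type; a service created with launchType FARGATE ignores it, and a service can set its own capacityProviderStrategy instead. Only do this for stateless services that handle SIGTERM cleanly (Section 5): Spot reclamation uses the same graceful-stop path, so a service that drains correctly tolerates it.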
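To see how a strategy splits a given task count, a sketch that fills each provider's base first and then divides the remainder by weight. I would not rely on ECS's exact rounding for uneven splits; the leftover rule here (remainder to the highest-weight provider) is an assumption, and splitTasks is a hypothetical helper.

```javascript
// Sketch: approximate task placement for a capacity provider strategy.
const STRATEGY = [
  { capacityProvider: "FARGATE", base: 2, weight: 1 },
  { capacityProvider: "FARGATE_SPOT", weight: 4 },
];

function splitTasks(total, strategy) {
  const counts = Object.fromEntries(strategy.map((s) => [s.capacityProvider, 0]));
  let remaining = total;

  // 1. Satisfy each provider's base before any weighting.
  for (const s of strategy) {
    const base = Math.min(s.base ?? 0, remaining);
    counts[s.capacityProvider] += base;
    remaining -= base;
  }

  // 2. Split the rest in proportion to weight, rounding each share down.
  const totalWeight = strategy.reduce((sum, s) => sum + (s.weight ?? 0), 0);
  if (remaining > 0 && totalWeight > 0) {
    let assigned = 0;
    for (const s of strategy) {
      const share = Math.floor((remaining * (s.weight ?? 0)) / totalWeight);
      counts[s.capacityProvider] += share;
      assigned += share;
    }
    // 3. Assumption: rounding leftovers go to the highest-weight provider.
    const heaviest = strategy.reduce((a, b) => ((b.weight ?? 0) > (a.weight ?? 0) ? b : a));
    counts[heaviest.capacityProvider] += remaining - assigned;
  }
  return counts;
}

console.log(splitTasks(12, STRATEGY)); // { FARGATE: 4, FARGATE_SPOT: 8 }
```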
Graviton (ARM64). Already covered in Section 1 – the cheapest change you can make for compatible images.
Right-sizing. Use Container Insights and Compute Optimizer's ECS recommendations to find tasks provisioned at 4 vCPU that peak at 0.8. Fargate bills per vCPU-second and GB-second from the start of the image pull until the task stops (with a one-minute minimum for Linux tasks), so an oversized task definition costs you on every running replica, every hour. Resize the task definition, redeploy, and re-measure.
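The arithmetic behind right-sizing, as a sketch. Prices are parameters on purpose: look up current Fargate rates for your region and architecture, because the numbers in the example are placeholders, not AWS prices. The 730 hours per month is a common approximation, and monthlyCost is a hypothetical helper.

```javascript
// Sketch: steady-state monthly cost of a Fargate service. Ignores the
// one-minute billing minimum, which only matters for very short-lived tasks.
function monthlyCost({ vcpu, memoryGb, replicas, pricePerVcpuHour, pricePerGbHour, hours = 730 }) {
  return replicas * hours * (vcpu * pricePerVcpuHour + memoryGb * pricePerGbHour);
}

const prices = { pricePerVcpuHour: 0.04, pricePerGbHour: 0.004 }; // placeholder rates, not real prices
const before = monthlyCost({ vcpu: 4, memoryGb: 8, replicas: 10, ...prices });
const after = monthlyCost({ vcpu: 1, memoryGb: 2, replicas: 10, ...prices });
console.log(`resizing saves ${((1 - after / before) * 100).toFixed(0)}%`);
```

Because vCPU and memory shrink by the same factor here, the saving is independent of the placeholder prices; mixed changes (for example, fewer vCPUs but the same memory) do depend on the real rates.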
Verify
Confirm the service is healthy and behaving before you walk away.
# Rollout reached steady state (rolloutState should be COMPLETED, no failed tasks)
aws ecs describe-services --cluster prod-cluster --services checkout-api \
--query 'services[0].deployments[].{status:status,rollout:rolloutState,desired:desiredCount,running:runningCount,failed:failedTasks}'
# Each running task has its own ENI + private IP (awsvpc proof); --tasks goes last so xargs appends the ARNs to it (max 100 tasks per call)
aws ecs list-tasks --cluster prod-cluster --service-name checkout-api --query 'taskArns' --output text \
| xargs aws ecs describe-tasks --cluster prod-cluster \
--query 'tasks[].attachments[].details[?name==`privateIPv4Address`].value' --output text --tasks
# Scaling policies are attached; check that the listed alarms are not stuck in INSUFFICIENT_DATA
aws application-autoscaling describe-scaling-policies \
--service-namespace ecs --resource-id service/prod-cluster/checkout-api \
--query 'ScalingPolicies[].{name:PolicyName,type:PolicyType,alarms:Alarms[].AlarmName}'
# ALB target group: all targets healthy
aws elbv2 describe-target-health --target-group-arn <checkout-tg-arn> \
--query 'TargetHealthDescriptions[].TargetHealth.State'
Then force a real deploy of a deliberately broken task definition in a non-prod copy and confirm the circuit breaker flips the rollout to ROLLBACK_IN_PROGRESS and restores the prior task definition – a circuit breaker you have never seen fire is a configuration you do not actually have.
Enterprise scenario
A fintech platform team ran a payment-authorization service on Fargate behind an ALB, scaled on CPU target tracking. It worked until a Friday release: under a traffic spike, p99 latency tripled and the team saw a steady trickle of 502s on every deploy and every scale-in event, even though CPU never crossed the 70% target.
Two root causes. First, the service used CPU target tracking, but the workload was I/O-bound on a downstream HSM – CPU stayed low while request queues grew, so scaling reacted late. Second, and worse, the app was launched via sh -c "java -jar app.jar": the shell was PID 1, swallowed SIGTERM, and the JVM was SIGKILLed on every task stop, severing in-flight authorizations. The ALB deregistration delay was also still the default 300s, so every deploy and scale-in could stall for up to five minutes while old targets drained.
The fix was three coordinated changes, no new infrastructure. They switched the primary scaling signal to ALBRequestCountPerTarget (keeping a CPU policy as a backstop), changed the container entrypoint to exec the JVM as PID 1 with a real SIGTERM handler that drained the in-flight queue, and aligned the timers: deregistration delay to 30s, stopTimeout to 60s, drain grace to ~45s.
"deploymentConfiguration": {
  "minimumHealthyPercent": 100,
  "maximumPercent": 200,
  "deploymentCircuitBreaker": { "enable": true, "rollback": true }
}
# ALB target group: drain fast, well inside stopTimeout
resource "aws_lb_target_group" "auth" {
  name                 = "auth-tg"
  port                 = 8080
  protocol             = "HTTP"
  target_type          = "ip" # required for awsvpc/Fargate tasks
  vpc_id               = var.vpc_id
  deregistration_delay = 30

  health_check {
    path                = "/healthz"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 15
    timeout             = 5
    matcher             = "200"
  }
}
Note target_type = "ip" – Fargate tasks register by IP, not instance, because each task is its own ENI. After the change, deploy-time 502s went to zero and the service scaled out ahead of the latency curve instead of behind it. The lesson the team took away: on Fargate, “graceful shutdown” is not one setting – it is PID 1, stopTimeout, and the target group deregistration delay agreeing with each other.