Service mesh promised uniform connectivity, mTLS, and traffic policy across every workload. It also delivered Envoy on every pod, a control plane to operate, certificate rotation to babysit, and a sidecar tax on latency and memory. VPC Lattice is AWS’s answer to the same problem at a different layer: it pushes Layer 7 routing and IAM-based authorization into the VPC data path itself, so a Lambda function, an EKS pod, and an EC2 instance in three different accounts can call each other by a stable DNS name with no proxy in the request path that you operate. This is a build guide for wiring that together correctly, and for knowing when Lattice is the wrong tool.
The Lattice model: services, service networks, listeners, and target groups
Four resources carry the whole design. Get the nouns right and the rest follows.
| Resource | What it is | Analogy |
|---|---|---|
| Service | A logical application you expose (payments, orders). Owns a domain name, listeners, and routing rules. | An ALB + its DNS name |
| Target group | The compute behind a service: instances, IPs, a Lambda, or an ALB. Health-checked. | An ALB target group |
| Listener | A protocol/port on the service (HTTP/HTTPS/gRPC) with rules that route to target groups. | An ALB listener |
| Service network | The logical boundary that joins services to the VPCs allowed to call them, and carries the auth policy. | The mesh itself |
The mental model: a service is a callable application. A service network is the trust-and-reachability domain. You associate services into a service network (making them callable), and you associate VPCs into the same service network (giving clients in those VPCs the ability to call). A client only reaches a service if both the client’s VPC and the target service share a service network. That double association is the security boundary, and it is the first thing to reason about before any IAM.
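To make the double-association rule concrete, here is a minimal Python sketch. It is a hypothetical in-memory model (the `ServiceNetwork` class, the `can_reach` function, and the VPC and service names are invented for illustration) and it makes no AWS calls. It encodes only the rule above: a client reaches a service when a single network holds both associations.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNetwork:
    """Hypothetical model of a service network and its two kinds of association."""
    name: str
    services: set = field(default_factory=set)  # services associated into the network
    vpcs: set = field(default_factory=set)      # client VPCs associated into the network


def can_reach(client_vpc: str, service: str, networks: list) -> bool:
    """A client reaches a service only if some network holds both associations."""
    return any(client_vpc in n.vpcs and service in n.services for n in networks)


mesh = ServiceNetwork("platform-mesh", services={"orders"}, vpcs={"vpc-client-a"})
analytics = ServiceNetwork("analytics", services={"reports"}, vpcs={"vpc-client-b"})
networks = [mesh, analytics]

print(can_reach("vpc-client-a", "orders", networks))  # True: both are in platform-mesh
print(can_reach("vpc-client-b", "orders", networks))  # False: no shared network
```

Removing either association (the service from `mesh.services` or the VPC from `mesh.vpcs`) flips the first call to `False`. That is the check to make first when a call fails, before looking at IAM.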
Crucially, there is no sidecar. When a VPC is associated, the service’s managed DNS name resolves to an address in a Lattice-managed link-local range (169.254.171.0/24), and Lattice programs the VPC’s data path so that traffic to that range is intercepted and routed by the AWS-managed Lattice data plane. Your application makes a plain HTTP call; nothing runs in your pod or on your host.
Step 1 — Create a service network and a service
Create the network first; it is the anchor everything binds to.
```bash
# The trust boundary. AWS_IAM means requests are authorized against an IAM auth policy,
# with each caller identified by its SigV4 signature.
SN_ARN=$(aws vpc-lattice create-service-network \
  --name platform-mesh \
  --auth-type AWS_IAM \
  --query 'arn' --output text)

# A service = one callable application.
SVC_ARN=$(aws vpc-lattice create-service \
  --name orders \
  --auth-type AWS_IAM \
  --query 'arn' --output text)
```
`--auth-type` exists on both the service network and the service, and the two are evaluated independently. `NONE` disables auth at that level; `AWS_IAM` evaluates each request against that level’s auth policy, identifying the caller from its SigV4 signature. A common production posture is `AWS_IAM` on the network (broad guardrail) and `AWS_IAM` on each service (per-service rules), so a request must satisfy both.
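The "must satisfy both" posture can be written as a small boolean function. This is a hedged sketch: `level_allows` and `request_allowed` are hypothetical names, and each level's policy decision is passed in as a precomputed boolean rather than evaluated from a real auth policy.

```python
def level_allows(auth_type: str, policy_allows: bool) -> bool:
    """One level (service network or service) of the check."""
    if auth_type == "NONE":
        return True             # auth disabled at this level
    if auth_type == "AWS_IAM":
        return policy_allows    # defer to this level's auth policy decision
    raise ValueError(f"unknown auth type: {auth_type}")


def request_allowed(network_auth: str, network_policy_allows: bool,
                    service_auth: str, service_policy_allows: bool) -> bool:
    """The two levels are evaluated independently; a request must pass both."""
    return (level_allows(network_auth, network_policy_allows)
            and level_allows(service_auth, service_policy_allows))


# The network guardrail allows the caller, but the service policy does not.
print(request_allowed("AWS_IAM", True, "AWS_IAM", False))  # False
```

The asymmetry is the point of the layered posture: a broad network policy can never widen what a stricter service policy refuses.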
Step 2 — Define a target group and register targets
Lattice target groups come in distinct types. For EKS you almost always want `IP` (so pods register directly); `INSTANCE` suits classic EC2 fleets; `LAMBDA` and `ALB` exist for those targets.
```bash
TG_ARN=$(aws vpc-lattice create-target-group \
  --name orders-ip \
  --type IP \
  --config '{
    "port": 8080,
    "protocol": "HTTP",
    "vpcIdentifier": "vpc-0aa11bb22cc33dd44",
    "ipAddressType": "IPV4",
    "healthCheck": {
      "enabled": true,
      "protocol": "HTTP",
      "path": "/healthz",
      "healthyThresholdCount": 3,
      "unhealthyThresholdCount": 2
    }
  }' \
  --query 'arn' --output text)

aws vpc-lattice register-targets \
  --target-group-identifier "$TG_ARN" \
  --targets id=10.0.12.31,port=8080 id=10.0.12.78,port=8080
```
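The two threshold counts in `healthCheck` describe consecutive probe results, not totals. The sketch below models that general threshold technique in Python; it is an illustration under assumptions (the `health_states` function is hypothetical, and starting from an `INITIAL` state is a simplification), not Lattice's internal health checker.

```python
def health_states(results, healthy_threshold=3, unhealthy_threshold=2, start="INITIAL"):
    """Replay probe results (True = pass) and return the target state after each probe."""
    state, passes, fails = start, 0, 0
    history = []
    for ok in results:
        if ok:
            passes, fails = passes + 1, 0      # a pass resets the failure streak
            if state != "HEALTHY" and passes >= healthy_threshold:
                state = "HEALTHY"
        else:
            fails, passes = fails + 1, 0       # a failure resets the pass streak
            if state != "UNHEALTHY" and fails >= unhealthy_threshold:
                state = "UNHEALTHY"
        history.append(state)
    return history


# Three passes to become healthy, two consecutive failures to be marked unhealthy.
print(health_states([True, True, True, False, False, True]))
```

With these values, a flapping target (pass, fail, pass, fail) never builds a long enough streak to change state, which is why a misconfigured `path` usually shows up as a target stuck outside `HEALTHY` rather than one that oscillates.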
Lattice target groups are not EC2/ELB target groups and live in a different API namespace. Do not try to reuse an existing `aws elbv2` target group ARN here; they are incompatible resources.
Step 3 — Add a listener with routing rules
The listener binds a port to rules. Rules match on path, method, and headers, and forward to one or more weighted target groups. This is where blue-green and canary shifts live.
```bash
LISTENER_ARN=$(aws vpc-lattice create-listener \
  --service-identifier "$SVC_ARN" \
  --name http \
  --protocol HTTP --port 80 \
  --default-action '{
    "forward": { "targetGroups": [ { "targetGroupIdentifier": "'"$TG_ARN"'", "weight": 100 } ] }
  }' \
  --query 'arn' --output text)
```
Path/header routing and weighted blue-green
Rules carry a numeric priority (lower number wins) and a match. To split traffic for a canary, list both target groups in one forward action with weights; each target group receives its weight’s share of the total (weights 90 and 10 mean 90% and 10% of requests).
```bash
# Assumes $TG_V2_ARN holds a second (v2) target group, created the same way as in Step 2.

# Header-based route: send internal callers to the v2 target group only.
aws vpc-lattice create-rule \
  --service-identifier "$SVC_ARN" \
  --listener-identifier "$LISTENER_ARN" \
  --name canary-by-header \
  --priority 10 \
  --match '{
    "httpMatch": {
      "headerMatches": [
        { "name": "x-release-channel", "match": { "exact": "canary" } }
      ]
    }
  }' \
  --action '{
    "forward": { "targetGroups": [ { "targetGroupIdentifier": "'"$TG_V2_ARN"'", "weight": 100 } ] }
  }'

# Weighted 90/10 shift on the default path for everyone else.
# The default action belongs to the listener, so change it with update-listener;
# update-rule cannot modify a listener's default rule.
aws vpc-lattice update-listener \
  --service-identifier "$SVC_ARN" \
  --listener-identifier "$LISTENER_ARN" \
  --default-action '{
    "forward": { "targetGroups": [
      { "targetGroupIdentifier": "'"$TG_ARN"'", "weight": 90 },
      { "targetGroupIdentifier": "'"$TG_V2_ARN"'", "weight": 10 }
    ] }
  }'
```
A blue-green cutover is then just moving the weights to 0/100, observing, and deregistering the old target group. No DNS change, no client reconfiguration — the service name is stable across the shift.
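The priority and weight semantics described above can be modeled in a few lines of Python. This is a simplified sketch, not Lattice's router: the `Rule`, `select_forward`, and `traffic_shares` names and the `orders-v2` target group name are hypothetical, and header matching here is an exact comparison on lower-cased header names, a simplification of Lattice's match options.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    priority: int                      # lower number is evaluated first
    matches: Callable[[dict], bool]    # predicate over (lower-cased) request headers
    forward: Dict[str, int]            # target group name -> weight


def select_forward(headers: dict, rules: List[Rule], default: Dict[str, int]) -> Dict[str, int]:
    """Return the forward action of the first matching rule by priority, else the default."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(headers):
            return rule.forward
    return default


def traffic_shares(forward: Dict[str, int]) -> Dict[str, float]:
    """Each target group receives its weight divided by the total weight."""
    total = sum(forward.values())
    return {tg: weight / total for tg, weight in forward.items()}


canary_rule = Rule(10, lambda h: h.get("x-release-channel") == "canary", {"orders-v2": 100})
default_action = {"orders-ip": 90, "orders-v2": 10}

print(traffic_shares(select_forward({"x-release-channel": "canary"}, [canary_rule], default_action)))
print(traffic_shares(select_forward({}, [canary_rule], default_action)))
```

Moving the default action to `{"orders-ip": 0, "orders-v2": 100}` gives shares of 0.0 and 1.0, which is the cutover described above; the canary rule keeps matching throughout.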
Step 4 — Associate the service and the VPCs
Two associations make traffic flow. The service into the network (so it is callable), and each client VPC into the network (so clients can resolve and reach it).
```bash
# Make the service callable inside the network.
aws vpc-lattice create-service-network-service-association \
  --service-network-identifier "$SN_ARN" \
  --service-identifier "$SVC_ARN"

# Let a client VPC reach everything in the network.
aws vpc-lattice create-service-network-vpc-association \
  --service-network-identifier "$SN_ARN" \
  --vpc-identifier vpc-0client1111aaaa22 \
  --security-group-ids sg-0latticeclients0001
```
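The `--security-group-ids` on the VPC association is the gate for Lattice traffic leaving that VPC: its inbound rules decide which clients in the VPC may send traffic into the service network, so they must allow your clients on the listener port. This is the single most-missed control: it is not the service’s security group and not the pod’s SG. If clients get connection timeouts, check this SG before anything else.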
A service can be associated with multiple service networks, but a VPC can be associated with only one service network at a time. Design your network boundaries around blast radius and shared ownership, not around per-team convenience.
Step 5 — Share the service network across accounts with AWS RAM
Cross-account is the whole point. Here you share the service network itself, rather than individual services, with AWS Resource Access Manager (RAM); each consuming account then associates its own VPCs.
```bash
# In the network-owner account: share the service network with an OU or accounts.
aws ram create-resource-share \
  --name lattice-platform-mesh \
  --resource-arns "$SN_ARN" \
  --principals arn:aws:organizations::111122223333:ou/o-abc123/ou-root-xxxxxxxx \
  --permission-arns arn:aws:ram::aws:permission/AWSRAMPermissionVpcLatticeServiceNetworkVpcAssociation
```
Sharing within an AWS Organization with trusted access enabled means consumers see the share immediately without an explicit accept. In the consumer account, the team then runs create-service-network-vpc-association against the shared ARN — they control which of their VPCs join, and they attach their own client security group. Service owners and network owners can be different accounts entirely; a producer account associates its service into the shared network from its side.
Step 6 — Auth policies: IAM-based service-to-service authorization
This is where Lattice replaces mesh mTLS-plus-SPIFFE with plain IAM. When the auth type is `AWS_IAM`, callers identify themselves by SigV4-signing each request with their IAM credentials, and Lattice evaluates an auth policy (a resource policy attached to the service and/or the service network) against the signed principal. An unsigned request is treated as anonymous and is denied unless a policy explicitly allows anonymous access. No certificates, no SPIFFE IDs; the identity is the IAM role.
Attach a policy that allows only specific caller roles, and can constrain by HTTP method and path via the vpc-lattice-svcs condition keys.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCheckoutToReadOrders",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::444455556666:role/checkout-service"
      },
      "Action": "vpc-lattice-svcs:Invoke",
      "Resource": "*",
      "Condition": {
        "StringEquals": { "vpc-lattice-svcs:RequestMethod": "GET" },
        "ArnLike": { "aws:PrincipalArn": "arn:aws:iam::444455556666:role/checkout-service" }
      }
    },
    {
      "Sid": "DenyAnonymous",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "vpc-lattice-svcs:Invoke",
      "Resource": "*",
      "Condition": {
        "BoolIfExists": { "aws:PrincipalIsAWSService": "false" },
        "Null": { "aws:PrincipalArn": "true" }
      }
    }
  ]
}
```

Save the policy as `orders-auth-policy.json` and attach it to the service:

```bash
aws vpc-lattice put-auth-policy \
  --resource-identifier "$SVC_ARN" \
  --policy file://orders-auth-policy.json
```
The condition keys worth knowing: vpc-lattice-svcs:RequestMethod, vpc-lattice-svcs:RequestPath (or RequestQueryString), vpc-lattice-svcs:SourceVpc, and vpc-lattice-svcs:ServiceNetworkArn. The principal keys are standard IAM (aws:PrincipalArn, aws:PrincipalOrgID). A useful pattern is to gate by org at the network level and by exact role at the service level.
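A toy evaluator helps when reasoning about what the orders policy admits. This is a deliberately simplified Python model, not IAM: the `Statement`, `statement_matches`, and `evaluate` names are hypothetical, and it ignores the `ArnLike` and `BoolIfExists` conditions and every other operator real IAM supports. It does capture the two rules that matter here: an explicit `Deny` wins, and anything not allowed is denied by default.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Statement:
    effect: str                      # "Allow" or "Deny"
    principal: str                   # exact role ARN, or "*" for any principal
    methods: Optional[set] = None    # stands in for vpc-lattice-svcs:RequestMethod
    anonymous_only: bool = False     # stands in for {"Null": {"aws:PrincipalArn": "true"}}


def statement_matches(stmt: Statement, principal: Optional[str], method: str) -> bool:
    if stmt.anonymous_only and principal is not None:
        return False
    if stmt.principal != "*" and stmt.principal != principal:
        return False
    if stmt.methods is not None and method not in stmt.methods:
        return False
    return True


def evaluate(policy: List[Statement], principal: Optional[str], method: str) -> str:
    """Explicit Deny wins, then any Allow, otherwise the implicit default deny."""
    hits = [s for s in policy if statement_matches(s, principal, method)]
    if any(s.effect == "Deny" for s in hits):
        return "DENY (explicit)"
    if any(s.effect == "Allow" for s in hits):
        return "ALLOW"
    return "DENY (implicit)"


CHECKOUT = "arn:aws:iam::444455556666:role/checkout-service"
ORDERS_POLICY = [
    Statement("Allow", CHECKOUT, methods={"GET"}),
    Statement("Deny", "*", anonymous_only=True),
]

print(evaluate(ORDERS_POLICY, CHECKOUT, "GET"))    # ALLOW
print(evaluate(ORDERS_POLICY, CHECKOUT, "POST"))   # DENY (implicit)
print(evaluate(ORDERS_POLICY, None, "GET"))        # DENY (explicit)
```

The `POST` case is the one that surprises people: nothing denies it explicitly, but nothing allows it either, so the caller still gets a 403.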
Making the caller sign
The caller must send SigV4 for service vpc-lattice-svcs. From an SDK, use the standard signing path; the simplest correct example is Python with the AWS-maintained request signer:
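```python
import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

session = boto3.Session()
creds = session.get_credentials().get_frozen_credentials()
region = "eu-west-1"
url = "https://orders-0123456789.7d67968.vpc-lattice-svcs.eu-west-1.on.aws/v1/orders/42"

# Lattice does not support payload signing, so declare the payload as unsigned;
# SigV4Auth then uses this header value in place of the body hash.
req = AWSRequest(method="GET", url=url, headers={"x-amz-content-sha256": "UNSIGNED-PAYLOAD"})
# Service name is "vpc-lattice-svcs", not "vpc-lattice".
SigV4Auth(creds, "vpc-lattice-svcs", region).add_auth(req)

# Send exactly the headers that were signed (Authorization, X-Amz-Date, token, ...).
prepared = req.prepare()
resp = requests.get(prepared.url, headers=prepared.headers)
print(resp.status_code, resp.text)
```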
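The comment about the service name matters because SigV4 folds the service name into the signing key itself. The sketch below implements only the signing-key derivation step of the published SigV4 process, using the standard library; the secret is a dummy value, the date is arbitrary, `signing_key` is a hypothetical helper name, and it does not sign a full request.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """SigV4 key derivation: date, then region, then service, then 'aws4_request'."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


secret = "EXAMPLE-DUMMY-SECRET"  # not a real credential
right = signing_key(secret, "20240101", "eu-west-1", "vpc-lattice-svcs")
wrong = signing_key(secret, "20240101", "eu-west-1", "vpc-lattice")

# A key derived for the wrong service name produces signatures that will not verify.
print(right != wrong)
```

Because region and service both feed the key, signing with the wrong region or `vpc-lattice` instead of `vpc-lattice-svcs` fails the same way: the request reaches Lattice but the signature does not verify.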
On EKS, the cleanest way to get those credentials into the pod is EKS Pod Identity (or IRSA): the pod assumes an IAM role, and that role’s ARN is exactly the principal your auth policy allows. The identity in the auth policy and the identity the workload runs as become the same object — that is the property that makes this simpler than mesh PKI.
Step 7 — Integrating EKS and Lambda targets
EKS. Run the AWS Gateway API Controller. You define standard Kubernetes Gateway API objects, and the controller reconciles them into Lattice services, listeners, target groups, and rules, registering pod IPs automatically.
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: orders
  # No annotation is needed here: the controller records the Lattice-assigned DNS name
  # on this route in the application-networking.k8s.aws/lattice-assigned-domain-name annotation.
spec:
  parentRefs:
    - name: platform-mesh   # a Gateway mapped to the service network
      sectionName: http
  rules:
    - backendRefs:
        - name: orders-svc  # a Kubernetes Service
          kind: Service
          port: 8080
          weight: 100
```
The controller maps the Gateway to a service network and each HTTPRoute to a Lattice service, so application teams stay in Kubernetes-native YAML while platform gets Lattice’s cross-account reach. Pod churn re-registers targets without manual register-targets calls.
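As a hypothetical illustration of that reconciliation (not the controller's actual code), the sketch below shows how each HTTPRoute rule's weighted `backendRefs` could become one Lattice `forward` action shaped like the listener action in Step 3. The `forward_actions` function and the `tg_id_for` naming callback are invented for the example; a backendRef without a `weight` defaults to 1 per the Gateway API spec.

```python
import json


def forward_actions(route: dict, tg_id_for) -> list:
    """One Lattice-style forward action per HTTPRoute rule (hypothetical mapping)."""
    actions = []
    for rule in route["spec"]["rules"]:
        groups = [
            {
                "targetGroupIdentifier": tg_id_for(ref["name"], ref["port"]),
                "weight": ref.get("weight", 1),  # Gateway API default weight is 1
            }
            for ref in rule["backendRefs"]
        ]
        actions.append({"forward": {"targetGroups": groups}})
    return actions


orders_route = {
    "kind": "HTTPRoute",
    "metadata": {"name": "orders"},
    "spec": {"rules": [{"backendRefs": [
        {"name": "orders-svc", "kind": "Service", "port": 8080, "weight": 100},
    ]}]},
}

# Placeholder identifiers; the real controller creates and names target groups itself.
actions = forward_actions(orders_route, lambda name, port: f"tg-{name}-{port}")
print(json.dumps(actions))
```

The useful takeaway is the shape: a canary in Kubernetes is just a second `backendRefs` entry with its own weight, which lands as a second weighted target group in Lattice.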
Lambda. Register the function as a LAMBDA target group and forward to it. Lattice invokes the function over its managed integration; no function URL, no API Gateway in front.
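```bash
# Capture the new target group's ARN for the register-targets call below.
FN_TG_ARN=$(aws vpc-lattice create-target-group \
  --name notify-fn \
  --type LAMBDA \
  --query 'arn' --output text)

aws vpc-lattice register-targets \
  --target-group-identifier "$FN_TG_ARN" \
  --targets id=arn:aws:lambda:eu-west-1:444455556666:function:notify
```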
The same auth policy model applies: a caller’s IAM role must be allowed vpc-lattice-svcs:Invoke on the service fronting the Lambda. You have unified authorization across EKS, EC2, and Lambda with one policy language.
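To show what being invoked by Lattice looks like from the function's side, here is a minimal handler sketch. It assumes the request-event and response shapes shown in the VPC Lattice documentation for Lambda targets (event keys `method`, `raw_path`, `headers`; response keys `statusCode`, `headers`, `body`, `isBase64Encoded`); verify them against the event structure version configured on your target group. The echo logic itself is hypothetical.

```python
import json


def handler(event, context):
    """Echo back what Lattice forwarded; keys are read with .get() in case the shape differs."""
    headers = event.get("headers") or {}
    payload = {
        "method": event.get("method"),
        "path": event.get("raw_path"),
        "userAgent": headers.get("user-agent"),
    }
    return {
        "statusCode": 200,
        "headers": {"content-type": "application/json"},
        "body": json.dumps(payload),
        "isBase64Encoded": False,
    }


# Local smoke test with a hand-built event (no AWS involved).
fake_event = {"method": "POST", "raw_path": "/notify", "headers": {"user-agent": "curl/8.5.0"}}
print(handler(fake_event, None)["body"])
```

Authorization has already happened by the time the handler runs, so the function does not need to re-check the caller's signature.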
Lattice vs App Mesh vs PrivateLink: choosing the right primitive
These are not interchangeable. Pick by the boundary you actually have.
| Concern | VPC Lattice | App Mesh (Envoy) | PrivateLink |
|---|---|---|---|
| Data-path proxy you operate | None (AWS-managed) | Envoy sidecar per workload | None (ENI) |
| Layer | L7 routing + IAM authz | L7, full Envoy feature set | L4 (TCP), single service |
| Cross-account / cross-VPC | First-class via RAM | Possible, heavy to wire | First-class, 1 service per endpoint |
| AuthZ model | IAM auth policies + SigV4 | mTLS / your own | Endpoint policies, no app identity |
| Best when | Many services across accounts need policy-driven L7 without sidecars | You need deep Envoy control and portability beyond AWS | You expose one service across a trust boundary, no IP routing |
AWS App Mesh has been deprecated — new designs that would have reached for App Mesh should evaluate Lattice or an open-source mesh (Istio, Cilium) instead. Use PrivateLink when you are publishing a single endpoint to a consumer and want zero network-layer reachability; use Lattice when you have a fleet of services that must talk under IAM policy across accounts; reach for an open-source mesh only when you need Envoy-grade traffic policy or multi-cloud portability that Lattice cannot give you.
A subtlety that matters at scale: Lattice operates at the application layer, so it sidesteps CIDR overlap between client and target VPCs entirely — the service is reached by name and link-local address, not by routing the target’s real IP. That alone is a reason to prefer it over Transit Gateway peering for service-to-service calls in an estate where renumbering is impossible.
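The overlap argument can be checked with Python's standard `ipaddress` module. This is a simplified illustration: `ip_routable_pair` is a hypothetical helper, the two VPC CIDRs are example values, and it checks only address overlap, not route tables or Transit Gateway configuration.

```python
import ipaddress

# Lattice's IPv4 link-local range, as given earlier in this guide.
LATTICE_V4 = ipaddress.ip_network("169.254.171.0/24")


def ip_routable_pair(client_cidr: str, target_cidr: str) -> bool:
    """Plain IP routing between two VPCs needs non-overlapping CIDRs."""
    return not ipaddress.ip_network(client_cidr).overlaps(ipaddress.ip_network(target_cidr))


client_vpc = "10.20.0.0/16"   # example client VPC CIDR
target_vpc = "10.20.0.0/16"   # example target VPC CIDR, same range

print(ip_routable_pair(client_vpc, target_vpc))                 # False: real IPs collide
print(LATTICE_V4.overlaps(ipaddress.ip_network(client_vpc)))    # False: Lattice range is separate
```

The second check is the one Lattice relies on: clients only ever send to the link-local range, so the targets' real addresses never need to be routable from the client VPC.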
Verify
Confirm reachability, identity enforcement, and routing before declaring victory.
```bash
# 1. Targets are actually healthy (a Healthy count of 0 means no traffic flows).
aws vpc-lattice list-targets --target-group-identifier "$TG_ARN" \
  --query 'items[].{ip:id,status:status,reason:reasonCode}' --output table

# 2. The service has a managed DNS name and the associations exist.
aws vpc-lattice get-service --service-identifier "$SVC_ARN" \
  --query '{dns:dnsEntry.domainName,auth:authType,status:status}'
aws vpc-lattice list-service-network-vpc-associations \
  --service-network-identifier "$SN_ARN" --query 'items[].{vpc:vpcId,status:status}'

# 3. From an allowed pod/host: signed call should be 200; unsigned should be 403.
curl -s -o /dev/null -w "unsigned=%{http_code}\n" \
  https://orders-0123456789.7d67968.vpc-lattice-svcs.eu-west-1.on.aws/v1/orders/42
# Expect 403 (AccessDeniedException) because no SigV4 header was sent.
```
A correctly wired service returns 403 to an unsigned request and 200 to a SigV4-signed request from an allowed role. If unsigned requests succeed, the service or network auth-type is still NONE. If every request times out (no HTTP status at all), the failure is network-layer: the VPC association’s security group, a missing service-network-VPC association, or the service not being associated into the network.
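Those rules can be encoded so a smoke test reports a likely cause instead of a bare status code. This is a sketch: `diagnose` is a hypothetical helper, its inputs are whatever HTTP status your unsigned and signed probes return (`None` for a timeout), and it encodes only the diagnoses stated in this guide.

```python
from typing import Optional


def diagnose(unsigned: Optional[int], signed: Optional[int]) -> str:
    """Map the two probe results to the likely cause. None means the call timed out."""
    if unsigned is None or signed is None:
        return "network layer: check VPC association, its security group, service association"
    if 200 <= unsigned < 300:
        return "auth not enforced: service or network auth-type is still NONE"
    if unsigned == 403 and signed == 200:
        return "correctly wired"
    if signed == 403:
        return "auth policy denies the signed caller (or the request was signed incorrectly)"
    if signed == 404:
        return "no listener rule matched: check rule priorities and the default action"
    return "unexpected: read the access logs"


print(diagnose(403, 200))   # correctly wired
print(diagnose(None, None)) # network layer: ...
```

The ordering matters: a timeout is checked first because no auth conclusion can be drawn from a request that never reached Lattice.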
Failure-mode debugging
Decode the symptom before touching config:
- Connection timeout / no response. Layer 3/4. Check, in order: the VPC association exists, the association’s security group allows the clients’ traffic, and the service is associated into the same network. DNS resolving to a `169.254.171.x` address confirms the data path is programmed.
- HTTP 403 `AccessDeniedException`. Auth policy denied. The request did reach Lattice (good: networking is fine). Either the caller did not SigV4-sign for `vpc-lattice-svcs`, or the principal/condition in the auth policy excludes them. Turn on access logs and read the `authDeniedReason`.
- HTTP 404 from Lattice. No listener rule matched. Check rule priorities and the default action.
- Targets `Unhealthy`. The health check path/port is wrong, or the app SG does not allow the VPC Lattice managed prefix list on the target port. Lattice health checks originate from the managed data plane, not your client VPC.
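The DNS check from the timeout case is easy to script on the client host. This Python sketch uses only the standard library; `in_lattice_range` and `resolves_to_lattice` are hypothetical helper names, it inspects IPv4 answers only, and `resolves_to_lattice` raises `socket.gaierror` if the name does not resolve at all.

```python
import ipaddress
import socket

LATTICE_V4 = ipaddress.ip_network("169.254.171.0/24")  # range given earlier in this guide


def in_lattice_range(address: str) -> bool:
    """True if an IPv4 address falls inside the Lattice link-local range."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip in LATTICE_V4


def resolves_to_lattice(hostname: str) -> bool:
    """Resolve a service name from this host and require every IPv4 answer to be in range."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return bool(infos) and all(in_lattice_range(info[4][0]) for info in infos)


print(in_lattice_range("169.254.171.12"))  # True
print(in_lattice_range("10.0.12.31"))      # False
```

Run `resolves_to_lattice` with the service's managed DNS name from inside the client VPC; a `False` there points at the association, not at IAM.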
Observability with access logs and CloudWatch
Lattice emits access logs and metrics per service and per service network. Enable access logs to a destination (CloudWatch Logs, S3, or Firehose) on the resource you want visibility into.
```bash
aws vpc-lattice create-access-log-subscription \
  --resource-identifier "$SVC_ARN" \
  --destination-arn arn:aws:logs:eu-west-1:444455556666:log-group:/aws/vpclattice/orders
```
Access log records include the source/target, the resolved path, response code, processing time, and the authenticated principal and auth-deny reason — which is exactly what you need to debug a 403. Query them in CloudWatch Logs Insights:
```
fields @timestamp, sourceIpPort, requestMethod, requestPath, responseCode, authDeniedReason, requestToTargetDuration
| filter responseCode = 403
| sort @timestamp desc
| limit 50
```
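If the logs are exported as JSON lines (for example to S3), the same 403 triage can run offline. This sketch assumes each line is a JSON object using the field names from the query above (`responseCode`, `authDeniedReason`); `deny_reasons` is a hypothetical helper, and the sample records and their reason strings are placeholders, not real Lattice output.

```python
import json
from collections import Counter


def deny_reasons(lines):
    """Count authDeniedReason values across 403 responses in JSON-lines access logs."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        if int(record.get("responseCode", 0)) == 403:   # tolerate 403 or "403"
            counts[record.get("authDeniedReason") or "unknown"] += 1
    return counts


sample = [
    '{"responseCode": 200, "requestPath": "/v1/orders/42"}',
    '{"responseCode": 403, "authDeniedReason": "placeholder-reason-a"}',
    '{"responseCode": 403, "authDeniedReason": "placeholder-reason-a"}',
    '{"responseCode": "403"}',
]
print(deny_reasons(sample).most_common())
```

Grouping by reason rather than listing raw records makes a single over-tightened condition stand out immediately after a policy change.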
On the metrics side, Lattice publishes to the AWS/VpcLattice CloudWatch namespace — track HTTPCode_4XX_Count/HTTPCode_5XX_Count, RequestTime, and ActiveConnectionCount, dimensioned by service and target group, and alarm on a rising 4XX rate after any auth-policy change (the canary that catches an over-tightened policy in minutes).
Enterprise scenario
A payments platform team ran 30+ microservices spread across four accounts — a shared platform account, plus payments-prod, risk-prod, and partner-integrations. They had inherited an Istio mesh that worked, but every cross-account call required Transit Gateway routes, and two acquired business units shipped VPCs with overlapping 10.20.0.0/16 CIDRs they could not renumber without a multi-quarter migration. The Istio sidecars also added p99 latency and a steady stream of cert-rotation pages.
The constraint was concrete: the risk-scoring service in risk-prod had to call an enrichment service in partner-integrations, but the two VPCs had overlapping address space, so no amount of TGW routing could make the real IPs reachable. Service mesh did not help — it still rode on top of L3 reachability they did not have.
They moved cross-account service calls to a single Lattice service network, shared from platform via RAM to the org’s prod OU. Because Lattice addresses services by name and a link-local range rather than the target’s real IP, the CIDR overlap simply stopped mattering: the enrichment service was reachable as enrichment.platform.internal regardless of what 10.20/16 meant in either VPC. They replaced Istio AuthorizationPolicy objects with Lattice auth policies keyed on EKS Pod Identity role ARNs, and gated the whole network by aws:PrincipalOrgID, so no principal outside the org could be authorized, however validly it signed its requests.
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "vpc-lattice-svcs:Invoke",
    "Resource": "*",
    "Condition": {
      "StringEquals": { "aws:PrincipalOrgID": "o-abc123" },
      "ArnLike": {
        "aws:PrincipalArn": "arn:aws:iam::*:role/payments-*"
      }
    }
  }]
}
```
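Both conditions in that statement must hold, and the role pattern is matched per ARN component. The sketch below approximates that in Python. It is an assumption-laden stand-in: `arn_like` and `org_gate_allows` are hypothetical names, `fnmatchcase` substitutes for IAM's wildcard matching (it also treats `[...]` as a character class, which IAM does not), and the role ARNs are examples.

```python
from fnmatch import fnmatchcase

ORG_ID = "o-abc123"
ROLE_PATTERN = "arn:aws:iam::*:role/payments-*"


def arn_like(arn: str, pattern: str) -> bool:
    """Approximate ArnLike: compare the six colon-separated ARN parts one by one."""
    arn_parts, pattern_parts = arn.split(":", 5), pattern.split(":", 5)
    if len(arn_parts) != 6 or len(pattern_parts) != 6:
        return False
    return all(fnmatchcase(a, p) for a, p in zip(arn_parts, pattern_parts))


def org_gate_allows(principal_arn: str, principal_org_id: str) -> bool:
    """Condition operators in one statement are ANDed: org match and role match."""
    return principal_org_id == ORG_ID and arn_like(principal_arn, ROLE_PATTERN)


print(org_gate_allows("arn:aws:iam::444455556666:role/payments-api", "o-abc123"))  # True
print(org_gate_allows("arn:aws:iam::444455556666:role/risk-scoring", "o-abc123"))  # False
```

Matching per component keeps the account wildcard from spilling into other parts of the ARN, which is closer to how `ArnLike` behaves than a single whole-string glob.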
The outcome: sidecars came out of the payments path (p99 dropped and the cert-rotation pager went quiet), the overlapping-CIDR blocker was retired without renumbering, and cross-account authorization became reviewable IAM JSON in the same pipeline as the rest of their policies. They kept Istio inside each cluster for intra-cluster traffic where they wanted fine-grained Envoy control, and used Lattice strictly for the cross-account, cross-VPC hops — the boundary where its managed data plane and IAM model earned their keep.