API Management Self-Hosted Gateway: Hybrid APIs and Advanced Policy Engineering

Azure API Management is two products fused together: a control plane that lives in Azure and a data plane (the gateway) that terminates and shapes API traffic. For pure-cloud estates the managed gateway in the APIM instance is enough. The moment an API has to run next to a backend that cannot be reached from Azure — a payments service pinned to an on-prem datacenter, a workload in another cloud, a latency-sensitive endpoint that cannot tolerate a hairpin out to Azure and back — you reach for the self-hosted gateway: the same Envoy-free, .NET-based gateway runtime packaged as a container, deployed to your own Kubernetes, configured from the same Azure control plane.

This guide deploys the self-hosted gateway to AKS, then spends most of its length where the real engineering is — the policy pipeline. Policies are the only place APIM does anything interesting: JWT validation, claims-based authorization, tiered rate limiting, response caching, circuit breaking, secret injection. Get the pipeline right and APIM is a serious edge. Get it wrong and it is an expensive reverse proxy.

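Versions and SKUs. Self-hosted gateways require a Developer or Premium tier classic instance. Consumption, Basic, and Standard cannot host them, and Microsoft’s feature comparison has not listed the self-hosted gateway for the v2 tiers, so confirm current availability before planning on a v2 instance. Commands are written against the az apim CLI and the Microsoft.ApiManagement provider; if your installed CLI lacks a command group shown here, the same operations exist in the Microsoft.ApiManagement REST API and can be called with az rest. The gateway container image referenced is mcr.microsoft.com/azure-api-management/gateway:v2, a rolling tag; pin a specific build (for example 2.x.y) for production.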

1. APIM topology: managed gateway, workspaces, and self-hosted gateways

Internalize the deployment model before deploying anything. An APIM instance has exactly one control plane (the management API, developer portal, policy store, named values) and one or more gateways that enforce that configuration: the managed gateway that comes with the instance, plus every self-hosted gateway you register against it.

The critical mental model: configuration is authored once in Azure and replicated to every gateway. You do not write policy on the self-hosted gateway; you write it in the control plane, associate the API with the self-hosted gateway resource, and the runtime pulls it. A gateway only serves the APIs explicitly assigned to it.

RG=rg-apim-prod
APIM=apim-contoso-prod
LOC=eastus

# Create the gateway resource in the control plane (not the container yet)
az apim gateway create \
  --resource-group $RG --service-name $APIM \
  --gateway-id shgw-onprem-dc1 \
  --location-data '{"name":"On-Prem DC1","city":"Dallas","countryOrRegion":"US"}' \
  --description "Self-hosted gateway colocated with payments backend"

# Associate an API with this gateway so the gateway is allowed to serve it
az apim gateway api create \
  --resource-group $RG --service-name $APIM \
  --gateway-id shgw-onprem-dc1 \
  --api-id payments-api

location-data is metadata only — it does not place anything; it labels where you will run the container, which surfaces in the portal and metrics. The association in the second command is the part that matters: without it the gateway returns 404 for that API regardless of what policy exists.

2. Deploying the self-hosted gateway to AKS with config sync and tokens

The gateway authenticates to the control plane with a gateway token (a scoped SAS-style credential) and an endpoint URL. Both are retrievable per gateway. The token has an expiry — for production, treat it as a rotating secret, not a one-time paste.

# Endpoint the container polls for configuration
az apim gateway show \
  --resource-group $RG --service-name $APIM \
  --gateway-id shgw-onprem-dc1 \
  --query "{configEndpoint:'https://"$APIM".configuration.azure-api.net'}"

# Generate a gateway token (max 30 days on the CLI; rotate before expiry)
EXPIRY=$(date -u -v+30d '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -d '+30 days' '+%Y-%m-%dT%H:%M:%SZ')
az apim gateway token generate \
  --resource-group $RG --service-name $APIM \
  --gateway-id shgw-onprem-dc1 \
  --key-type primary \
  --expiry "$EXPIRY" \
  --query value -o tsv
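
The two-branch date call above exists only because BSD and GNU date disagree on flags. As a portable alternative for a rotation job, here is a minimal Python sketch. The function names and the 7-day rotation margin are assumptions made for illustration; the 30-day ceiling comes from the comment above.

```python
from datetime import datetime, timedelta, timezone

MAX_TOKEN_LIFETIME = timedelta(days=30)  # ceiling stated in the CLI comment above
STAMP = "%Y-%m-%dT%H:%M:%SZ"             # same UTC format the shell snippet builds


def token_expiry(now: datetime, lifetime: timedelta = MAX_TOKEN_LIFETIME) -> str:
    """Return the expiry timestamp to pass as --expiry."""
    if lifetime > MAX_TOKEN_LIFETIME:
        raise ValueError("gateway tokens are limited to 30 days")
    return (now.astimezone(timezone.utc) + lifetime).strftime(STAMP)


def needs_rotation(now: datetime, expiry: str,
                   margin: timedelta = timedelta(days=7)) -> bool:
    """True once the token is within `margin` of expiring (margin is an arbitrary choice)."""
    expires_at = datetime.strptime(expiry, STAMP).replace(tzinfo=timezone.utc)
    return expires_at - now <= margin


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    expiry = token_expiry(now)
    print(expiry, needs_rotation(now, expiry))
```

Run the check on a schedule and regenerate the token, then update the Kubernetes Secret, whenever needs_rotation returns True.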

Land the endpoint and token in a Kubernetes Secret, then deploy. The control-plane hostname for v2 configuration is <name>.configuration.azure-api.net reached over HTTPS (443). The gateway also opens an outbound connection for live config sync; if egress is locked down, allow the configuration endpoint and the instance’s metrics/telemetry endpoints.

apiVersion: v1
kind: Secret
metadata:
  name: shgw-onprem-dc1-token
  namespace: apim
type: Opaque
stringData:
  # "GatewayKey <gateway-id>&<expiry>&<signature>" — the full token string
  value: "GatewayKey shgw-onprem-dc1&20260708..."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shgw-onprem-dc1
  namespace: apim
spec:
  replicas: 3
  selector:
    matchLabels: { app: shgw-onprem-dc1 }
  template:
    metadata:
      labels: { app: shgw-onprem-dc1 }
    spec:
      containers:
        - name: shgw
          image: mcr.microsoft.com/azure-api-management/gateway:v2
          ports:
            - name: http
              containerPort: 8080
            - name: https
              containerPort: 8081
          env:
            - name: config.service.endpoint
              value: "https://apim-contoso-prod.configuration.azure-api.net"
            - name: config.service.auth
              valueFrom:
                secretKeyRef:
                  name: shgw-onprem-dc1-token
                  key: value
            # Restrict the TLS cipher suites the gateway offers to API clients
            - name: net.server.tls.ciphers.allowed-suites
              value: "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
          readinessProbe:
            httpGet: { path: /status-0123456789abcdef, port: 8080 }
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet: { path: /status-0123456789abcdef, port: 8080 }
            initialDelaySeconds: 10
            periodSeconds: 15
          resources:
            requests: { cpu: "200m", memory: "256Mi" }
            limits: { cpu: "1", memory: "512Mi" }

/status-0123456789abcdef is the gateway’s built-in status path. It returns 200 once the runtime is up, independent of config sync, which makes it the correct probe target. The gateway keeps the last successfully synced configuration, so if it has already synced and the control plane goes down, it keeps serving that config. A pod that starts while the control plane is unreachable has nothing to serve unless a configuration backup survives the restart; Microsoft’s guidance is to mount a persistent volume for that backup (at /apim/config in the documented setup), which the manifest above does not do. That property is the whole point of running it on-prem: a transient Azure outage does not take down your payments edge.

Front the Deployment with a Service (and your own Ingress/LoadBalancer) and the gateway is live, serving only payments-api and reporting health back to the portal.

3. Policy scopes and the inbound/backend/outbound/on-error pipeline

A policy is XML evaluated in four sections, in order, for every request:

inbound  --> backend --> outbound
   \                         /
    \----> on-error <-------/   (entered on any thrown error)

Policies are layered by scope, and the magic word <base /> controls inheritance. Scopes, outermost to innermost: All APIs (global) -> Product -> API -> Operation. At each level, <base /> injects the policy from the enclosing scope. Omit <base /> and you replace the parent — a common and dangerous mistake, because dropping the global inbound <base /> silently removes your org-wide JWT check on that one API.

<!-- API-scope policy: global edge rules run first, then API-specific rules -->
<policies>
  <inbound>
    <base />                                   <!-- inherit global + product inbound -->
    <set-header name="X-Correlation-Id" exists-action="skip">
      <value>@(context.RequestId)</value>
    </set-header>
  </inbound>
  <backend>
    <base />
  </backend>
  <outbound>
    <base />
    <set-header name="Server" exists-action="delete" />
  </outbound>
  <on-error>
    <base />
  </on-error>
</policies>
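
To make the inheritance rule concrete, here is a small Python model of how one policy section expands across scopes. It is a conceptual sketch, not APIM's implementation: the statement strings and the effective_section helper are hypothetical, and a scope with no policy of its own is assumed to behave like a bare <base />.

```python
BASE = "<base />"


def effective_section(scopes):
    """Expand one policy section across scopes ordered outermost to innermost.

    Each scope is a list of statements; BASE marks where the enclosing
    scope's already-expanded statements are injected.
    """
    effective = []
    for statements in scopes:
        expanded = []
        for statement in statements:
            if statement == BASE:
                expanded.extend(effective)   # inherit everything from the parent
            else:
                expanded.append(statement)
        effective = expanded                 # no BASE means the parent is dropped
    return effective


if __name__ == "__main__":
    global_scope = ["validate-jwt"]
    product_scope = [BASE, "rate-limit-by-key"]
    api_with_base = [BASE, "set-header X-Correlation-Id"]
    api_without_base = ["set-header X-Correlation-Id"]

    print(effective_section([global_scope, product_scope, api_with_base]))
    print(effective_section([global_scope, product_scope, api_without_base]))
```

The second call shows the failure mode described above: with <base /> omitted at API scope, the global validate-jwt never runs for that API.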

Everything inside @( ... ) is a C# policy expression with access to the context object — context.Request, context.Response, context.User, context.Variables, context.Subscription. Multi-statement logic uses @{ ... return x; }. This is where APIM stops being declarative and becomes programmable.

4. validate-jwt, OAuth2, and claims-based authorization

The validate-jwt policy is the workhorse of the inbound section. It validates signature, issuer, audience, and expiry against an OpenID Connect metadata endpoint, then exposes the decoded token to later policies. For Microsoft Entra ID, point it at the tenant’s v2 metadata document and check aud against your API’s Application ID URI.

<inbound>
  <base />
  <validate-jwt header-name="Authorization"
                failed-validation-httpcode="401"
                failed-validation-error-message="Unauthorized. Invalid or missing token."
                require-expiration-time="true"
                require-signed-tokens="true"
                clock-skew="120">
    <openid-config url="https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration" />
    <audiences>
      <audience>api://payments-api</audience>
    </audiences>
    <issuers>
      <issuer>https://login.microsoftonline.com/{tenant-id}/v2.0</issuer>
    </issuers>
    <required-claims>
      <claim name="roles" match="any">
        <value>Payments.Read</value>
        <value>Payments.Write</value>
      </claim>
    </required-claims>
  </validate-jwt>
</inbound>

clock-skew (seconds) absorbs clock drift between your IdP and the gateway. Set it explicitly: the default is 0 seconds, so even trivial drift rejects a token that is otherwise valid. match="any" admits the request if any listed role is present; use match="all" to require every value.
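
What clock-skew changes is easiest to see as arithmetic on the token's exp and nbf claims. The following Python sketch models the time checks only. It assumes Unix-second timestamps and an inclusive boundary, and is_time_valid is a hypothetical helper, not the gateway's code.

```python
def is_time_valid(now, exp, nbf=None, clock_skew=0):
    """Return True if `now` falls inside the token's validity window.

    The window is widened by `clock_skew` seconds on both ends to tolerate
    drift between the issuer's clock and the gateway's clock.
    """
    if now > exp + clock_skew:
        return False                      # expired, even allowing for skew
    if nbf is not None and now < nbf - clock_skew:
        return False                      # not yet valid, even allowing for skew
    return True


if __name__ == "__main__":
    exp = 1_000
    # The gateway clock runs 100 seconds ahead of the issuer.
    print(is_time_valid(now=1_100, exp=exp, clock_skew=0))    # rejected
    print(is_time_valid(now=1_100, exp=exp, clock_skew=120))  # accepted
```

The trade-off is direct: every second of skew is a second an expired token stays usable, so keep the value small.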

validate-jwt only proves the token is valid and carries a coarse claim. Fine-grained authorization — “this operation requires Payments.Write, this one only Payments.Read” — belongs in a policy expression that reads the already-validated token. Reference the parsed JWT via the value the policy stored, and fail closed:

<inbound>
  <base />
  <!-- Persist the validated token so operation-scope policy can inspect claims -->
  <validate-jwt header-name="Authorization" output-token-variable-name="jwt" ...>
    <openid-config url="https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration" />
    <audiences><audience>api://payments-api</audience></audiences>
  </validate-jwt>

  <!-- Operation-scope: writes demand the stronger role -->
  <choose>
    <when condition="@(context.Request.Method == "POST" || context.Request.Method == "PUT")">
      <set-variable name="canWrite" value="@(((Jwt)context.Variables["jwt"]).Claims.GetValueOrDefault("roles", "").Contains("Payments.Write"))" />
      <choose>
        <when condition="@(!(bool)context.Variables["canWrite"])">
          <return-response>
            <set-status code="403" reason="Forbidden" />
            <set-body>@("{\"error\":\"Payments.Write role required\"}")</set-body>
          </return-response>
        </when>
      </choose>
    </when>
  </choose>
</inbound>

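output-token-variable-name hands you a strongly-typed Jwt object whose .Claims is a dictionary of claim name to string values, which is far more robust than re-parsing the Authorization header by hand. One caveat: GetValueOrDefault returns the values joined into a single string, so the Contains call above is a substring match and a role named Payments.WriteAudit would also satisfy it. Compare against the individual values if role names can overlap. Authorize on claims, never on the raw header.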
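
The decision the policy makes can be modelled in a few lines of Python, which is handy for unit-testing the rule before encoding it in XML. This is a sketch under assumptions: the claims dict stands in for a token that validate-jwt has already verified, and authorize and roles_of are hypothetical names. It compares whole role values rather than substrings and fails closed.

```python
WRITE_METHODS = {"POST", "PUT"}


def roles_of(claims):
    """Return the roles claim as a list, whether it arrived as a string or a list."""
    value = claims.get("roles", [])
    return [value] if isinstance(value, str) else list(value)


def authorize(method, claims):
    """Return the HTTP status the edge should answer with: 200 or 403."""
    roles = roles_of(claims)
    if method.upper() in WRITE_METHODS:
        allowed = "Payments.Write" in roles            # writes need the stronger role
    else:
        allowed = "Payments.Read" in roles or "Payments.Write" in roles
    return 200 if allowed else 403                     # anything unrecognised is denied


if __name__ == "__main__":
    print(authorize("POST", {"roles": ["Payments.Read"]}))
    print(authorize("POST", {"roles": ["Payments.Read", "Payments.Write"]}))
    print(authorize("PUT", {"roles": ["Payments.WriteAudit"]}))
```

The last call returns 403 because list membership is exact, whereas a substring test on the joined roles would have let it through.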

5. Rate-limit-by-key and quota policies for tiered consumers

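Two policies, two purposes, constantly confused: rate-limit policies throttle short bursts and answer 429 Too Many Requests, while quota policies cap total call volume over a long period and answer 403 Forbidden once the allowance is exhausted.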

The -by-key variants let you choose the counter dimension via an expression, which is what makes per-consumer tiering possible. Key by subscription, by client IP, or by a claim. Tier the limits off a product or a claim and you have a self-service plan model:

<inbound>
  <base />
  <!-- Per-subscription sliding-window throttle: 100 calls / 10s -->
  <rate-limit-by-key calls="100" renewal-period="10"
                     counter-key="@(context.Subscription?.Id ?? context.Request.IpAddress)"
                     remaining-calls-header-name="X-RateLimit-Remaining"
                     remaining-calls-variable-name="remainingCalls"
                     retry-after-header-name="Retry-After" />

  <!-- Tiered monthly quota selected by product name -->
  <choose>
    <when condition="@(context.Product?.Name == "Premium")">
      <quota-by-key calls="5000000" renewal-period="2592000"
                    counter-key="@(context.Subscription?.Id ?? context.Request.IpAddress)" />
    </when>
    <otherwise>
      <quota-by-key calls="100000" renewal-period="2592000"
                    counter-key="@(context.Subscription?.Id ?? context.Request.IpAddress)" />
    </otherwise>
  </choose>
</inbound>

renewal-period is seconds (2592000 = 30 days). The ?. null-conditional on context.Subscription matters: an unauthenticated or subscription-key-less request has no Subscription, so falling back to IpAddress prevents a null-reference error that would otherwise route to on-error and 500.

Self-hosted gateway caveat: rate-limit counters are local. By default the -by-key counters in a self-hosted gateway are kept per gateway instance (per pod), not shared across replicas. Three replicas with calls="100" admit up to ~300 in the window. For accurate aggregate limits across pods, this guide configures an external Redis cache (next section) as the shared counter store. Microsoft’s self-hosted gateway documentation also describes replicas in the same cluster synchronizing rate-limit counts directly with each other over UDP, so check which mechanism your gateway version actually uses before relying on either. The managed gateway shares counters automatically; the self-hosted one does not.
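
The multiplication is easy to reproduce. Here is a minimal Python simulation of a single rate-limit window, assuming round-robin load balancing and a simple counter per store. WindowCounter and admitted are hypothetical names, and no time or window renewal is modelled.

```python
class WindowCounter:
    """Counts admitted calls within one window and refuses once the limit is hit."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


def admitted(total_requests, replicas, limit, shared):
    """Return how many of `total_requests` get through in one window."""
    if shared:
        counters = [WindowCounter(limit)] * replicas   # every replica sees one counter
    else:
        counters = [WindowCounter(limit) for _ in range(replicas)]
    # Round-robin: request i lands on replica i % replicas.
    return sum(counters[i % replicas].allow() for i in range(total_requests))


if __name__ == "__main__":
    print(admitted(1000, replicas=3, limit=100, shared=False))  # 300
    print(admitted(1000, replicas=3, limit=100, shared=True))   # 100
```

With independent counters the effective ceiling is replicas times calls, so scaling the Deployment silently loosens the limit.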

6. Response caching, backend circuit breaking, and retry policies

External cache for the self-hosted gateway

The internal APIM cache does not exist in the self-hosted gateway — you must register an external Redis-compatible cache. Once registered, both cache-lookup/cache-store and the distributed rate-limit counters use it.

az apim cache create \
  --resource-group $RG --service-name $APIM \
  --cache-id shgw-onprem-redis \
  --connection-string "redis-onprem.internal:6379,password=...,ssl=True" \
  --use-from-location "On-Prem DC1" \
  --description "Redis colocated with self-hosted gateway"

--use-from-location binds the cache to the gateway’s location-data name so that gateway resolves this cache (keep Redis on the same network as the pods to avoid a cross-region hop). Then cache GETs in policy:

<inbound>
  <base />
  <cache-lookup vary-by-developer="false" vary-by-developer-groups="false"
                downstream-caching-type="none" caching-type="external">
    <vary-by-header>Accept</vary-by-header>
    <vary-by-query-parameter>region</vary-by-query-parameter>
  </cache-lookup>
</inbound>
<outbound>
  <base />
  <cache-store duration="30" />   <!-- seconds; only stores cacheable responses -->
</outbound>

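caching-type="external" makes the intent explicit on the self-hosted gateway, where internal is a no-op; the default, prefer-external, also resolves to the external cache when one is registered. Because every request in this guide carries an Authorization header, note that cache-lookup skips such requests unless allow-private-response-caching="true" is set. Enable it only when the vary-by settings guarantee one caller’s response cannot be served to another. cache-store honors Cache-Control from the backend, so a no-store backend response is never cached even with this policy present.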
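
The vary-by elements decide which requests share a cached response. The Python sketch below illustrates the idea only: the key layout and the cache_key helper are assumptions for illustration, not APIM's actual key format. It mirrors the policy above by varying on the Accept header and the region query parameter.

```python
import hashlib
from urllib.parse import parse_qs, urlsplit


def cache_key(method, url, headers, vary_headers=("Accept",), vary_query=("region",)):
    """Build a cache key from the request parts the policy says responses vary by."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    key_parts = [method.upper(), parts.netloc.lower(), parts.path]
    for name in vary_headers:
        key_parts.append(f"h:{name.lower()}={headers.get(name, '')}")
    for name in vary_query:
        key_parts.append(f"q:{name}={','.join(query.get(name, []))}")
    return hashlib.sha256("|".join(key_parts).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    base = "https://api.contoso.com/payments/v1/accounts"
    json_headers = {"Accept": "application/json"}
    a = cache_key("GET", base + "?region=us&page=1", json_headers)
    b = cache_key("GET", base + "?region=us&page=2", json_headers)
    c = cache_key("GET", base + "?region=eu&page=1", json_headers)
    print(a == b, a == c)
```

In this model page is not part of the key, so two pages of results would collide. List every parameter that changes the response in a vary-by-query-parameter element.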

Backend resilience: retry and circuit breaker

Two layers. retry wraps the backend call and re-sends on transient failure; backend circuit breaker is configured on the backend entity and trips the whole backend out of rotation when failures cross a threshold. Use both: retry for blips, breaker for a backend that is genuinely down so you stop hammering it.

<backend>
  <retry condition="@(context.Response.StatusCode == 502 || context.Response.StatusCode == 503)"
         count="3" interval="2" max-interval="10" delta="2" first-fast-retry="false">
    <forward-request buffer-request-body="true" timeout="20" />
  </retry>
</backend>

The circuit breaker lives on the Microsoft.ApiManagement/service/backends resource, not in policy XML — define it once and reference the backend with <set-backend-service backend-id="..." />:

resource paymentsBackend 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apim
  name: 'payments-backend'
  properties: {
    url: 'https://payments.internal.contoso.com'
    protocol: 'http'
    circuitBreaker: {
      rules: [
        {
          name: 'trip-on-5xx'
          failureCondition: {
            count: 10                 // 10 failures...
            interval: 'PT1M'          // ...within 1 minute...
            statusCodeRanges: [ { min: 500, max: 599 } ]
            errorReasons: [ 'Timeout' ]
          }
          tripDuration: 'PT30S'       // ...opens the circuit for 30s
          acceptRetryAfter: true      // honor backend Retry-After
        }
      ]
    }
  }
}

first-fast-retry="false" keeps the first retry on the backoff schedule (set true only when an immediate single retry is known-safe). The breaker’s acceptRetryAfter makes the gateway respect a backend’s own Retry-After instead of blindly re-probing.
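
The breaker rule reads as a small state machine: count qualifying failures in a rolling interval, open the circuit for a fixed duration once the threshold is reached, then let traffic through again. Here is a minimal Python model using the numbers from the Bicep rule above. The class and its exact boundary behaviour are assumptions for illustration, not the gateway's implementation, and the Timeout error reason and Retry-After handling are left out.

```python
from collections import deque


class CircuitBreaker:
    """Trips after `failure_count` 5xx responses within `interval` seconds."""

    def __init__(self, failure_count=10, interval=60.0, trip_duration=30.0):
        self.failure_count = failure_count
        self.interval = interval
        self.trip_duration = trip_duration
        self.failures = deque()      # timestamps of recent failures
        self.open_until = None       # time at which the circuit closes again

    def allow(self, now):
        """True if a request may be forwarded to the backend at time `now`."""
        return self.open_until is None or now >= self.open_until

    def record(self, now, status_code):
        """Register a backend response and trip the circuit if the rule matches."""
        if not 500 <= status_code <= 599:
            return
        self.failures.append(now)
        while self.failures and now - self.failures[0] > self.interval:
            self.failures.popleft()  # forget failures that aged out of the interval
        if len(self.failures) >= self.failure_count:
            self.open_until = now + self.trip_duration
            self.failures.clear()


if __name__ == "__main__":
    breaker = CircuitBreaker()
    for second in range(10):
        breaker.record(float(second), 503)
    print(breaker.allow(9.5), breaker.allow(39.0))
```

While the circuit is open the gateway answers without touching the backend, which is why retry alone is not enough: each retry still costs the struggling backend a request.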

7. Policy fragments, named values, and Key Vault-backed secrets

Three features keep policy DRY and secret-free.

Named values are the configuration store — plain strings, or secrets, or Key Vault references that APIM resolves and auto-rotates (re-fetch interval default 4 hours). Never paste a secret into policy XML; reference a named value.

# Key Vault-backed named value — APIM's managed identity must have 'get' on the secret
az apim nv create \
  --resource-group $RG --service-name $APIM \
  --named-value-id payments-hmac-key \
  --display-name "payments-hmac-key" \
  --secret true \
  --key-vault-secret-id "https://kv-apim-prod.vault.azure.net/secrets/payments-hmac"

Policy fragments are reusable XML snippets included by reference, so the org-standard auth + correlation block is authored once and pulled into every API:

<!-- Fragment: "std-edge" — authored once in the control plane -->
<fragment>
  <validate-jwt header-name="Authorization" failed-validation-httpcode="401">
    <openid-config url="https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration" />
    <audiences><audience>{{api-audience}}</audience></audiences>
  </validate-jwt>
  <set-header name="X-Correlation-Id" exists-action="skip">
    <value>@(context.RequestId)</value>
  </set-header>
</fragment>

<!-- Any API references the fragment and a named value by {{name}} -->
<inbound>
  <base />
  <include-fragment fragment-id="std-edge" />
  <set-header name="X-Signing-Key" exists-action="override">
    <value>{{payments-hmac-key}}</value>
  </set-header>
</inbound>

{{named-value}} is substituted at runtime; for Key Vault-backed values the resolution and rotation happen in the control plane and replicate to every gateway, including self-hosted ones — the pod never touches Key Vault directly, which keeps the secret out of the cluster.
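
The substitution itself is plain templating, and a pipeline can use the same idea to lint policies for references to named values that do not exist. Here is a minimal Python sketch. The render helper and the allowed character set for names are assumptions for illustration, not APIM's resolver.

```python
import re

# {{name}} placeholders; the allowed characters here are an assumption.
_PLACEHOLDER = re.compile(r"\{\{([A-Za-z0-9._-]+)\}\}")


def render(policy_xml, named_values):
    """Replace every {{name}} with its value, failing loudly on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in named_values:
            raise KeyError(f"named value not defined: {name}")
        return named_values[name]

    return _PLACEHOLDER.sub(substitute, policy_xml)


def referenced_names(policy_xml):
    """List the named values a policy depends on, in order of first use."""
    seen = []
    for name in _PLACEHOLDER.findall(policy_xml):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    fragment = "<audiences><audience>{{api-audience}}</audience></audiences>"
    print(referenced_names(fragment))
    print(render(fragment, {"api-audience": "api://payments-api"}))
```

Running referenced_names over every policy file in a pull request and comparing the result with the named values defined in the target environment catches a missing reference before it reaches a gateway.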

8. Versioning, revisions, and CI/CD for APIM configuration as code

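Two distinct mechanisms, both required for safe change: versions publish breaking changes side by side under their own identifier (path, header, or query string) so consumers opt in, while revisions stage non-breaking changes to an existing API and go live only when made current.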

# Create a revision to stage a policy change without touching production traffic
az apim api revision create \
  --resource-group $RG --service-name $APIM \
  --api-id payments-api --api-revision 3 \
  --api-revision-description "Add Payments.Write enforcement on POST"

# After validation, promote it (atomic; instantly reversible)
az apim api release create \
  --resource-group $RG --service-name $APIM \
  --api-id payments-api --release-id rel-3 \
  --api-revision 3 --notes "Enforce write role"

For real config-as-code, do not click in the portal. Export the instance to a deployable artifact and gate it through a pipeline. The APIOps toolkit (the supported pattern) extracts everything — APIs, policies, fragments, named values, backends — into a Git-friendly folder of YAML + raw policy XML, then publishes diffs forward through environments. Policies live as .xml files reviewed in pull requests.

# Azure Pipelines: extract from dev, publish the diff to prod
steps:
  - task: AzureCLI@2
    displayName: Extract APIM config (APIOps)
    inputs:
      azureSubscription: sc-apim
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        ./extractor \
          --AZURE_SUBSCRIPTION_ID $(subId) \
          --AZURE_RESOURCE_GROUP_NAME rg-apim-dev \
          --API_MANAGEMENT_SERVICE_NAME apim-contoso-dev \
          --API_MANAGEMENT_SERVICE_OUTPUT_FOLDER_PATH $(Build.SourcesDirectory)/apim-artifacts

  - task: AzureCLI@2
    displayName: Publish to prod
    inputs:
      azureSubscription: sc-apim
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        ./publisher \
          --AZURE_SUBSCRIPTION_ID $(subId) \
          --AZURE_RESOURCE_GROUP_NAME rg-apim-prod \
          --API_MANAGEMENT_SERVICE_NAME apim-contoso-prod \
          --API_MANAGEMENT_SERVICE_OUTPUT_FOLDER_PATH $(Build.SourcesDirectory)/apim-artifacts \
          --COMMIT_ID $(Build.SourceVersion)

--COMMIT_ID makes the publisher diff only what changed in that commit, so a one-line policy edit deploys one policy, not the whole instance. Named-value secrets are never extracted in plaintext — Key Vault references travel as references, real secrets stay in Key Vault.

Verify

Confirm the gateway is healthy, registered, and enforcing policy — not just running.

# 1. The gateway pods are live and synced
kubectl get pods -n apim -l app=shgw-onprem-dc1
kubectl logs -n apim deploy/shgw-onprem-dc1 | grep -i "configuration"   # expect a successful sync line

# 2. The gateway resource exists in the control plane (this call alone does not prove connectivity)
az apim gateway list-keys --resource-group $RG --service-name $APIM --gateway-id shgw-onprem-dc1 >/dev/null
# Portal: API Management > Gateways > shgw-onprem-dc1 shows status "Connected" and replica count

# 3. JWT enforcement rejects an unauthenticated call (expect 401)
curl -i https://api.contoso.com/payments/v1/accounts

# 4. A valid token + insufficient role is rejected (expect 403)
curl -i -X POST https://api.contoso.com/payments/v1/transfers \
  -H "Authorization: Bearer $READONLY_TOKEN"

# 5. Rate limit trips and returns Retry-After (expect 429 after the window fills)
for i in $(seq 1 120); do curl -s -o /dev/null -w "%{http_code}\n" \
  https://api.contoso.com/payments/v1/accounts -H "Authorization: Bearer $TOKEN"; done | sort | uniq -c

For policy-level diagnostics, the API Inspector / request trace is indispensable: send a request with the Ocp-Apim-Trace: true header plus a valid trace token (obtained from the portal Test console or az apim api operation), and the response includes an Ocp-Apim-Trace-Location URL showing exactly which policies ran, in order, with the value of every expression. Confirm validate-jwt fired before your rate-limit, and that <base /> actually pulled the global policy.

// Self-hosted gateway: 4xx/5xx by API over the last hour (Log Analytics)
ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| where ResponseCode >= 400
| summarize count() by ApiId, ResponseCode, bin(TimeGenerated, 5m)
| order by TimeGenerated desc

Enterprise scenario

A payments platform team ran APIM as the front door for a card-authorization API whose backend was legally pinned to an on-prem datacenter — data residency rules forbade the transaction payload from transiting a public Azure endpoint. The managed gateway was a non-starter: every call would hairpin from the on-prem clients out to azure-api.net and back to the on-prem backend, adding ~80ms and, worse, putting regulated payloads on a path that crossed the public APIM gateway.

They deployed the self-hosted gateway to an on-prem AKS-on-Azure-Stack-HCI cluster colocated with the backend, registered against the production APIM instance. Authoring, JWT policy, and rate limits stayed centralized in Azure; only the data plane moved. The payload never left the datacenter.

The bug that nearly shipped: their tiered rate-limit-by-key (Premium consumers at 5,000 rps) let traffic through at roughly 3x the configured ceiling under load, against a theoretical worst case of 5x. The cause was the self-hosted-gateway counter locality from section 5: five replicas, five independent counters. The fix was registering an external Redis colocated with the gateway and re-binding the cache so the rate-limit policy used a shared store:

az apim cache create \
  --resource-group rg-apim-prod --service-name apim-contoso-prod \
  --cache-id shgw-redis-dc1 \
  --connection-string "redis-dc1.internal:6380,password=$(cat /run/secrets/redis),ssl=True" \
  --use-from-location "On-Prem DC1"

With the external cache attached, the five replicas shared one counter and the aggregate limit held within a few percent — and the same Redis backed cache-lookup, cutting backend authorization load by a third during a known traffic spike. The lesson the team wrote into their runbook: on the self-hosted gateway, any policy that “counts” (rate-limit, quota, cache) is per-pod until you give it an external cache.
