Advanced Pulumi in Python: Dynamic Providers and Stack References

Most Pulumi tutorials stop at aws.s3.Bucket. Real platforms run into two harder problems: there is no native provider for some internal or niche SaaS API you must manage, and your infrastructure is too large to live in one stack. Pulumi’s Python SDK has first-class answers for both. Dynamic providers let you implement a resource’s full lifecycle in plain Python, and StackReference lets independently deployed stacks consume each other’s outputs without sharing state. This guide builds both correctly, including the serialization and secret-handling traps that bite people in production.

Everything here targets the Pulumi CLI 3.x and the pulumi Python package 3.x on Python 3.9+.

1. The resource model: inputs, outputs, and apply

Before writing a provider you must internalize how Pulumi values flow. Every resource argument is an Input[T]: it may be a plain value, an Output[T], or an Awaitable. Every resource attribute Pulumi gives back is an Output[T]. An Output is a promise plus a dependency edge plus a secret flag. You never read its value synchronously during pulumi up, because at preview time the value may be unknown.

import pulumi
from pulumi_aws import s3

bucket = s3.BucketV2("data")

# WRONG: bucket.id is an Output, not a str. Formatting it directly embeds
# Pulumi's "calling __str__ on an Output" warning text instead of the real ID.
# log_name = f"{bucket.id}-logs"

# RIGHT: transform inside apply; the lambda runs only when the value is known.
log_name = bucket.id.apply(lambda bid: f"{bid}-logs")
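
To make the "promise plus dependency edge plus secret flag" description concrete, here is a toy model in plain Python. ToyOutput and its fields are hypothetical names invented for this sketch; it is not how pulumi.Output is implemented, only an illustration of why apply skips its callback for unknown values and why secrecy and dependencies carry over to derived values.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Optional


@dataclass(frozen=True)
class ToyOutput:
    """Hypothetical stand-in for pulumi.Output; not the real implementation."""
    value: Optional[Any]                 # None while the value is unknown
    known: bool = True                   # False at preview time for computed values
    secret: bool = False                 # carried over to every derived value
    deps: FrozenSet[str] = frozenset()   # names of resources this value depends on

    def apply(self, fn: Callable[[Any], Any]) -> "ToyOutput":
        # Unknown values skip the callback entirely; flags and deps still propagate.
        if not self.known:
            return ToyOutput(None, known=False, secret=self.secret, deps=self.deps)
        return ToyOutput(fn(self.value), known=True, secret=self.secret, deps=self.deps)


# A known value: the callback runs and the dependency edge is kept.
bucket_id = ToyOutput("data-1a2b3c", deps=frozenset({"data"}))
log_name = bucket_id.apply(lambda bid: f"{bid}-logs")

# An unknown value (as during preview): the callback never runs.
preview_id = ToyOutput(None, known=False, deps=frozenset({"data"}))
preview_log = preview_id.apply(lambda bid: f"{bid}-logs")

# A secret value: anything derived from it stays flagged secret.
token = ToyOutput("tok_live_xxx", secret=True)
header = token.apply(lambda t: f"Bearer {t}")
```

The real Output is asynchronous and tracks resources rather than names, but the three behaviors modeled here are the ones the rest of this guide relies on.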

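One rule that matters for the provider work below: when a value depends on several outputs, combine them with Output.all and transform the resolved list inside a single apply: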

url = pulumi.Output.all(bucket.bucket, bucket.region).apply(
    lambda args: f"https://{args[0]}.s3.{args[1]}.amazonaws.com"
)

Output.format expresses the same interpolation more readably (Output.concat is the plain-concatenation variant):

url = pulumi.Output.format("https://{0}.s3.{1}.amazonaws.com", bucket.bucket, bucket.region)

2. Building a dynamic provider

A dynamic provider is a Python class that subclasses pulumi.dynamic.ResourceProvider. You pair it with a subclass of pulumi.dynamic.Resource, whose constructor passes an instance of the provider plus the inputs. The engine then calls the provider’s lifecycle methods as it plans and applies changes. The methods you care about are create, update, delete, diff, and optionally check and read.

The example manages a “DNS record” in a fictional REST API that has no Pulumi provider. The principle generalizes to any CRUD API.

# dnsrecord.py
import requests
from pulumi.dynamic import (
    ResourceProvider,
    CreateResult,
    UpdateResult,
    DiffResult,
    CheckResult,
    CheckFailure,
)


class DnsRecordProvider(ResourceProvider):
    def check(self, _olds, news):
        failures = []
        if news.get("type") not in ("A", "AAAA", "CNAME", "TXT"):
            failures.append(CheckFailure("type", "type must be A, AAAA, CNAME, or TXT"))
        return CheckResult(news, failures)

    def create(self, props):
        resp = requests.post(
            f"{props['endpoint']}/zones/{props['zone']}/records",
            headers={"Authorization": f"Bearer {props['token']}"},
            json={"name": props["name"], "type": props["type"], "value": props["value"]},
            timeout=30,
        )
        resp.raise_for_status()
        record = resp.json()
        # outs becomes the resource's outputs; id is the physical identifier.
        return CreateResult(id_=record["id"], outs={**props, "record_id": record["id"]})

    def diff(self, _id, olds, news):
        replaces = []
        # Changing name, type, or zone forces replacement; value can be updated in place.
        for field in ("name", "type", "zone"):
            if olds.get(field) != news.get(field):
                replaces.append(field)
        # Coerce to bool: changes is a flag, not the list of changed fields.
        changed = bool(replaces) or olds.get("value") != news.get("value")
        return DiffResult(
            changes=changed,
            replaces=replaces,
            delete_before_replace=True,
        )

    def update(self, id_, _olds, news):
        resp = requests.put(
            f"{news['endpoint']}/zones/{news['zone']}/records/{id_}",
            headers={"Authorization": f"Bearer {news['token']}"},
            json={"value": news["value"]},
            timeout=30,
        )
        resp.raise_for_status()
        return UpdateResult(outs={**news, "record_id": id_})

    def delete(self, id_, props):
        resp = requests.delete(
            f"{props['endpoint']}/zones/{props['zone']}/records/{id_}",
            headers={"Authorization": f"Bearer {props['token']}"},
            timeout=30,
        )
        if resp.status_code not in (200, 204, 404):  # 404 == already gone, treat as success
            resp.raise_for_status()

The typed resource wrapper exposes outputs as Output attributes via class-level annotations:

from typing import Optional
import pulumi
from pulumi.dynamic import Resource


class DnsRecord(Resource):
    record_id: pulumi.Output[str]
    name: pulumi.Output[str]

    def __init__(self, name, zone, record_name, type, value, endpoint, token,
                 opts: Optional[pulumi.ResourceOptions] = None):
        super().__init__(
            DnsRecordProvider(),
            name,
            {
                "zone": zone,
                "name": record_name,
                "type": type,
                "value": value,
                "endpoint": endpoint,
                "token": token,
                "record_id": None,  # declared so it is a known output key
            },
            opts,
        )

Why declare record_id: None in the inputs? In the Python SDK, any key you want back as an output must exist in the args dict. Pulumi populates it from the outs your create/update returns; if you omit the key, the attribute is never populated on the resource object, even when the provider set it.

diff semantics matter

diff is where you control whether a change is an in-place update or a replacement. Get this wrong and you either orphan cloud resources or trigger needless rebuilds. replaces lists the properties whose change forces a new resource. delete_before_replace=True deletes the old resource before creating the new one, which you need when a unique constraint (like a DNS name) would collide if both existed at once. If you return changes=False, Pulumi shows no diff and skips update entirely.
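
The decision that diff encodes can be checked in isolation, without the engine. The sketch below restates the example provider's rules as a pure function; plan_change and REPLACE_FIELDS are hypothetical names used only for illustration, and the returned labels are this sketch's own, not Pulumi's.

```python
from typing import Any, Dict, List, Tuple

# Identity fields from the example provider: changing any of them forces a replace.
REPLACE_FIELDS = ("name", "type", "zone")


def plan_change(olds: Dict[str, Any], news: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (action, replaces) where action is 'same', 'update', or 'replace'."""
    replaces = [f for f in REPLACE_FIELDS if olds.get(f) != news.get(f)]
    if replaces:
        return "replace", replaces
    if olds.get("value") != news.get("value"):
        return "update", []
    return "same", []


old = {"zone": "example.com", "name": "www", "type": "A", "value": "203.0.113.10"}

value_only = plan_change(old, {**old, "value": "203.0.113.20"})
renamed = plan_change(old, {**old, "name": "api"})
unchanged = plan_change(old, dict(old))

print(value_only, renamed, unchanged)
```

Keeping the rules in a plain function like this makes them easy to unit test before wiring them into DiffResult.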

3. Serialization pitfalls and secret inputs

This is the part that trips up nearly everyone. Pulumi serializes your dynamic provider instance, including whatever its __init__ captured, by pickling it, and stores the result in the stack’s state. At update time it deserializes that pickle and calls your methods. Three consequences:

  1. The provider class must be importable by a stable path. Do not define the provider class inline in __main__ or inside a function. Put it in a module (dnsrecord.py) so unpickling can locate DnsRecordProvider.
  2. Do not capture unpicklable or environment-specific objects (open sockets, live clients, file handles) in the provider’s __init__. Build clients inside the lifecycle methods using values passed via props, as shown above. Anything the methods need must arrive through the serialized inputs.
  3. Heavy or version-sensitive imports that you capture get pinned into state. Keep providers lean.
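
The second point can be checked before a provider ever reaches the engine. This sketch uses the standard library's pickle as a stand-in for Pulumi's serializer (an assumption: the exact set of objects Pulumi accepts may differ), and survives_pickle is a hypothetical helper. An open file handle stands in for any live client or connection.

```python
import os
import pickle
from typing import Any, Dict


def survives_pickle(state: Dict[str, Any]) -> bool:
    """True if the captured state round-trips through pickle unchanged."""
    try:
        return pickle.loads(pickle.dumps(state)) == state
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Plain configuration values serialize cleanly.
lean_state = {"endpoint": "https://dns.internal.example.com/api", "timeout": 30}

# An open file handle stands in for a socket or live API client.
handle = open(os.devnull)
try:
    leaky_state = {"endpoint": lean_state["endpoint"], "client": handle}
    lean_ok = survives_pickle(lean_state)
    leaky_ok = survives_pickle(leaky_state)
finally:
    handle.close()

print(lean_ok, leaky_ok)
```

A check like this belongs in a unit test for any provider that takes constructor arguments.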

For secrets, never pass a raw token as a normal input that lands in plaintext state. Mark it secret so Pulumi encrypts it at rest and redacts it in logs and diffs:

import pulumi

cfg = pulumi.Config()
api_token = cfg.require_secret("dnsApiToken")  # Output[str], flagged secret

record = DnsRecord(
    "www",
    zone="example.com",
    record_name="www",
    type="A",
    value="203.0.113.10",
    endpoint="https://dns.internal.example.com/api",
    token=api_token,  # secret flows through; state encrypts it
)

You can also force individual output properties to be treated as secrets by naming them in the additional_secret_outputs resource option, for example opts=pulumi.ResourceOptions(additional_secret_outputs=["token"]). Pulumi propagates the secret flag through any Output derived from a secret input, so the common case is handled for you as long as the input arrives as a secret.

Caveat: dynamic providers have no separate provider plugin binary to install. Their code runs in your program’s Python environment during pulumi up, so their dependencies are your program’s dependencies: pin requests (or whatever SDK you call) in requirements.txt.

4. Cross-stack architecture with StackReference

Large estates split into layers: a networking stack, a data stack, an app stack. Each is deployed independently and owns its blast radius. They communicate through stack outputs and StackReference, not shared state files.

Export outputs from the producing stack with pulumi.export:

# networking/__main__.py
import pulumi
from pulumi_aws import ec2

vpc = ec2.Vpc("main", cidr_block="10.0.0.0/16")
private = ec2.Subnet("private-a", vpc_id=vpc.id, cidr_block="10.0.1.0/24",
                     availability_zone="us-east-1a")

pulumi.export("vpc_id", vpc.id)
# A plain list of Outputs can be exported directly; no Output.all wrapper is needed.
pulumi.export("private_subnet_ids", [private.id])

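Consume them in another stack. The reference name is <org>/<project>/<stack> for Pulumi Cloud. Self-managed backends have no real organizations: depending on the Pulumi version and whether the backend uses project-scoped stacks, the name is organization/<project>/<stack> (with the literal word organization) or just <stack>, so check which form your backend expects: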

# app/__main__.py
import pulumi
from pulumi_aws import ec2

net = pulumi.StackReference("acme/networking/prod")

vpc_id = net.get_output("vpc_id")
subnet_ids = net.get_output("private_subnet_ids")

sg = ec2.SecurityGroup("app", vpc_id=vpc_id)

get_output returns an Output, preserving the dependency and secret flags across the boundary. One operational note: the StackReference resource needs read access to the referenced stack’s state. With Pulumi Cloud that means the deploying identity must have read permission on the source stack.
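
Hard-coding "acme/networking/prod" ties the consumer to one environment. A common pattern is to derive the reference name from the consumer's own stack name so that dev reads dev and prod reads prod. The helper below is a hypothetical sketch of that naming logic in plain Python; stack_ref_name is not part of Pulumi, and in a real program the stack argument would typically come from pulumi.get_stack().

```python
def stack_ref_name(org: str, project: str, stack: str) -> str:
    """Build '<org>/<project>/<stack>', rejecting empty or slash-containing parts."""
    parts = (org, project, stack)
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"invalid stack name segment: {part!r}")
    return "/".join(parts)


# Stand-in for pulumi.get_stack() so the sketch runs without the engine.
current_stack = "prod"
reference = stack_ref_name("acme", "networking", current_stack)
print(reference)

# In the consuming program this would feed the reference, for example:
#   net = pulumi.StackReference(reference)
```

StackReference also provides require_output, which fails the deployment when the named output is missing instead of yielding an empty value; prefer it for outputs the consumer cannot run without.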

5. Per-environment config, ESC, and secret providers

Each stack carries its own config file (Pulumi.dev.yaml, Pulumi.prod.yaml). Set plain and secret values with the CLI:

pulumi config set aws:region us-east-1
pulumi config set app:replicas 3
pulumi config set --secret app:dnsApiToken 'tok_live_xxx'

Secrets are encrypted with the stack’s secret provider. The default is the Pulumi Cloud service, but for self-managed backends or stricter key custody you should pin a KMS-backed provider when you initialize the stack:

pulumi stack init prod --secrets-provider="awskms://alias/pulumi-prod?region=us-east-1"
# Azure Key Vault and GCP KMS are equivalent:
#   azurekeyvault://<vault>.vault.azure.net/keys/<key>
#   gcpkms://projects/<p>/locations/<l>/keyRings/<r>/cryptoKeys/<k>

ESC: Environments, Secrets, and Configuration

For secrets and config that span many stacks, Pulumi ESC centralizes them and can broker short-lived cloud credentials via OIDC instead of static keys. Define an environment once, then import it from any stack’s config under the environment key.

# ESC environment acme/aws-prod (created via: pulumi env init acme/aws-prod, then edited)
values:
  aws:
    login:
      fn::open::aws-login:
        oidc:
          roleArn: arn:aws:iam::111122223333:role/pulumi-deploy
          sessionName: pulumi
          duration: 1h
  environmentVariables:
    AWS_ACCESS_KEY_ID: ${aws.login.accessKeyId}
    AWS_SECRET_ACCESS_KEY: ${aws.login.secretAccessKey}
    AWS_SESSION_TOKEN: ${aws.login.sessionToken}

# Pulumi.prod.yaml
environment:
  - aws-prod
config:
  app:replicas: 5

This is how you stop storing long-lived cloud keys in CI: ESC mints temporary credentials per run, and aws:region-style config still lives in the stack file.

6. Component resources for reusable, typed abstractions

A ComponentResource groups child resources under one logical node and is your unit of reuse, the Pulumi answer to a Terraform module, but with types. Define typed args with a dataclass, register outputs, and always set parent on children.

from dataclasses import dataclass
from typing import Optional
import pulumi
from pulumi_aws import s3


@dataclass
class StaticSiteArgs:
    index_document: str = "index.html"
    versioned: bool = True


class StaticSite(pulumi.ComponentResource):
    bucket_name: pulumi.Output[str]
    website_endpoint: pulumi.Output[str]

    def __init__(self, name: str, args: StaticSiteArgs,
                 opts: Optional[pulumi.ResourceOptions] = None):
        super().__init__("acme:web:StaticSite", name, {}, opts)
        child = pulumi.ResourceOptions(parent=self)

        bucket = s3.BucketV2(f"{name}-bucket", opts=child)
        if args.versioned:
            s3.BucketVersioningV2(
                f"{name}-ver",
                bucket=bucket.id,
                versioning_configuration={"status": "Enabled"},
                opts=child,
            )
        website = s3.BucketWebsiteConfigurationV2(
            f"{name}-web",
            bucket=bucket.id,
            index_document={"suffix": args.index_document},
            opts=child,
        )

        self.bucket_name = bucket.bucket
        self.website_endpoint = website.website_endpoint
        # Surfaces these as outputs and finalizes the component in the graph.
        self.register_outputs({
            "bucket_name": self.bucket_name,
            "website_endpoint": self.website_endpoint,
        })

The first argument to super().__init__ is the component’s type token (package:module:Type). Setting parent=self on every child nests them in pulumi stack graph and ties their lifecycle to the component. Forgetting register_outputs leaves the component half-constructed in state.

7. Testing with mocks and policy with CrossGuard

Pulumi’s unit-test framework swaps the engine for a mock so tests run with no cloud calls and no real pulumi up. Implement pulumi.runtime.Mocks, set it before importing your program, then assert on resource properties resolved through apply.

# test_infra.py
import pulumi


class Mocks(pulumi.runtime.Mocks):
    def new_resource(self, args: pulumi.runtime.MockResourceArgs):
        # Return (id, state). state echoes inputs plus computed fields.
        return [args.name + "_id", {**args.inputs, "arn": "arn:fake:" + args.name}]

    def call(self, args: pulumi.runtime.MockCallArgs):
        return {}


pulumi.runtime.set_mocks(Mocks(), preview=False)

import infra  # import AFTER set_mocks so resources register against the mock


@pulumi.runtime.test
def test_bucket_is_versioned():
    def check(config):
        assert config["status"] == "Enabled", "production buckets must be versioned"
    # apply runs check once the mocked value resolves; the decorator awaits the result.
    return infra.site_versioning.versioning_configuration.apply(check)

The @pulumi.runtime.test decorator handles the async output resolution; return an Output (or a coroutine) so the framework waits for assertions inside apply. Run with pytest.

For org-wide guardrails that run during preview and up, write a CrossGuard policy pack in Python. Policies fail the deployment when violated, so they gate every stack, not just the ones with tests.

# policy/__main__.py
from pulumi_policy import (
    PolicyPack, ResourceValidationPolicy, EnforcementLevel, ReportViolation,
)


def s3_no_public_acl(args, report: ReportViolation):
    if args.resource_type == "aws:s3/bucketV2:BucketV2":
        if args.props.get("acl") == "public-read":
            report("S3 buckets must not be public-read")


PolicyPack(
    name="acme-baseline",
    enforcement_level=EnforcementLevel.MANDATORY,
    policies=[
        ResourceValidationPolicy(
            name="s3-no-public-acl",
            description="Disallow public-read S3 buckets",
            validate=s3_no_public_acl,
        ),
    ],
)

Run the pack locally against a preview:

pulumi preview --policy-pack ./policy
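
Because the validation function only reads args.resource_type and args.props and calls report, it can be unit tested without the engine or the pulumi_policy package. The sketch below repeats the function so it is self-contained and feeds it fake arguments; violations_for is a hypothetical helper, and SimpleNamespace is only a stand-in for the real argument object.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def s3_no_public_acl(args, report) -> None:
    # Same check as the policy pack, repeated so this sketch is self-contained.
    if args.resource_type == "aws:s3/bucketV2:BucketV2":
        if args.props.get("acl") == "public-read":
            report("S3 buckets must not be public-read")


def violations_for(resource_type: str, props: Dict[str, Any]) -> List[str]:
    """Run the validator against fake args and collect the reported messages."""
    found: List[str] = []
    fake_args = SimpleNamespace(resource_type=resource_type, props=props)
    s3_no_public_acl(fake_args, found.append)
    return found


public_bucket = violations_for("aws:s3/bucketV2:BucketV2", {"acl": "public-read"})
private_bucket = violations_for("aws:s3/bucketV2:BucketV2", {"acl": "private"})
other_resource = violations_for("aws:ec2/vpc:Vpc", {"acl": "public-read"})

print(public_bucket, private_bucket, other_resource)
```

Tests like this catch a mistyped resource type token, which would otherwise make the policy silently match nothing.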

8. CI/CD: preview gating and update with the GitHub Action

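The discipline that makes this safe is: preview on every pull request, comment the diff, require approval, then update on merge. Use the official pulumi/actions@v6 action with OIDC so no static cloud or Pulumi tokens sit in the repo. The workflow below shows only the gating structure; it assumes you add the credential steps (Pulumi login and cloud login via OIDC) before each pulumi/actions step.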

# .github/workflows/pulumi.yml
name: pulumi
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

permissions:
  id-token: write       # OIDC to cloud and to Pulumi
  contents: read
  pull-requests: write  # so the action can comment the preview

jobs:
  preview:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - uses: pulumi/actions@v6
        with:
          command: preview
          stack-name: acme/app/prod
          comment-on-pr: true

  update:
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    environment: production   # GitHub Environment protection rule = approval gate
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - uses: pulumi/actions@v6
        with:
          command: up
          stack-name: acme/app/prod

Two gating mechanisms are doing the work. The pull_request job runs preview and posts the plan as a PR comment so a human reviews the diff. The push job is bound to a GitHub Environment (production) with a required-reviewers protection rule, so the merge-to-deploy step blocks until approved. For multi-stack ordering, run the producer stack’s up job before the consumer’s, gated on success, so StackReference consumers see fresh outputs.
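
For more than two or three stacks, the producer-before-consumer order is easier to compute than to maintain by hand. The sketch below uses the standard library's graphlib (Python 3.9+) to derive a deployment order from a dependency map. The map itself is a hypothetical layout; acme/data/prod is an assumed stack name based on the layers described earlier.

```python
from graphlib import TopologicalSorter

# Hypothetical layout: each stack maps to the stacks whose outputs it consumes.
consumes = {
    "acme/networking/prod": set(),
    "acme/data/prod": {"acme/networking/prod"},
    "acme/app/prod": {"acme/networking/prod", "acme/data/prod"},
}

# static_order yields every stack after all of the stacks it depends on.
deploy_order = list(TopologicalSorter(consumes).static_order())

for stack in deploy_order:
    print(f"pulumi up --stack {stack}")
```

A script like this can generate the job ordering for CI, and TopologicalSorter raises CycleError if two stacks reference each other, which is worth failing on early.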

Verify

Run these to confirm each piece behaves. The dynamic provider:

pulumi preview                       # should show the DnsRecord with known/unknown props
pulumi up --yes                      # create() runs; record_id appears in outputs
pulumi stack output --show-secrets   # secret stack outputs are decrypted only with this flag
pulumi up --yes                      # change value only -> in-place update, no replace
pulumi destroy --yes                 # delete() runs; 404 tolerated as success

Confirm secrets never leak to plaintext state. With a self-managed backend you can inspect the export:

pulumi stack export | python -c "import json,sys; \
  s=json.load(sys.stdin); \
  print('SECRETS PRESENT' if 'ciphertext' in json.dumps(s) else 'NO CIPHERTEXT')"
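
The one-liner confirms that some ciphertext exists, but not that the raw value is absent. A stricter check searches the whole export for the plaintext token. The sketch below is a minimal version of that idea; iter_strings and leaks_plaintext are hypothetical helpers, and the two sample documents are heavily trimmed, invented shapes rather than real pulumi stack export output.

```python
import json
from typing import Any, Iterator


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string key and value found anywhere in a JSON-like structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def leaks_plaintext(export: dict, raw_secret: str) -> bool:
    """True if the raw secret appears anywhere in the exported state."""
    return any(raw_secret in text for text in iter_strings(export))


# Invented, trimmed export shapes for illustration only.
encrypted = json.loads(
    '{"deployment": {"resources": [{"inputs": {"token": {"ciphertext": "v1:abc123"}}}]}}'
)
leaky = json.loads(
    '{"deployment": {"resources": [{"inputs": {"token": "tok_live_xxx"}}]}}'
)

encrypted_leaks = leaks_plaintext(encrypted, "tok_live_xxx")
leaky_leaks = leaks_plaintext(leaky, "tok_live_xxx")
print(encrypted_leaks, leaky_leaks)
```

In practice you would load the JSON from pulumi stack export and pass the token from a secure source rather than embedding it in the script.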

Validate cross-stack wiring and policy:

pulumi stack output vpc_id --stack acme/networking/prod   # producer exports it
pulumi preview --stack acme/app/prod                      # consumer resolves the reference
pulumi preview --policy-pack ./policy                     # MANDATORY policy blocks violations
pytest -q                                                 # mocks run with zero cloud calls

Expected results: pulumi up on a value-only change reports ~ update (not +- replace); a public-read bucket fails preview under the policy pack with a non-zero exit; pytest passes offline; and the stack export shows ciphertext for the token, never the raw value.
