GCP Lesson 51 of 98

Google Cloud Build & Cloud Deploy, In Depth: Pipelines, Triggers, Substitutions & Releases

Cloud Build and Cloud Deploy are Google Cloud’s two native, fully managed CI/CD services, and together they form a clean division of labour: Cloud Build is your CI engine — it runs your build, test, and packaging steps in containers and pushes the resulting artefacts to a registry — and Cloud Deploy is your CD engine — it takes a built artefact and progresses it through an ordered sequence of environments (dev → staging → prod) with promotion, approvals, canary rollouts, and one-command rollback. Neither requires you to run a server, patch a Jenkins box, or babysit a runner fleet; Google operates the execution infrastructure, you supply the configuration. The two are designed to chain: a Cloud Build trigger fires on a git push, builds and tests your code, pushes an image to Artifact Registry, and then hands off to a Cloud Deploy delivery pipeline that rolls that exact image out to GKE or Cloud Run, gated by approvals and verified by Binary Authorization.

This lesson is deliberately exhaustive across both products. For Cloud Build we cover the build config end to end — the cloudbuild.yaml/cloudbuild.json schema, steps and builders (cloud builders, community builders, custom builders), the shared /workspace volume and how data flows between steps, substitutions (built-in, user-defined, and the substitution options that change parsing), artifacts (images, generic artefacts to GCS, Maven/npm/Python packages, Go modules), machine types and disk sizing, timeouts (build-level and per-step), parallel and sequential execution with waitFor and id, logging options, triggers (push to branch, pull request, tag, manual, webhook, Pub/Sub, and the GitHub/GitLab/Bitbucket connections behind them), the build service account and the IAM that governs it, default pools vs private pools (with VPC peering and static egress), secrets via Secret Manager and the legacy KMS path, and caching strategies (Kaniko cache, cached Docker images, --cache-from, and Cloud Storage caches). For Cloud Deploy we cover the delivery pipeline → targets model, Skaffold as the render/deploy engine, releases and rollouts, promotion and approval gates, deployment strategies (standard, canary with verify/predeploy/postdeploy hooks, and per-phase percentages), rollback, multi-target and parallel deployment, target types (GKE, GKE Autopilot, Cloud Run, Anthos/Connect gateway, and multi-target), and automation rules. We close on the build → Artifact Registry → deploy chain and Binary Authorization. Every option gets the same treatment — what it is · the choices · the default · when to pick which · the trade-off · the limit · the cost impact · the gotcha — and every operation comes with a real gcloud command. Everything reflects the current 2026 surface (gcloud builds, gcloud deploy, Skaffold v4 schema, second-generation repository connections).

Learning objectives

By the end of this lesson you can:

Prerequisites & where this fits

You should already understand Google Cloud’s resource hierarchy — organisation → folder → project → resource — what a region is, how to run gcloud from Cloud Shell or a local SDK install (covered in the Fundamentals module), the basics of a container image and a Dockerfile, and a little Git. It helps to have read the Artifact Registry deep dive — that is where Cloud Build pushes images and where Cloud Deploy pulls them — and to know roughly what GKE and Cloud Run are, since those are the deploy targets; but every term is defined here. This is the CI/CD lesson of the DevOps module in the GCP Zero-to-Hero course. It sits downstream of source control and the registry and upstream of your running workloads: once you can drive Cloud Build and Cloud Deploy fluently you can take code from a git push all the way to a gated, canary-released production rollout without leaving Google Cloud. For the keyless way to authenticate external CI (e.g. GitHub Actions) to GCP — the alternative to running CI inside Cloud Build — pair this with Workload Identity Federation for keyless CI/CD.

Core concepts

Before the options, fix the mental models. They explain why every setting is shaped the way it is.

Cloud Build runs steps as containers on an ephemeral worker. A build is an ordered (or partially parallel) list of steps. Each step is just a container image plus a command to run inside it. Google spins up a fresh, throwaway VM (the worker), checks out your source into a directory, and runs each step’s container with that directory mounted. There is no persistent build agent; every build starts clean. This is why a build is reproducible and why anything you want to keep (artefacts, caches) must be pushed somewhere durable before the worker is destroyed.

/workspace is the shared volume that carries state between steps. The worker mounts a single directory, /workspace, into every step at the same path, and it is the step’s working directory by default. Your source is checked out there. Whatever step 1 writes to /workspace (a compiled binary, a generated file, downloaded dependencies) is visible to step 2. Anything written outside /workspace (e.g. into a step’s own container filesystem) is lost when that step’s container exits. This single fact — only /workspace persists across steps — drives most “why did my file disappear?” debugging.
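
A minimal sketch of that rule, assuming the public ubuntu image as a throwaway builder (the file names are hypothetical). The first step writes one file inside /workspace and one outside it; the second step runs in a fresh container and can only see the first.

```yaml
steps:
  # Step 1: write one file under /workspace and one in the container's own /tmp.
  - name: 'ubuntu'
    entrypoint: 'bash'
    args: ['-c', 'echo kept > /workspace/kept.txt && echo lost > /tmp/lost.txt']
  # Step 2: a new container; /workspace is mounted again, /tmp is not carried over.
  - name: 'ubuntu'
    entrypoint: 'bash'
    args: ['-c', 'cat /workspace/kept.txt; test -f /tmp/lost.txt || echo "/tmp/lost.txt did not survive"']
```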

A builder is just an image; you are not limited to Google’s. A builder is the image a step runs. Three flavours: cloud builders (Google-maintained images like gcr.io/cloud-builders/docker, gcr.io/cloud-builders/gcloud, gcr.io/cloud-builders/git), community/public images (any image on Docker Hub, Artifact Registry, etc. — node, python, golang, maven, gradle), and custom builders (an image you build yourself for your toolchain). The modern recommendation is to use official public images (node:20, python:3.12) directly rather than the older gcr.io/cloud-builders/* mirrors, except for docker, gcloud, gke-deploy, and similar Google-specific tooling.

Substitutions are build-time variables. A cloudbuild.yaml can reference variables with $VAR or ${VAR}. Built-in substitutions ($PROJECT_ID, $BUILD_ID, $COMMIT_SHA, $SHORT_SHA, $BRANCH_NAME, $TAG_NAME, $LOCATION, …) are filled by Cloud Build. User-defined substitutions (which must start with _, e.g. $_REGION) are values you supply on the trigger or the command line. This is how one config file serves many environments — the file is static, the substitutions vary.

Cloud Deploy progresses one artefact through ordered targets; it does not build. Cloud Deploy’s unit of work is a release — an immutable snapshot of what to deploy (your rendered manifests plus the image references). You create a release once; you then promote it through a delivery pipeline, which is an ordered list of targets (each target = one environment, e.g. a specific GKE cluster or Cloud Run service+region). Promoting creates a rollout to the next target. Cloud Deploy never builds your image — it consumes an image that Cloud Build (or anything else) already produced. Build once, deploy the same artefact everywhere is the entire philosophy, and it is why “it worked in staging but broke in prod” largely disappears: staging and prod deploy the byte-identical artefact.

Skaffold is the rendering and deploying engine inside Cloud Deploy. Cloud Deploy does not invent its own manifest format; it drives Skaffold (Google’s open-source build/render/deploy tool). At release time Cloud Deploy runs skaffold render to turn your templates into concrete, per-target manifests (substituting the image, the namespace, etc.) and stores them; at rollout time it runs skaffold apply/deploy to push those exact rendered manifests to the target. Knowing “Cloud Deploy = managed Skaffold + a promotion state machine + approvals” demystifies the whole product.

Identity is a recurring theme in both. A Cloud Build build runs as a service account (historically the legacy Cloud Build SA PROJECT_NUMBER@cloudbuild.gserviceaccount.com; today you should specify a user-managed service account). Cloud Deploy uses its own service account for orchestration and an execution service account per target for the actual deploy. Getting these identities and their IAM right is the single most common source of CI/CD failures. Key terms throughout: step, builder, /workspace, substitution, artifact, trigger, pool (Cloud Build); delivery pipeline, target, release, rollout, promotion, phase, strategy (Cloud Deploy).


Part 1 — Cloud Build (CI)

The build config: cloudbuild.yaml top-level fields

A build is defined by a build config, written as YAML (cloudbuild.yaml) or JSON (cloudbuild.json). Here is the full top-level shape; every field is explained below.

steps:                       # required: the ordered list of build steps
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/repo/app:$SHORT_SHA', '.']
substitutions:               # user-defined variables (must start with _)
  _REGION: us-central1
images:                      # images to push to a registry on success
  - 'us-central1-docker.pkg.dev/$PROJECT_ID/repo/app:$SHORT_SHA'
artifacts:                   # non-image artefacts to upload (GCS, Maven, npm, Python, Go)
  objects:
    location: 'gs://$PROJECT_ID-artifacts/'
    paths: ['bin/*']
options:                     # build-wide options (machine type, logging, pool, env, ...)
  machineType: 'E2_HIGHCPU_8'
  logging: CLOUD_LOGGING_ONLY
  dynamicSubstitutions: true
timeout: '1200s'             # whole-build timeout (default 60 min = 3600s; max 24h)
tags: ['ci', 'backend']      # build tags for filtering
serviceAccount: 'projects/$PROJECT_ID/serviceAccounts/builder@$PROJECT_ID.iam.gserviceaccount.com'
availableSecrets:            # Secret Manager secrets exposed to steps
  secretManager:
    - versionName: projects/$PROJECT_ID/secrets/MY_SECRET/versions/latest
      env: 'MY_SECRET'

| Top-level field | What it is | Default | Notes / gotcha |
| --- | --- | --- | --- |
| steps | Ordered list of build steps (the only required field) | n/a (required) | Each needs a name (the builder image); execution is sequential unless waitFor is used |
| substitutions | User-defined variables; keys must start with _ | none | Override at trigger/CLI; built-in subs ($PROJECT_ID etc.) need no declaration |
| images | Container images to push to a registry after all steps succeed | none | Lets Cloud Build push (and record provenance) so you don’t need a docker push step |
| artifacts | Non-image outputs: GCS objects, Maven/npm/Python packages, Go modules | none | Uploaded on success; sub-blocks are npmPackages, pythonPackages, mavenArtifacts, goModules, objects |
| options | Build-wide settings: machineType, diskSizeGb, logging, pool, env, secretEnv, substitutionOption, dynamicSubstitutions, automapSubstitutions, requestedVerifyOption, defaultLogsBucketBehavior | platform defaults | See machine-type, logging, and substitution tables below |
| timeout | Whole-build timeout | 3600s (60 min) | Max 24h; format like 1200s. A build that exceeds it is stopped with status TIMEOUT |
| tags | Free-text labels for filtering builds | none | Use with gcloud builds list --filter |
| serviceAccount | The user-managed SA the build runs as | legacy Cloud Build SA (being phased out) | Strongly recommended to set explicitly; see IAM section |
| availableSecrets | Secret Manager secrets bound to env vars/files | none | Modern secret path (replaces the legacy KMS secrets path) |
| logsBucket | A GCS bucket for build logs | Google-managed bucket | Set for retention/region control; SA needs write access |
| queueTtl | How long a build may sit queued before failing | 3600s | Builds queue when you hit concurrency limits |

Steps and builders: every field

A step is the atom of a build. The fields you can set on each step:

| Step field | What it is | Example / note |
| --- | --- | --- |
| name | The builder image to run (required) | 'gcr.io/cloud-builders/docker', 'node:20', 'golang:1.22', a custom image |
| args | Arguments passed to the image’s entrypoint | ['build', '-t', 'img', '.'] |
| entrypoint | Override the image’s entrypoint | entrypoint: 'bash' then args: ['-c', 'npm ci && npm test'] |
| env | Environment variables for this step (KEY=VALUE) | ['NODE_ENV=production'] |
| secretEnv | Names of secret env vars (from availableSecrets) to expose | ['MY_SECRET'] |
| dir | Working directory relative to /workspace | dir: 'backend' runs the step in /workspace/backend |
| id | A name for the step, referenced by waitFor | id: 'build' |
| waitFor | Step ids this step waits on (controls ordering/parallelism) | waitFor: ['-'] = start immediately; ['build'] = wait for build |
| timeout | Per-step timeout | timeout: '300s' (independent of the build timeout) |
| volumes | Named volumes mounted across steps (beyond /workspace) | Persist e.g. a Go module cache between steps in one build |
| allowFailure | Continue the build even if this step’s exit code is non-zero | allowFailure: true |
| allowExitCodes | Treat specific non-zero exit codes as success | allowExitCodes: [1] |
| script | Inline shell script (an alternative to entrypoint + args; cannot be combined with them) | script: followed by a YAML block of shell lines; runs under sh unless the first line is a shebang such as #!/usr/bin/env bash |
| automapSubstitutions | Auto-expose substitutions as env vars in this step | true/false |
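
To show several of these step fields working together, here is a hedged sketch of two steps. It assumes a hypothetical Node project living in a backend/ subdirectory with a package-lock.json; the step ids and the audit threshold are illustrative choices, not part of the lesson's project.

```yaml
steps:
  # Runs in /workspace/backend, with its own env and a tight per-step timeout.
  - name: 'node:20'
    id: 'unit-tests'
    dir: 'backend'
    env: ['NODE_ENV=test']
    timeout: '300s'
    script: |
      #!/usr/bin/env bash
      set -euo pipefail
      npm ci
      npm test
  # Advisory step: a non-zero exit code here does not fail the build.
  - name: 'node:20'
    id: 'audit'
    dir: 'backend'
    entrypoint: 'npm'
    args: ['audit', '--audit-level=high']
    allowFailure: true
```

The first step uses script (so no entrypoint or args), while the second uses entrypoint plus args; a single step should use one form or the other.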

Builder choices:

| Builder type | Examples | When to use | Gotcha |
| --- | --- | --- | --- |
| Cloud builders (Google) | gcr.io/cloud-builders/docker, /gcloud, /git, /gsutil, /kubectl, /gke-deploy | Docker, gcloud, Git, and GCP-specific tooling | Some are pinned to older tool versions; for languages prefer official images |
| Official public images | node:20, python:3.12, golang:1.22, maven:3.9-eclipse-temurin-21, gradle:8 | Language builds and tests | Pulled each build unless cached; pin a tag, avoid latest |
| Community builders | Images in the GoogleCloudPlatform/cloud-builders-community repo | Tools without an official image (e.g. helm, packer, terraform) | You build/host them yourself once into your own registry |
| Custom builders | An image you build with your exact toolchain | Heavy/proprietary toolchains, to cut per-build install time | You maintain and version it; keep it small |

The classic Docker build-and-push pattern:

steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', '${_IMG}:$SHORT_SHA', '-t', '${_IMG}:latest', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', '--all-tags', '${_IMG}']
substitutions:
  _IMG: 'us-central1-docker.pkg.dev/${PROJECT_ID}/repo/app'
options:
  # Lets _IMG reference ${PROJECT_ID}; required for manual (non-trigger) builds.
  dynamicSubstitutions: true

The /workspace volume and data flow

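Every step shares /workspace. Concretely:

  1. Your source is checked out into /workspace before the first step runs.
  2. Every step starts in /workspace by default, or in a subdirectory of it when dir is set.
  3. Files that one step writes under /workspace are visible to every later step.
  4. Files written anywhere else are discarded when that step’s container exits, unless the path is a named volume.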

A worked example — Go build cache shared between two steps via a named volume, with the compiled binary handed forward in /workspace:

steps:
  - name: 'golang:1.22'
    id: deps
    entrypoint: 'bash'
    args: ['-c', 'go mod download']
    volumes: [{name: 'gocache', path: '/go/pkg/mod'}]
  - name: 'golang:1.22'
    id: build
    entrypoint: 'bash'
    # Build the root main package; './...' with -o <file> fails when it matches several packages.
    args: ['-c', 'CGO_ENABLED=0 go build -o /workspace/bin/app .']
    volumes: [{name: 'gocache', path: '/go/pkg/mod'}]

Here /go/pkg/mod (outside /workspace) only persists because of the named gocache volume; /workspace/bin/app persists automatically.

Substitutions: built-in, user-defined, and the options

Built-in substitutions (filled by Cloud Build — a selection of the important ones):

| Substitution | Meaning | Available when |
| --- | --- | --- |
| $PROJECT_ID / $PROJECT_NUMBER | The build’s project id / number | always |
| $BUILD_ID | Unique id of this build | always |
| $LOCATION | Region of the build | always |
| $COMMIT_SHA / $SHORT_SHA | Full / 7-char commit hash | repo-triggered builds |
| $BRANCH_NAME | Branch that triggered the build | branch/push triggers |
| $TAG_NAME | Git tag that triggered the build | tag triggers |
| $REPO_NAME / $REPO_FULL_NAME | Repository name | repo-triggered builds |
| $REVISION_ID | Commit id (alias of $COMMIT_SHA) | repo-triggered builds |
| $TRIGGER_NAME / $TRIGGER_BUILD_CONFIG_PATH | Trigger metadata | trigger-started builds |
| $_PR_NUMBER / $_HEAD_BRANCH / $_BASE_BRANCH | Pull-request metadata | pull-request triggers |

$_REGION is not built-in — it is a common user-defined convention; the built-in for region is $LOCATION.

User-defined substitutions must start with _ (e.g. _REGION, _IMG, _ENV). Declare a default in substitutions: and override per trigger or with --substitutions _REGION=europe-west1 on the CLI.

Substitution options (under options:) change parsing behaviour:

| Option | What it does | Default | When to set |
| --- | --- | --- | --- |
| substitutionOption: ALLOW_LOOSE | Don’t fail the build on missing/unused substitutions | MUST_MATCH (strict) for manual builds; trigger-based builds always run as ALLOW_LOOSE | Temporary/looser configs; prefer strict in production |
| dynamicSubstitutions: true | Enable bash-style parameter expansion in values (e.g. ${_A:-default}, nesting) | false (auto-true for trigger-based builds) | When you need defaults/derived values inside the YAML |
| automapSubstitutions: true | Expose all substitutions to every step as env vars automatically | false | Avoids repeating env: per step; can leak unexpected vars |

Escape a literal dollar sign with $$.
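
As a hedged illustration of how these pieces interact, the sketch below derives one substitution from others and mixes Cloud Build expansion with shell expansion in the same step. The _ENV and _IMG names and the ubuntu builder are assumptions chosen for the example.

```yaml
substitutions:
  _ENV: 'dev'
  # Derived value: only resolved because dynamicSubstitutions is enabled below.
  _IMG: 'us-central1-docker.pkg.dev/${PROJECT_ID}/repo/app-${_ENV}'
options:
  dynamicSubstitutions: true
steps:
  - name: 'ubuntu'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        # These two are replaced by Cloud Build before bash ever runs.
        echo "environment: ${_ENV}"
        echo "image: ${_IMG}"
        # The doubled dollar sign survives as a single one, so bash expands it.
        GREETING=hello
        echo "shell variable: $$GREETING"
```

Overriding only the input, for example with --substitutions=_ENV=prod on the CLI, should change the derived image name without touching the file.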

Artifacts: images and everything else

Two ways to publish outputs:

  1. images: — list container images; Cloud Build pushes them after a successful build and records build provenance. Cleaner than a manual docker push step.
  2. artifacts: — for non-image outputs. Sub-blocks:

| artifacts sub-block | Publishes to | Example use |
| --- | --- | --- |
| objects | A Cloud Storage bucket (location + paths) | Compiled binaries, zips, reports |
| mavenArtifacts | An Artifact Registry Maven repo | Java libraries (.jar/.pom) |
| npmPackages | An Artifact Registry npm repo | Node packages |
| pythonPackages | An Artifact Registry Python repo | Python wheels/sdists |
| goModules | An Artifact Registry Go repo | Go modules |

artifacts:
  objects:
    location: 'gs://$PROJECT_ID-build-artifacts/$BUILD_ID/'
    paths: ['bin/app', 'reports/*.xml']
  pythonPackages:
    - repository: 'https://us-central1-python.pkg.dev/$PROJECT_ID/py-repo'
      paths: ['dist/*.whl']

The build service account needs write access to each destination (e.g. roles/storage.objectAdmin on the bucket, roles/artifactregistry.writer on the repo).
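
The npm and Maven sub-blocks follow the same pattern. The fragment below is a sketch only: the repository names (npm-repo, maven-repo), the Maven coordinates, and the jar path are hypothetical placeholders, and it assumes earlier steps have already produced the package.json and the jar.

```yaml
artifacts:
  npmPackages:
    - repository: 'https://us-central1-npm.pkg.dev/$PROJECT_ID/npm-repo'
      packagePath: '.'                       # directory that contains package.json
  mavenArtifacts:
    - repository: 'https://us-central1-maven.pkg.dev/$PROJECT_ID/maven-repo'
      path: '/workspace/target/app-1.0.0.jar'  # jar built by an earlier step
      artifactId: 'app'
      groupId: 'com.example'
      version: '1.0.0'
```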

Machine types, disk, timeouts, and parallelism

Machine types (set under options.machineType) control build speed and cost:

| machineType | vCPU / RAM (approx) | When to use | Cost note |
| --- | --- | --- | --- |
| (unset) default | 1 vCPU / ~4 GB (e2-medium class) | Light builds; covered by free tier | Cheapest; the free 2,500 build-min/month are at this size |
| E2_HIGHCPU_8 | 8 vCPU | Faster compiles, parallel test suites | Billed at a higher per-minute rate |
| E2_HIGHCPU_32 | 32 vCPU | Large monorepos, heavy parallelism | Highest E2 rate |
| E2_MEDIUM | 1 vCPU | Explicit small default | |
| N1_HIGHCPU_8 / N1_HIGHCPU_32 | 8 / 32 vCPU | Legacy N1 family equivalents | Slightly different pricing than E2 |

Notes: bigger machines finish faster but cost more per minute — the trade is usually worth it for compile-bound builds; non-default machine types are not covered by the free tier. Private pools can additionally use larger/custom machine types.

Disk: options.diskSizeGb sets the worker disk (default 100 GB; increase for large checkouts, big images, or lots of layers — max into the hundreds of GB depending on pool).

Timeouts: the build-wide timeout defaults to 3600s (60 min) and maxes at 24h; each step can also set its own timeout. A build that exceeds its timeout is stopped and ends with status TIMEOUT. Set generous build timeouts for long integration tests but keep per-step timeouts tight to fail fast.
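
Put together, the sizing and timeout knobs from this section might look like the sketch below. The specific values are illustrative assumptions for a compile-heavy Go project, not recommendations.

```yaml
steps:
  - name: 'golang:1.22'
    id: 'test'
    entrypoint: 'bash'
    args: ['-c', 'go test ./...']
    timeout: '600s'          # per-step limit: fail fast if the tests hang
timeout: '3600s'             # whole-build limit, covering all steps
queueTtl: '1800s'            # give up if the build waits in the queue this long
options:
  machineType: 'E2_HIGHCPU_8'
  diskSizeGb: '200'          # larger worker disk for big checkouts and image layers
```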

Parallel and sequential execution with id + waitFor:

steps:
  - name: 'node:20'
    id: lint
    entrypoint: bash
    args: ['-c', 'npm ci && npm run lint']
    waitFor: ['-']            # parallel
  - name: 'node:20'
    id: test
    entrypoint: bash
    args: ['-c', 'npm ci && npm test']
    waitFor: ['-']            # parallel with lint
  - name: 'gcr.io/cloud-builders/docker'
    id: image
    args: ['build', '-t', '${_IMG}:$SHORT_SHA', '.']
    waitFor: ['lint', 'test'] # only after both pass

Triggers: every type and the repo connection behind them

A trigger starts a build automatically in response to an event. You attach it to a connected repository (or a webhook/Pub/Sub source) and point it at a build config (cloudbuild.yaml) or an inline build.

| Trigger type | Fires on | Key config | When to use |
| --- | --- | --- | --- |
| Push to branch | Commits pushed to branches matching a regex | --branch-pattern (e.g. ^main$) | CI on main / release branches |
| Push to tag | A Git tag matching a regex is pushed | --tag-pattern (e.g. ^v.*) | Release builds on version tags |
| Pull request | PR opened/updated against matching base branch | --pull-request-pattern, comment control | Pre-merge checks; exposes $_PR_NUMBER |
| Manual | You run it on demand | gcloud builds triggers run | Ad-hoc/parameterised builds |
| Webhook | An inbound HTTP POST (any system) | webhook config with a Secret Manager secret (--secret) | Trigger from tools without a native integration |
| Pub/Sub | A message on a Pub/Sub topic | Pub/Sub config naming the topic (--topic) | Event-driven builds (e.g. on new Artifact Registry image, on schedule via Scheduler→Pub/Sub) |
| Manual (Cloud Scheduler) | Cron via Scheduler → trigger | Scheduler job hitting the trigger | Nightly/periodic builds |

Repository connections (2nd gen — the modern way): Cloud Build connects to GitHub, GitHub Enterprise, GitLab (and self-managed GitLab), and Bitbucket through the Developer Connect / 2nd-gen repository integration, which uses a Secret Manager-stored token and supports many repos per connection. The older 1st-gen GitHub App connection and Cloud Source Repositories still work but 2nd-gen is recommended for new setups. For pull-request triggers you also choose comment control — whether external contributors’ PRs auto-build or require an /gcbrun owner comment first (a security control against malicious PRs).
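
Triggers can also be kept as YAML and loaded with gcloud builds triggers import. The fragment below is a sketch of a pull-request trigger on a 2nd-gen connection with comment control for external contributors. The field names follow my reading of the BuildTrigger resource and should be checked against the API reference; the trigger name and the PROJECT, CONN, and REPO placeholders are assumptions.

```yaml
name: app-pr-checks
description: Pre-merge checks for pull requests into main
repositoryEventConfig:
  repository: projects/PROJECT/locations/us-central1/connections/CONN/repositories/REPO
  pullRequest:
    branch: '^main$'
    # External contributors' PRs wait for an owner's /gcbrun comment.
    commentControl: COMMENTS_ENABLED_FOR_EXTERNAL_CONTRIBUTORS_ONLY
filename: cloudbuild.yaml
substitutions:
  _REGION: us-central1
```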

Create a push trigger (2nd-gen connection assumed):

gcloud builds triggers create github \
  --name=app-ci-main \
  --region=us-central1 \
  --repository=projects/PROJECT/locations/us-central1/connections/CONN/repositories/REPO \
  --branch-pattern='^main$' \
  --build-config=cloudbuild.yaml \
  --substitutions=_REGION=us-central1

Run a build by hand (no trigger needed):

gcloud builds submit --region=us-central1 \
  --config=cloudbuild.yaml \
  --substitutions=_IMG=us-central1-docker.pkg.dev/$PROJECT/repo/app .

The build service account and IAM

This is the highest-yield section for avoiding failures.

Which identity runs the build? Historically every build ran as the legacy Cloud Build service account PROJECT_NUMBER@cloudbuild.gserviceaccount.com, which had broad default roles. Google is phasing this out; new projects should set a user-managed service account on the build (the serviceAccount field, or --service-account on a trigger/submit) and grant it only what it needs. Regional builds and private pools generally require a user-managed SA.

Roles to use Cloud Build / start builds:

| Role | Grants | Give to |
| --- | --- | --- |
| roles/cloudbuild.builds.editor | Create/cancel builds, manage triggers | Engineers/CI |
| roles/cloudbuild.builds.viewer | Read builds and logs | Auditors/read-only |
| roles/cloudbuild.builds.approver | Approve builds awaiting approval | Release approvers |
| roles/cloudbuild.connectionAdmin | Manage repo connections | Platform admins |

Roles the build’s service account typically needs (least-privilege, per use):

| Role | Why |
| --- | --- |
| roles/artifactregistry.writer | Push images/packages to Artifact Registry |
| roles/logging.logWriter | Write build logs (required when using a user-managed SA + CLOUD_LOGGING_ONLY) |
| roles/storage.objectAdmin | Write artefacts / logs to a GCS bucket |
| roles/secretmanager.secretAccessor | Read secrets exposed to the build |
| roles/clouddeploy.releaser | Create a Cloud Deploy release as the last build step |
| roles/container.developer | Deploy to GKE directly from a build (if not using Cloud Deploy) |
| roles/run.developer + roles/iam.serviceAccountUser | Deploy to Cloud Run directly from a build |

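The classic “permission denied” gotcha: when you switch to a user-managed SA, you must choose a log destination explicitly (options.logging: CLOUD_LOGGING_ONLY, your own logsBucket, or options.defaultLogsBucketBehavior: REGIONAL_USER_OWNED_BUCKET) and grant the SA write access to it: roles/logging.logWriter for Cloud Logging, or object write access on the bucket. Skip either half and the build is rejected or fails immediately on log setup. And to act as the build SA, the principal/service creating the build needs roles/iam.serviceAccountUser on it.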
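
A minimal config that satisfies the logging requirement for a user-managed SA might look like the sketch below. It assumes a service account named builder that already holds roles/logging.logWriter and roles/artifactregistry.writer; the image path reuses the lesson's placeholder repository.

```yaml
# The build runs as this user-managed SA instead of the legacy Cloud Build SA.
serviceAccount: 'projects/$PROJECT_ID/serviceAccounts/builder@$PROJECT_ID.iam.gserviceaccount.com'
options:
  # Explicit log destination; the SA needs roles/logging.logWriter for this to work.
  logging: CLOUD_LOGGING_ONLY
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/repo/app:$BUILD_ID', '.']
images:
  - 'us-central1-docker.pkg.dev/$PROJECT_ID/repo/app:$BUILD_ID'
```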

Pools: default pool vs private pool

A pool is the worker infrastructure your steps run on.

| Aspect | Default pool | Private pool |
| --- | --- | --- |
| What it is | Google-managed shared workers on the public internet | Dedicated, isolated workers in a Google-managed VPC you peer to |
| Network reach | Public internet only (no VPC access) | Reaches your VPC via VPC peering → private resources (private GKE, internal DBs, private Artifact Registry) |
| Egress IP | Dynamic/shared | Can be made static (NAT) for allowlisting; or no public egress at all |
| Machine types | Standard set | Standard plus larger/custom machine types and bigger disks |
| Concurrency / quotas | Shared limits | Higher, configurable concurrency |
| Setup | None | Create a worker-pool (region, machine type, network peering) |
| Cost | Build-minute pricing incl. free tier | Build-minute pricing at private-pool rates (no free tier); pay for the isolation |
| When to use | Public builds, simplest case | Builds that must reach private resources, need static egress, VPC-SC perimeters, or bigger machines |

Create a private pool peered to a VPC and use it:

gcloud builds worker-pools create my-pool \
  --region=us-central1 \
  --peered-network=projects/PROJECT/global/networks/my-vpc \
  --worker-machine-type=e2-standard-4 --worker-disk-size=100 \
  --no-public-egress           # workers have no public IP (private egress only)

# reference it in options:
#   options:
#     pool:
#       name: projects/PROJECT/locations/us-central1/workerPools/my-pool

Gotcha: a private pool that needs to pull public base images while having --no-public-egress requires Cloud NAT or a private mirror in your VPC; otherwise docker pull node:20 fails.

Secrets and caching

Secrets — the modern Secret Manager path (availableSecrets + secretEnv):

availableSecrets:
  secretManager:
    - versionName: projects/$PROJECT_ID/secrets/NPM_TOKEN/versions/latest
      env: 'NPM_TOKEN'
steps:
  - name: 'node:20'
    entrypoint: bash
    args: ['-c', 'echo "//registry.npmjs.org/:_authToken=$$NPM_TOKEN" > ~/.npmrc && npm ci']
    secretEnv: ['NPM_TOKEN']

Note $$NPM_TOKEN (double dollar), so that the shell, not the substitution engine, expands it. The build SA also needs roles/secretmanager.secretAccessor on the secret. The legacy KMS path (a secrets: block with a kmsKeyName and KMS-encrypted ciphertext) still works, but Secret Manager is preferred.

| Secret method | How | Status |
| --- | --- | --- |
| Secret Manager (availableSecrets) | Reference a secret version, bind to secretEnv/file | Recommended |
| KMS-encrypted (secrets:) | Encrypt with Cloud KMS, store ciphertext, decrypt at build | Legacy |
| Plain env / baked into image | Hard-coded | Never; leaks into logs/layers |
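
The same Secret Manager pattern covers registry logins. This is a sketch assuming two hypothetical secrets, DOCKERHUB_USER and DOCKERHUB_TOKEN, already exist in the project; the token is piped over stdin so it never appears in the command line.

```yaml
availableSecrets:
  secretManager:
    - versionName: projects/$PROJECT_ID/secrets/DOCKERHUB_USER/versions/latest
      env: 'DOCKERHUB_USER'
    - versionName: projects/$PROJECT_ID/secrets/DOCKERHUB_TOKEN/versions/latest
      env: 'DOCKERHUB_TOKEN'
steps:
  - name: 'gcr.io/cloud-builders/docker'
    entrypoint: 'bash'
    args:
      - '-c'
      # Doubled dollar signs: bash, not Cloud Build, expands the secret values.
      - 'echo "$$DOCKERHUB_TOKEN" | docker login --username "$$DOCKERHUB_USER" --password-stdin'
    secretEnv: ['DOCKERHUB_USER', 'DOCKERHUB_TOKEN']
```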

Caching strategies (Cloud Build has no persistent cache between builds by default, so you arrange your own):

| Strategy | How it works | Best for | Gotcha |
| --- | --- | --- | --- |
| --cache-from | Pull the previous image and let Docker reuse layers | Docker builds with stable lower layers | Must docker pull the cache image first; depends on layer ordering |
| Kaniko cache | Build with gcr.io/kaniko-project/executor, caching layers in Artifact Registry | Daemonless builds, fine-grained layer cache | Different flags than docker build; cache repo must exist |
| Cloud Storage cache | Tar your deps cache to GCS at end of build, restore at start | Language deps (node_modules, ~/.m2, Go mod) | You script save/restore; watch staleness |
| Buildpacks/pack | Buildpack layer caching | Source-only builds (gcloud run deploy --source) | Less control than a Dockerfile |
| Kaniko + --cache-ttl | TTL on cached layers | Long-lived caches | Stale-cache bugs if TTL too long |

Kaniko example:

steps:
  - name: 'gcr.io/kaniko-project/executor:latest'
    args:
      - '--destination=${_IMG}:$SHORT_SHA'
      - '--cache=true'
      - '--cache-ttl=168h'
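
For comparison, the --cache-from strategy can be sketched as below. It assumes _IMG is supplied as a user-defined substitution (for example via --substitutions) and that a :latest tag from a previous build may or may not exist yet.

```yaml
steps:
  # Pull the previous image to seed the layer cache; tolerate a miss on the first build.
  - name: 'gcr.io/cloud-builders/docker'
    entrypoint: 'bash'
    args: ['-c', 'docker pull ${_IMG}:latest || exit 0']
  # Reuse any matching layers from the pulled image.
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', '${_IMG}:latest', '--cache-from', '${_IMG}:latest', '.']
images:
  - '${_IMG}:latest'
```

Layers are reused only up to the first instruction whose inputs changed, so placing dependency installation before copying the application source in the Dockerfile tends to improve the hit rate.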

Part 2 — Cloud Deploy (CD)

The delivery pipeline and targets

Cloud Deploy is configured declaratively in a clouddeploy.yaml containing a DeliveryPipeline and one or more Target resources. The pipeline lists targets in order; that order is the promotion path.

apiVersion: deploy.cloud.google.com/v1
kind: DeliveryPipeline
metadata:
  name: app-pipeline
serialPipeline:
  stages:
    - targetId: dev
    - targetId: staging
    - targetId: prod
      strategy:
        standard:
          verify: true
---
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
  name: dev
gke:
  cluster: projects/PROJECT/locations/us-central1/clusters/dev-cluster
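---
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
  name: staging   # matches the staging stage above; the cluster name is a placeholder
gke:
  cluster: projects/PROJECT/locations/us-central1/clusters/staging-cluster
---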
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
  name: prod
requireApproval: true
gke:
  cluster: projects/PROJECT/locations/us-central1/clusters/prod-cluster

Apply this with gcloud deploy apply --file=clouddeploy.yaml --region=us-central1.

DeliveryPipeline fields:

| Field | What it is | Note |
| --- | --- | --- |
| serialPipeline.stages | Ordered list of targetIds (the promotion path) | The spine of CD; promotion always moves to the next stage |
| stages[].strategy | Per-stage rollout strategy (standard or canary) | Default is standard (all-at-once) |
| stages[].profiles | Skaffold profiles to activate for that stage | How per-env differences are rendered |
| stages[].deployParameters | Key/values passed to rendering for that stage | Per-target manifest values |

Target types and fields:

| Target kind | Deploys to | Key fields |
| --- | --- | --- |
| gke | A GKE Standard/Autopilot cluster | cluster (full path); optional internalIp, proxyUrl |
| run | Cloud Run | location (projects/.../locations/REGION) |
| anthosCluster | Anthos/registered cluster | membership (Connect gateway) |
| multiTarget | Fan-out to several child targets at once | targetIds: [a, b] (parallel deploy) |
| customTarget | A custom target type (your own deployer) | customTargetType reference |

Common Target fields (any kind):

| Field | What it is | Default | When |
| --- | --- | --- | --- |
| requireApproval | Rollouts to this target wait for manual approval | false | Gate prod (and often staging) |
| executionConfigs | Per-target render/deploy execution settings (SA, worker pool, timeouts, artifact storage) | Cloud Deploy defaults | Pin the execution service account, use a private pool, set timeouts |
| deployParameters | Target-scoped rendering parameters | none | Per-environment values (replicas, hostnames) |
| labels / annotations | Metadata | none | Org tagging |
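
As an illustration of these common fields on a non-GKE target, here is a hedged sketch of a Cloud Run target. The target name, the deployer service account, the timeout, and the replicaCount parameter are all hypothetical values chosen for the example.

```yaml
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
  name: run-prod
  labels:
    team: backend
requireApproval: true                # rollouts pause until someone approves
run:
  location: projects/PROJECT/locations/us-central1
executionConfigs:
  - usages: [RENDER, DEPLOY]         # apply these settings to both operations
    serviceAccount: deployer@PROJECT.iam.gserviceaccount.com
    executionTimeout: '1800s'
deployParameters:
  replicaCount: '3'                  # made available to rendering for this target
```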

Skaffold: render vs deploy

Cloud Deploy drives Skaffold. You provide a skaffold.yaml describing how to render and deploy your manifests; Cloud Deploy supplies the image(s) and the per-target context.

apiVersion: skaffold/v4beta11
kind: Config
manifests:
  rawYaml:
    - k8s/deployment.yaml
    - k8s/service.yaml
deploy:
  kubectl: {}
profiles:
  - name: prod
    manifests:
      rawYaml: [k8s/deployment.yaml, k8s/prod-overlay.yaml]

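Two phases:

  1. Render (when the release is created): Cloud Deploy runs skaffold render to produce the concrete manifests for each target and stores them with the release.
  2. Deploy (at each rollout): Cloud Deploy runs skaffold apply to push those stored manifests to the target, without rendering again.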

Renderers/deployers Skaffold supports inside Cloud Deploy: raw YAML + kubectl, Helm, Kustomize, and for Cloud Run a Cloud Run manifest (service.yaml). This is why the same pipeline can target both GKE (kubectl/Helm/Kustomize) and Cloud Run.
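
For a Cloud Run target the Skaffold config is smaller than the GKE one above. A minimal sketch, assuming a Cloud Run service manifest named service.yaml sits next to skaffold.yaml:

```yaml
apiVersion: skaffold/v4beta11
kind: Config
manifests:
  rawYaml:
    - service.yaml      # a Cloud Run (Knative-style) Service manifest
deploy:
  cloudrun: {}          # use the Cloud Run deployer instead of kubectl
```

The image field inside service.yaml would hold the placeholder name that --images later maps to a concrete image.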

Releases, rollouts, and promotion

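The lifecycle in three nouns:

  1. Release: an immutable snapshot of what to deploy (rendered manifests plus image references), created once per build.
  2. Rollout: one deployment of a release to one target.
  3. Promotion: advancing a release to the next stage of the pipeline, which creates a new rollout there.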

# Create a release (renders to every target; deploys to the FIRST stage)
gcloud deploy releases create rel-$SHORT_SHA \
  --delivery-pipeline=app-pipeline --region=us-central1 \
  --images=app=us-central1-docker.pkg.dev/$PROJECT/repo/app:$SHORT_SHA

# Promote it from dev -> staging -> prod (one hop per command)
gcloud deploy releases promote --release=rel-$SHORT_SHA \
  --delivery-pipeline=app-pipeline --region=us-central1

# Approve a rollout that is waiting on a requireApproval target
gcloud deploy rollouts approve ROLLOUT_NAME \
  --release=rel-$SHORT_SHA --delivery-pipeline=app-pipeline \
  --region=us-central1

The --images=NAME=IMAGE flag maps the placeholder image name in your Skaffold/manifests to the concrete, immutable image (pin to a digest in production). You can pass --to-target to create a release that targets a specific stage, and --disable-initial-rollout to render without deploying yet.
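
In a chained setup the release is usually created by the last step of the Cloud Build config rather than by hand. A sketch of that step, assuming the image was pushed by an earlier step in the same build and that the build's service account holds roles/clouddeploy.releaser:

```yaml
steps:
  # ... earlier steps build and push the image ...
  - name: 'gcr.io/cloud-builders/gcloud'
    id: 'create-release'
    args:
      - 'deploy'
      - 'releases'
      - 'create'
      - 'rel-$SHORT_SHA'
      - '--delivery-pipeline=app-pipeline'
      - '--region=us-central1'
      - '--images=app=us-central1-docker.pkg.dev/$PROJECT_ID/repo/app:$SHORT_SHA'
```

Here $SHORT_SHA and $PROJECT_ID are Cloud Build substitutions, so this form only works in repo-triggered builds where $SHORT_SHA is populated.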

Approvals

Set requireApproval: true on a target and every rollout to it pauses in a Pending Approval state until someone with roles/clouddeploy.approver runs gcloud deploy rollouts approve (or clicks Approve in the console). You can reject instead. This is the human gate before production. Approvals integrate with notifications: Cloud Deploy publishes events to Pub/Sub (rollout/approval/release notifications), so you can route an approval request to Slack/email and even drive automated approvals via automation rules (below).

Deployment strategies: standard, canary, and the hooks

The strategy on a stage controls how the rollout reaches 100% on that target.

Strategy | Behaviour | When
standard | Deploy to 100% in one phase (optionally with verify/predeploy/postdeploy) | dev/staging, or low-risk prod
canary | Roll out in phases by percentage (e.g. 25% → 50% → 100%), pausing between phases for verification/approval | Risk-managed prod releases

A canary with custom percentages and hooks:

serialPipeline:
  stages:
    - targetId: prod
      strategy:
        canary:
          runtimeConfig:
            kubernetes:
              serviceNetworking:
                service: app-svc
                deployment: app
          canaryDeployment:
            percentages: [25, 50]      # then implicit 100
            verify: true               # run skaffold `verify` after each phase
            predeploy:
              actions: ['warmup']      # custom pre-deploy action
            postdeploy:
              actions: ['notify']      # custom post-deploy action

Canary field | What it does
canaryDeployment.percentages | The traffic percentages per phase (final 100 is implicit)
customCanaryDeployment.phaseConfigs | Fully custom phases (different percentages, profiles, verify per phase)
verify: true | Run the tests defined in the skaffold.yaml verify stanza (smoke tests) after a phase before proceeding
predeploy.actions / postdeploy.actions | Named Skaffold custom actions run before/after the deploy of a phase
runtimeConfig.kubernetes (gatewayServiceMesh / serviceNetworking) | How canary traffic is split on GKE (Gateway API mesh vs Service-based)
runtimeConfig.cloudRun (automaticTrafficControl, canaryRevisionTags) | How canary traffic is split on Cloud Run (revision traffic %)

For Cloud Run targets, canary uses revision traffic splitting; for GKE, it uses either a Service-based split or the Gateway API service mesh, depending on runtimeConfig.
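As a small illustration of how a percentages list expands into phases, the sketch below prints one phase ID per entry plus the implicit final phase. The canary-N and stable names are an assumption about how Cloud Deploy labels phases, so confirm them with gcloud deploy rollouts describe before relying on them; canary_phases is a hypothetical helper.

```shell
# canary_phases PCT...
# Print one phase ID per line for the given canaryDeployment.percentages,
# ending with the implicit 100% phase (naming is an assumption, see above).
canary_phases() {
  local pct
  for pct in "$@"; do
    printf 'canary-%s\n' "$pct"
  done
  printf 'stable\n'
}

canary_phases 25 50

# Assumed usage: move a paused rollout on to its next phase (not run here):
#   gcloud deploy rollouts advance ROLLOUT_NAME \
#     --release=rel-$SHORT_SHA --delivery-pipeline=app-pipeline \
#     --region=us-central1
```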

Rollback, multi-target, and automation

Rollback — one command redeploys a previous, already-rendered release to a target (no rebuild, because the old release’s rendered manifests are stored):

gcloud deploy targets rollback prod \
  --delivery-pipeline=app-pipeline --region=us-central1
# (optionally --release=PREVIOUS_RELEASE --rollout-id=...)

Multi-target deploys to several child targets in parallel from one pipeline stage (e.g. deploy to three regional clusters at once) by pointing a stage at a multiTarget whose targetIds list the children. Useful for fan-out to many clusters/regions.
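A fan-out stage might be declared roughly as follows. This is a sketch only: prod-multi, prod-us, prod-eu, and prod-asia are hypothetical target names, and the three child targets are assumed to be defined elsewhere.

```shell
cat > multi-target.yaml <<'EOF'
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
  name: prod-multi                            # hypothetical parent target
multiTarget:
  targetIds: [prod-us, prod-eu, prod-asia]    # hypothetical child targets
---
apiVersion: deploy.cloud.google.com/v1
kind: DeliveryPipeline
metadata:
  name: app-pipeline
serialPipeline:
  stages:
    - targetId: dev
    - targetId: prod-multi                    # one stage fans out to all children
EOF

# Register it once the child targets exist (not run here):
#   gcloud deploy apply --file=multi-target.yaml --region=us-central1
```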

Automation rules (Automation resource) let Cloud Deploy act without a human: auto-promote a release to the next stage after a wait or on success, auto-advance canary phases, auto-repair a failed/stalled rollout (retry/rollback), and timed promotions. This is how you build a hands-off pipeline while keeping requireApproval on the final gate.

apiVersion: deploy.cloud.google.com/v1
kind: Automation
metadata:
  name: app-pipeline/auto-promote
serviceAccount: AUTOMATION_SA_EMAIL   # placeholder: the service account the automation acts as
selector:
  targets: [{ id: dev }]
rules:
  - promoteReleaseRule:
      id: promote-to-staging
      wait: 10m          # bake in dev for 10 min, then auto-promote

The build → Artifact Registry → deploy chain

The end-to-end native pipeline ties Part 1 and Part 2 together:

  1. git push to the connected repo fires a Cloud Build trigger.
  2. Cloud Build builds, tests, and pushes the image to Artifact Registry (images: or docker push), pinned by $SHORT_SHA/digest.
  3. A final Cloud Build step creates a Cloud Deploy release (gcloud deploy releases create … --images=app=…@sha256:…), with the build SA holding roles/clouddeploy.releaser.
  4. Cloud Deploy renders per target and deploys to dev, then waits for promotion/approval up the chain to staging and prod, optionally as a canary.
  5. Each deploy can be gated by Binary Authorization so only attested images run.

The “release from a build” final step:

  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - deploy
      - releases
      - create
      - rel-$SHORT_SHA
      - '--delivery-pipeline=app-pipeline'
      - '--region=us-central1'
      - '--images=app=us-central1-docker.pkg.dev/$PROJECT_ID/repo/app:$SHORT_SHA'
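For that step to succeed, the build identity needs the two grants mentioned in the chain above. The sketch below previews them in dry-run mode; the project ID and both service-account emails are hypothetical, and the gc wrapper is only a convenience for printing commands before running them.

```shell
# Print the gcloud command instead of running it when DRY_RUN=1.
gc() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "gcloud $*"; else gcloud "$@"; fi
}

# grant_release_permissions PROJECT BUILD_SA EXEC_SA
grant_release_permissions() {
  local project="$1" build_sa="$2" exec_sa="$3"
  # Let the build identity create Cloud Deploy releases.
  gc projects add-iam-policy-binding "$project" \
    --member="serviceAccount:$build_sa" --role=roles/clouddeploy.releaser
  # Let it act as the service account Cloud Deploy uses to render and deploy.
  gc iam service-accounts add-iam-policy-binding "$exec_sa" \
    --member="serviceAccount:$build_sa" --role=roles/iam.serviceAccountUser
}

# Preview only; drop DRY_RUN=1 to apply.
DRY_RUN=1 grant_release_permissions my-project \
  build-sa@my-project.iam.gserviceaccount.com \
  deploy-exec@my-project.iam.gserviceaccount.com
```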

Binary Authorization

Binary Authorization is a deploy-time admission control: it lets you require that any image deployed to GKE or Cloud Run carries cryptographic attestations (signatures) proving it came from your trusted pipeline (e.g. was built by Cloud Build and passed your vulnerability gate). You define a policy (default rule + per-cluster/per-target rules) listing the attestors whose signatures are required; an image with no valid attestation is blocked (or logged, in dry-run). Cloud Build can produce build provenance and attestations automatically (SLSA build level), and Cloud Deploy honours the target’s Binary Authorization policy at rollout. The result: a supply-chain guarantee that only images built and signed by your pipeline reach production — a frequent PCDE/PCSE exam topic. Pair this with immutable tags and digest pinning in Artifact Registry for end-to-end integrity.
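A project-level policy that requires one attestor could look roughly like this. Treat it as a sketch: the attestor name built-by-cloud-build and the YOUR_PROJECT_ID placeholder are hypothetical, and the attestor itself is assumed to exist already.

```shell
cat > binauthz-policy.yaml <<'EOF'
globalPolicyEvaluationMode: ENABLE                # exempt Google-maintained system images
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG   # DRYRUN_AUDIT_LOG_ONLY logs without blocking
  requireAttestationsBy:
    - projects/YOUR_PROJECT_ID/attestors/built-by-cloud-build   # hypothetical attestor
EOF

# Apply it to the current project (not run here):
#   gcloud container binauthz policy import binauthz-policy.yaml
```

Starting in dry-run mode and reading the audit log first is a cautious way to find unattested images before enforcement blocks them.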

[Figure: Google Cloud Build and Cloud Deploy pipeline: triggers to build to Artifact Registry to delivery pipeline targets]

The diagram traces the full path — a Git event hitting a Cloud Build trigger, the build running steps on a pool and pushing to Artifact Registry, and Cloud Deploy promoting the resulting release through dev → staging → prod targets with approvals, canary, and Binary Authorization gating each rollout.

Hands-on lab

We will build a container with Cloud Build, push it to Artifact Registry, then model a tiny Cloud Deploy pipeline (single Cloud Run target) and run a release. The Cloud Build free tier (2,500 build-minutes/month on the default machine) plus the $300 free-trial credit covers this comfortably; Cloud Deploy has no per-pipeline charge (you pay for the underlying GKE/Cloud Run and any build minutes).

1. Set project/region and enable the APIs.

gcloud config set project YOUR_PROJECT_ID
REGION=us-central1
gcloud services enable cloudbuild.googleapis.com artifactregistry.googleapis.com \
  clouddeploy.googleapis.com run.googleapis.com

2. Create an Artifact Registry Docker repo (the build’s push target):

gcloud artifacts repositories create demo-repo \
  --repository-format=docker --location=$REGION

3. Write a minimal app + cloudbuild.yaml. Create a Dockerfile:

cat > Dockerfile <<'EOF'
FROM nginx:1.27-alpine
RUN echo "hello from cloud build + cloud deploy" > /usr/share/nginx/html/index.html
EOF

And a cloudbuild.yaml:

cat > cloudbuild.yaml <<'EOF'
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', '${_IMG}:latest', '.']
images: ['${_IMG}:latest']
substitutions:
  _IMG: 'us-central1-docker.pkg.dev/${PROJECT_ID}/demo-repo/web'
options:
  logging: CLOUD_LOGGING_ONLY
  dynamicSubstitutions: true   # needed on manual submits so ${PROJECT_ID} expands inside _IMG
EOF

4. Run the build (manual submit):

gcloud builds submit --region=$REGION --config=cloudbuild.yaml .

Expected output: step logs ending with PUSH of the image and a SUCCESS status. Confirm the image landed:

gcloud artifacts docker images list us-central1-docker.pkg.dev/$(gcloud config get-value project)/demo-repo/web

5. Model a Cloud Deploy pipeline with one Cloud Run target. Skaffold config:

cat > skaffold.yaml <<'EOF'
apiVersion: skaffold/v4beta11
kind: Config
manifests:
  rawYaml: [service.yaml]
deploy:
  cloudrun: {}
EOF
cat > service.yaml <<'EOF'
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: deploy-demo
spec:
  template:
    spec:
      containers:
        - image: web        # placeholder, replaced by --images
EOF
cat > clouddeploy.yaml <<EOF
apiVersion: deploy.cloud.google.com/v1
kind: DeliveryPipeline
metadata: {name: demo-pipeline}
serialPipeline:
  stages: [{targetId: prod}]
---
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata: {name: prod}
run:
  location: projects/$(gcloud config get-value project)/locations/$REGION
EOF
gcloud deploy apply --file=clouddeploy.yaml --region=$REGION

6. Create a release (renders + deploys to the prod Cloud Run target):

IMG=us-central1-docker.pkg.dev/$(gcloud config get-value project)/demo-repo/web:latest
gcloud deploy releases create rel-001 \
  --delivery-pipeline=demo-pipeline --region=$REGION \
  --images=web=$IMG

7. Validate. Watch the rollout succeed, then hit the Cloud Run URL:

gcloud deploy rollouts list --release=rel-001 \
  --delivery-pipeline=demo-pipeline --region=$REGION \
  --format="value(name, state)"
URL=$(gcloud run services describe deploy-demo --region=$REGION --format='value(status.url)')
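# The service is not public by default, so call it with your identity token
curl -s -H "Authorization: Bearer $(gcloud auth print-identity-token)" "$URL"     # expect: hello from cloud build + cloud deploy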
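If you would rather wait for the rollout than re-run the list command by hand, a small polling loop is one option. This is a sketch under assumptions: wait_for_rollout and rollout_state are hypothetical helpers, the release is assumed to have a single rollout, and only SUCCEEDED and FAILED are treated as terminal states.

```shell
# wait_for_rollout STATE_CMD...
# Poll a command that prints the rollout state until it is terminal.
# Returns 0 on SUCCEEDED, 1 on FAILED, 2 if it gives up.
wait_for_rollout() {
  local tries delay state i
  tries="${MAX_TRIES:-30}"
  delay="${DELAY_SECONDS:-10}"
  i=1
  while [ "$i" -le "$tries" ]; do
    state="$("$@")"
    echo "attempt $i: ${state:-unknown}"
    case "$state" in
      SUCCEEDED) return 0 ;;
      FAILED)    return 1 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 2
}

# Prints the state of the lab's rollout (assumes rel-001 has exactly one).
rollout_state() {
  gcloud deploy rollouts list --release=rel-001 \
    --delivery-pipeline=demo-pipeline --region="$REGION" \
    --format="value(state)"
}

# Usage in the lab (not run here):
#   wait_for_rollout rollout_state && echo "rollout done"
```

Passing the state command as an argument keeps the loop independent of gcloud, so the same helper works for any release or pipeline name.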

8. Cleanup (delete everything to stop charges):

gcloud run services delete deploy-demo --region=$REGION --quiet
gcloud deploy delivery-pipelines delete demo-pipeline --region=$REGION --force --quiet
gcloud deploy targets delete prod --region=$REGION --quiet
gcloud artifacts repositories delete demo-repo --location=$REGION --quiet

Cost note. Cloud Build’s free tier covers 2,500 build-minutes/month on the default machine type — this lab uses a handful. Larger machineTypes and private pools are billed per build-minute and are not free. Cloud Deploy itself has no resource charge; you pay only for the targets (this Cloud Run service scales to zero and is effectively free idle) and any build minutes the release rendering uses. Artifact Registry charges for stored image GB (negligible here). Deleting the resources above returns you to zero.

Common mistakes & troubleshooting

Symptom | Likely cause | Fix
Build fails instantly with a logging/permission error | User-managed build SA lacks roles/logging.logWriter | Grant logging.logWriter and set options.logging: CLOUD_LOGGING_ONLY, or use GCS_ONLY with your own logsBucket
denied: Permission "artifactregistry.repositories.uploadArtifacts" on push | Build SA missing roles/artifactregistry.writer on the repo | Grant artifactregistry.writer to the build SA on that repo/project
A file written in one step is gone in the next | It was written outside /workspace | Write to /workspace, or declare a named volumes: entry on both steps
$MY_VAR came out empty / build failed on “unused substitution” | Strict substitution matching (MUST_MATCH) | Declare the sub, fix the name, or set substitutionOption: ALLOW_LOOSE
Secret value appears blank in the step | Used $MY_SECRET (single $) so the sub engine ate it | Use $$MY_SECRET (double $) and list it in secretEnv
Private-pool build can’t docker pull a public base image | --no-public-egress with no NAT/mirror | Add Cloud NAT or a private mirror in the peered VPC
Cloud Deploy release “create” denied from a build | Build SA lacks roles/clouddeploy.releaser (and SA-user on the deploy execution SA) | Grant clouddeploy.releaser; ensure execution SA permissions
Rollout stuck “Pending Approval” forever | Target has requireApproval: true | gcloud deploy rollouts approve … (needs roles/clouddeploy.approver)
Rollout fails: image blocked | Binary Authorization policy requires an attestation the image lacks | Attest the image in the pipeline, or fix the policy/attestor
Canary never advances past phase 1 | verify: true step failing, or no traffic-split runtimeConfig | Fix the failing verify tests; configure serviceNetworking/Gateway for GKE or cloudRun traffic
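The blank-secret row is the easiest one to get wrong, so here is a minimal sketch of the working form. The secret name api-token and the variable API_TOKEN are hypothetical; the build SA is assumed to hold roles/secretmanager.secretAccessor on that secret.

```shell
cat > cloudbuild-secret.yaml <<'EOF'
steps:
  - name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: bash
    # $$API_TOKEN survives substitution parsing and reaches bash as $API_TOKEN
    args: ['-c', 'test -n "$$API_TOKEN" && echo "secret is present"']
    secretEnv: ['API_TOKEN']
availableSecrets:
  secretManager:
    - versionName: projects/$PROJECT_ID/secrets/api-token/versions/latest   # hypothetical secret
      env: 'API_TOKEN'
EOF

# Submit it like any other config (not run here):
#   gcloud builds submit --region=us-central1 --config=cloudbuild-secret.yaml --no-source
```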

Best practices

Security notes

Interview & exam questions

  1. What is the difference between Cloud Build and Cloud Deploy? Cloud Build is CI — it runs build/test/package steps in containers and pushes artefacts to a registry. Cloud Deploy is CD — it takes a built artefact and progresses it through ordered environments with promotion, approvals, canary, and rollback. Build produces the artefact; Deploy ships it. Cloud Deploy never builds.
  2. What is /workspace and why does it matter? It is the single directory mounted into every build step at the same path; it is where source is checked out and the only thing that persists between steps. Anything written outside /workspace (in a step’s own container) is lost when that step ends — the cause of most “my file vanished” bugs.
  3. Built-in vs user-defined substitutions? Built-in subs ($PROJECT_ID, $BUILD_ID, $COMMIT_SHA, $SHORT_SHA, $BRANCH_NAME, $TAG_NAME, …) are filled by Cloud Build. User-defined subs must start with _ (e.g. $_REGION) and are supplied on the trigger or CLI. One static config serves many environments via subs.
  4. How do you run build steps in parallel? Give steps an id and set waitFor: ['-'] to start them immediately (in parallel); use waitFor: ['stepA','stepB'] to make a step wait for specific others. Default (no waitFor) is sequential file order.
  5. Default pool vs private pool — when each? The default pool runs on Google-managed public workers (simplest, free-tier eligible) but cannot reach your VPC. A private pool runs isolated workers peered to your VPC — use it when builds must reach private resources (private GKE, internal DBs), need static egress for allowlisting, sit in a VPC-SC perimeter, or need bigger machines. Private-pool minutes aren’t free.
  6. How do you give a build a secret safely? Declare it under availableSecrets.secretManager (a Secret Manager version), expose it to a step via secretEnv, reference it as $$SECRET (double dollar) in shell, and grant the build SA roles/secretmanager.secretAccessor. Never bake secrets into env/layers.
  7. What is the legacy Cloud Build service account issue? Builds historically ran as PROJECT_NUMBER@cloudbuild.gserviceaccount.com with broad default roles; Google is phasing it out. New builds should set a user-managed SA with least privilege — and you must then explicitly grant logging.logWriter or builds fail on log setup.
  8. Explain release, rollout, and promotion in Cloud Deploy. A release is the immutable thing to ship (image + rendered manifests), created once. A rollout is deploying that release to one target. Promotion moves the release to the next target in the pipeline’s ordered stages, creating the next rollout. Render once, promote the same artefact up the chain.
  9. What role does Skaffold play in Cloud Deploy? Cloud Deploy drives Skaffold: at release time it runs skaffold render (per target, substituting the image/profiles/parameters) and stores the rendered manifests; at rollout time it runs skaffold apply to deploy those exact manifests. Supports raw YAML+kubectl, Helm, Kustomize, and Cloud Run manifests.
  10. How does a canary rollout work, and how does traffic split per platform? A canary strategy rolls out in phases by percentage (e.g. 25→50→100), pausing for verify/approval between phases. On GKE traffic is split via a Service or the Gateway API mesh (runtimeConfig.kubernetes); on Cloud Run via revision traffic percentages (runtimeConfig.cloudRun).
  11. How do you roll back a bad deploy? gcloud deploy targets rollback TARGET … redeploys a previous, already-rendered release to that target — no rebuild, because the prior release’s manifests are stored. Instant and deterministic.
  12. How does Binary Authorization fit the CI/CD chain? It is deploy-time admission control: a policy requires images to carry valid attestations from trusted attestors (e.g. proof they were built by Cloud Build and passed scanning). Unattested images are blocked at GKE/Cloud Run rollout, guaranteeing only pipeline-built, signed images reach production.
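The parallel-steps answer (question 4) is easier to remember with a config in front of you. The sketch below is illustrative only: the lint and test steps are echo placeholders standing in for real tools, and the image path reuses the lab's demo-repo.

```shell
cat > cloudbuild-parallel.yaml <<'EOF'
steps:
  - id: lint
    name: 'ubuntu'
    args: ['echo', 'lint placeholder']     # stand-in for a real linter
    waitFor: ['-']                         # start immediately
  - id: test
    name: 'ubuntu'
    args: ['echo', 'test placeholder']     # stand-in for a real test run
    waitFor: ['-']                         # start immediately, alongside lint
  - id: build
    name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/demo-repo/web:$BUILD_ID', '.']
    waitFor: ['lint', 'test']              # runs only after both finish
images: ['us-central1-docker.pkg.dev/$PROJECT_ID/demo-repo/web:$BUILD_ID']
EOF
```

Removing the waitFor line from the build step would give the same ordering here, because a step with no waitFor waits for every step listed before it.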

Quick check

  1. Which directory is shared across all Cloud Build steps and persists between them?
  2. What must every user-defined substitution name start with?
  3. Which waitFor value makes a step start immediately so it runs in parallel?
  4. In Cloud Deploy, what action moves a release from its current target to the next target?
  5. Which Cloud Deploy strategy rolls a release out in percentage phases with pauses for verification?

Answers

  1. /workspace — mounted into every step at the same path; anything written there (and only there) survives into later steps.
  2. An underscore _ (e.g. _REGION, _IMG); built-in subs like $PROJECT_ID need no declaration.
  3. waitFor: ['-'] — “wait for nothing”, so the step starts immediately, in parallel with other ['-'] steps.
  4. Promotion (gcloud deploy releases promote) — it creates a rollout to the next stage in serialPipeline.stages.
  5. The canary strategy (strategy.canary with canaryDeployment.percentages), pausing between phases for verify/approval.

Exercise

Build the full native chain end to end. Using gcloud: (a) create an Artifact Registry Docker repo and a dedicated user-managed service account for builds, granting it only roles/artifactregistry.writer, roles/logging.logWriter, and roles/clouddeploy.releaser; (b) write a cloudbuild.yaml that runs a parallel lint and test step (waitFor: ['-']), then a Docker build/push step (waitFor both), pulls one value from Secret Manager via availableSecrets, and as a final step creates a Cloud Deploy release; (c) create a delivery pipeline with three targets dev → staging → prod, where prod has requireApproval: true and a canary [25, 50] strategy with verify: true; (d) wire a push trigger on ^main$ to that build config using a 2nd-gen repo connection and a user-managed SA; (e) push a commit, watch the build run and the release deploy to dev, promote to staging, then approve the prod rollout; (f) roll back prod to the prior release; then (g) delete the pipeline, repo, trigger, and service account. In a sentence each, explain why you used a dedicated build SA rather than the legacy one, and why prod uses canary + approval while dev does not.

Certification mapping

Glossary

Next steps

You can now drive both halves of GCP-native CI/CD — Cloud Build’s config, triggers, substitutions, pools, identity, secrets, and caching, and Cloud Deploy’s pipelines, targets, releases, promotion, approvals, canary, and rollback, all chained through Artifact Registry and gated by Binary Authorization. Make sure the registry side is solid by reading the Artifact Registry deep dive — repositories, formats, scanning, and cleanup policies are the supply-chain foundation this pipeline pushes to. Then, for the keyless way to let external CI (GitHub Actions, GitLab CI) authenticate to GCP without service-account keys — the alternative to building inside Cloud Build — read Workload Identity Federation for keyless CI/CD. After that, continue into the money side of running all this with the Google Cloud Billing & Cost Management deep dive.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
