
Migrating to Trunk-Based Development: Branching Policy, Feature Flags, and Merge Hygiene

GitFlow taught a generation of teams to fear main. Release branches, develop branches, hotfix branches, and weeks-old feature branches all promise safety and instead deliver merge hell, integration that happens too late to fix cheaply, and a main nobody trusts to ship. Trunk-based development (TBD) inverts the model: everyone integrates small changes into one shared branch many times a day, incomplete work hides behind flags, and main stays releasable at every commit. This guide is the migration path, not the manifesto. We will define the branching policy, hide unfinished work, decouple deploy from release, keep the build green, and prove with metrics that lead time actually dropped.

1. Why long-lived branches break continuous delivery

The core failure of GitFlow is deferred integration. A feature branch that lives two weeks accumulates conflicts against every other branch merged in that window. The cost of a merge conflict grows superlinearly with branch age and the number of concurrent branches, because each branch must reconcile against the union of all changes since it forked. Teams respond by merging less often, which makes each merge bigger and more dangerous, a vicious cycle.

The second failure is that develop and main diverge. develop accumulates work that is “done” but not released; main reflects production. The delta between them is unreleased risk you cannot see. When you finally cut a release branch, you are integrating a batch of weeks-old changes, and the bugs you find are the most expensive kind: late, batched, and hard to bisect.

Trunk-based development fixes both by construction. Branches live hours, not weeks, so conflicts stay trivial. There is one trunk (main), so there is no divergence to reconcile. The price you pay is discipline: you must decompose work into small mergeable increments and hide anything not ready behind a flag. That trade is almost always worth it, and the rest of this article is how to pay it cleanly.

DORA's research points the same way: it associates three or fewer active branches, branches merged to trunk at least once a day, and no code freezes or separate integration phases with higher software delivery performance. TBD is not just a style preference; it is correlated with the outcomes you are being measured on.

2. Defining the branching policy

Write the policy down. Ambiguity is what killed your last “we should merge more often” initiative. A workable TBD policy has four rules.

  1. One protected trunk. main is the only long-lived branch. No develop, no permanent release/*.
  2. Short-lived branches. A branch exists for one small change and is deleted on merge. Target under a day; hard-cap at a few days.
  3. Small PRs. Cap diff size so review is fast and conflicts are rare. A few hundred changed lines is a sane ceiling; flag the outliers, do not block them outright.
  4. Serialized integration. All merges go through a merge queue that re-tests against the current tip of main before landing.
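
The first three rules can be checked mechanically. The following is a minimal sketch, not a finished tool: the `OpenBranch` shape, the 24/72-hour limits, and the 600-line ceiling are assumptions chosen to mirror the numbers in this guide, and you would feed it data from whatever API your Git host exposes. Rule 4 lives in the merge queue itself and is not modelled here.

```typescript
// Hypothetical input shape; adapt it to what your Git host's API returns.
interface OpenBranch {
  name: string;
  createdAt: Date;      // when the branch forked from main
  changedLines: number; // insertions + deletions against main
}

interface PolicyLimits {
  targetAgeHours: number;  // "under a day"
  hardCapAgeHours: number; // "a few days" (assumed to mean three)
  maxChangedLines: number; // PR size ceiling
}

const DEFAULT_LIMITS: PolicyLimits = {
  targetAgeHours: 24,
  hardCapAgeHours: 72,
  maxChangedLines: 600,
};

type Finding = { branch: string; level: "warn" | "error"; message: string };

function checkBranch(
  b: OpenBranch,
  now: Date,
  limits: PolicyLimits = DEFAULT_LIMITS,
): Finding[] {
  const findings: Finding[] = [];
  const ageHours = (now.getTime() - b.createdAt.getTime()) / 3600000;

  // Rule 1: main is the only long-lived branch.
  if (b.name === "develop" || b.name.startsWith("release/")) {
    findings.push({ branch: b.name, level: "error", message: "long-lived branch type" });
  }

  // Rule 2: warn past the target age, fail past the hard cap.
  if (ageHours > limits.hardCapAgeHours) {
    findings.push({ branch: b.name, level: "error", message: "older than the hard cap" });
  } else if (ageHours > limits.targetAgeHours) {
    findings.push({ branch: b.name, level: "warn", message: "older than the target age" });
  }

  // Rule 3: large diffs are flagged, not blocked.
  if (b.changedLines > limits.maxChangedLines) {
    findings.push({ branch: b.name, level: "warn", message: "diff exceeds the size ceiling" });
  }

  return findings;
}
```

Returning findings with a level, rather than a boolean, keeps the distinction the policy draws between hard rules and nudges.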

Encode the non-negotiables in branch protection so policy does not depend on memory. With the GitHub CLI:

gh api --method PUT \
  repos/acme/payments-api/branches/main/protection \
  --input - <<'JSON'
{
  "required_status_checks": {
    "strict": true,
    "contexts": ["ci/build", "ci/unit", "ci/lint"]
  },
  "enforce_admins": true,
  "required_pull_request_reviews": {
    "required_approving_review_count": 1,
    "dismiss_stale_reviews": true
  },
  "required_linear_history": true,
  "allow_force_pushes": false,
  "allow_deletions": false,
  "restrictions": null
}
JSON

required_linear_history plus squash-merge gives you a readable trunk where every commit is a complete, reviewed change you can revert atomically. That property is what makes “always releasable” enforceable later.

For PR size, do not rely on reviewer goodwill. Add a CI gate that warns past a threshold so large PRs are a deliberate, visible choice:

# .github/workflows/pr-size.yml
name: pr-size
on: pull_request
jobs:
  size:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Check diff size
        run: |
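          # fetch-depth: 0 above already fetched full history, including the
          # base branch, so the three-dot diff below can find the merge base.
          # (A second fetch with --depth=1 would truncate that history.)
          BASE="origin/${{ github.base_ref }}"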
          CHANGED=$(git diff --shortstat "$BASE"...HEAD \
            | grep -oE '[0-9]+ insertion|[0-9]+ deletion' \
            | grep -oE '[0-9]+' | paste -sd+ - | bc)
          CHANGED=${CHANGED:-0}
          echo "Changed lines: $CHANGED"
          if [ "$CHANGED" -gt 600 ]; then
            echo "::warning::PR changes $CHANGED lines; consider splitting (>600)."
          fi
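
If the size gate grows beyond a shell one-liner, the counting logic is easier to test as a function. This sketch assumes you feed it the output of `git diff --numstat`, which prints one tab-separated `added`, `deleted`, `path` row per file and uses `-` for binary files; the function name is illustrative.

```typescript
// Sums insertions and deletions from `git diff --numstat` output.
// Binary files are reported as "-\t-\tpath" and are skipped.
function countChangedLines(numstat: string): number {
  let total = 0;
  for (const line of numstat.split("\n")) {
    const [added, deleted] = line.split("\t");
    const a = Number.parseInt(added ?? "", 10);
    const d = Number.parseInt(deleted ?? "", 10);
    if (Number.isNaN(a) || Number.isNaN(d)) continue; // blank line or binary file
    total += a + d;
  }
  return total;
}
```

Parsing the machine-oriented format avoids depending on the wording of the human-readable summary line.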

The merge queue is the load-bearing piece. GitHub’s native merge queue takes each approved PR, builds a temporary branch that combines it with the current main and any PRs queued ahead of it, runs the required checks against that combination, and only then merges it into main. This kills the classic TBD hazard where two PRs each pass CI in isolation but break when combined.

gh api --method PATCH repos/acme/payments-api \
  -f allow_merge_commit=false \
  -f allow_squash_merge=true \
  -f allow_rebase_merge=false
# Then enable "Require merge queue" in the branch protection ruleset UI,
# and require the merge_group event check to pass.

Make CI run on the merge_group event so the queue actually re-validates the combined result:

on:
  pull_request:
  merge_group:

3. Hiding incomplete work behind feature flags

Short-lived branches only work if you can merge unfinished work safely. The mechanism is the feature flag: code reaches main and ships to production, but the new path stays dark until you turn it on. The flag check should be a single, cheap, centralized call, not scattered booleans.

// flags.ts -- thin wrapper over your provider (OpenFeature-compatible)
import { OpenFeature } from "@openfeature/server-sdk";

const client = OpenFeature.getClient();

export async function isEnabled(
  flag: string,
  ctx: { userId?: string; tenantId?: string } = {},
): Promise<boolean> {
  return client.getBooleanValue(flag, false, ctx);
}

Note the default is false. A flag that defaults open is a flag that ships unfinished work the moment your flag service has a blip. Dark-by-default is the only safe posture for release flags.
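
Dark-by-default should also hold when the flag service is slow, not only when it errors. The sketch below wraps any lookup in a time budget and resolves to `false` on timeout or failure. `FlagLookup`, `isEnabledSafe`, and the 50 ms budget are hypothetical names and values for illustration; in practice the lookup would be the SDK call from the wrapper above.

```typescript
// Hypothetical lookup signature; a real SDK call would be wrapped to fit it.
type FlagLookup = (flag: string) => Promise<boolean>;

// Resolves with the promise's value, or with `fallback` if it rejects
// or does not settle within `ms` milliseconds. Never rejects.
function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}

function isEnabledSafe(
  lookup: FlagLookup,
  flag: string,
  timeoutMs = 50,
): Promise<boolean> {
  // Promise.resolve().then(...) also captures a lookup that throws synchronously.
  const pending = Promise.resolve().then(() => lookup(flag));
  return withTimeout(pending, timeoutMs, false);
}
```

The request path then pays at most the time budget for a flag check, and an outage degrades to the old behavior instead of an error page.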

In the request path the incomplete feature is a branch on the flag:

if (await isEnabled("checkout-v2", { tenantId })) {
  return checkoutV2(cart);
}
return checkoutV1(cart);

For changes too large to wrap in a single if – swapping a payments gateway, replacing a persistence layer – use branch-by-abstraction instead of a long-lived branch. Introduce an interface, route current traffic through the old implementation behind it, then build the new implementation incrementally on trunk. Every step is a small, green, merged PR.

interface PaymentGateway {
  charge(amount: Money, token: string): Promise<ChargeResult>;
}

// Step 1: wrap the existing code, no behavior change.
class LegacyGateway implements PaymentGateway { /* current impl */ }

// Step 2..N: build the new one over several PRs, fully tested,
//            never reached in prod until the flag flips.
class StripeGateway implements PaymentGateway { /* new impl */ }

function gatewayFor(ctx: Ctx): PaymentGateway {
  return ctx.flags.stripeMigration ? new StripeGateway() : new LegacyGateway();
}

The abstraction is the seam that lets a multi-week change live on trunk as a series of one-day changes. When the new implementation is proven, you delete the flag and the legacy class. The interface can stay or go.
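
One way to gain confidence before the flag flips is to run a single contract check against both implementations. This is a sketch under assumptions: the `Money` and `ChargeResult` shapes are invented here because the article does not define them, and the two gateways are in-memory fakes standing in for the real classes.

```typescript
// Assumed shapes; the real types will differ.
type Money = { amountMinor: number; currency: string };
type ChargeResult = { ok: true; chargeId: string } | { ok: false; reason: string };

interface PaymentGateway {
  charge(amount: Money, token: string): Promise<ChargeResult>;
}

// Hypothetical in-memory stand-ins for the legacy and new implementations.
class FakeLegacyGateway implements PaymentGateway {
  async charge(amount: Money, token: string): Promise<ChargeResult> {
    if (amount.amountMinor <= 0) return { ok: false, reason: "invalid_amount" };
    if (token === "") return { ok: false, reason: "invalid_token" };
    return { ok: true, chargeId: `legacy-${token}` };
  }
}

class FakeNewGateway implements PaymentGateway {
  async charge(amount: Money, token: string): Promise<ChargeResult> {
    if (token === "" || amount.amountMinor <= 0) {
      return { ok: false, reason: "rejected" };
    }
    return { ok: true, chargeId: `new-${token}` };
  }
}

// Returns the list of contract violations; an empty list means compliant.
async function contractViolations(gw: PaymentGateway): Promise<string[]> {
  const problems: string[] = [];
  const usd = (amountMinor: number): Money => ({ amountMinor, currency: "USD" });

  const valid = await gw.charge(usd(1000), "tok_1");
  if (!valid.ok) problems.push("valid charge was rejected");

  const zero = await gw.charge(usd(0), "tok_1");
  if (zero.ok) problems.push("zero amount was accepted");

  const noToken = await gw.charge(usd(1000), "");
  if (noToken.ok) problems.push("empty token was accepted");

  return problems;
}
```

The contract checks only behavior that callers rely on, so the two implementations are free to differ in details such as error reasons and ID formats.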

4. Decoupling deploy from release

The mindset shift that makes TBD safe: deploy (push a binary to an environment) is not release (expose behavior to users). Once flags gate behavior, you deploy main continuously and release independently by flipping flags. A deploy carrying dark code is low-risk because nothing user-visible changed.

This is what lets you merge to main and deploy a dozen times a day without a dozen risky launches. The pipeline ships the artifact; the flag service governs exposure. Progressive rollout becomes a percentage on the flag, not a branching exercise:

# Release to 5% of tenants without any deploy.
curl -sS -X PATCH "$FLAGS_API/flags/checkout-v2" \
  -H "Authorization: Bearer $FLAGS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"rollout": {"strategy": "percentage", "value": 5}}'

Kill switch and rollout share one control plane, so reverting a bad release is flipping a flag in seconds, not reverting a commit and waiting for a build. That separation is the entire point: it decouples the speed of integration from the risk of exposure.
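
A percentage rollout only behaves well if each tenant lands on the same side of the line on every request. A common way to get that is to hash the flag and tenant into a stable bucket. The sketch below is one possible scheme, not how any particular flag provider does it; the function names are illustrative, and the hash is an FNV-1a variant computed over UTF-16 code units.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units: small, dependency-free,
// and stable across processes and restarts.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// True if this tenant falls inside the rollout percentage (0 to 100).
function inRollout(flag: string, tenantId: string, percentage: number): boolean {
  // Including the flag key gives each flag its own independent bucketing.
  const bucket = fnv1a(`${flag}:${tenantId}`) % 100; // 0..99
  return bucket < percentage;
}
```

Because the bucket is fixed per tenant, raising the percentage from 5 to 25 only adds tenants; nobody who already had the feature loses it mid-rollout.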

5. Keeping the build green with pre-merge checks

In TBD, a red main blocks everyone, so protecting trunk’s greenness is the highest-leverage investment. Two disciplines do most of the work: fast, mandatory pre-merge checks, and the serialized merge queue from step 2.

Keep the pre-merge suite fast (target under ten minutes) or developers will batch changes to avoid the wait, defeating the model. Shard tests to hit the budget:

jobs:
  unit:
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"
      - run: npm ci
      - run: npx jest --shard=${{ matrix.shard }}/4 --ci

The queue does the rest: because each PR is tested against the live tip of main before it lands, “passed in isolation, broke on merge” cannot reach trunk. If the queue build fails, that PR is ejected and the others proceed, so one bad change does not stall the line.
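
To make the hazard concrete, here is a toy model of a serialized queue. It is a deliberate simplification (a real queue can test several PRs in a batch): a codebase is just the set of functions it defines and calls, and "CI" passes when every call resolves. All names are illustrative.

```typescript
// Toy model: a codebase is the set of functions it defines and calls.
interface Codebase {
  defined: string[];
  called: string[];
}

interface PullRequest {
  id: number;
  apply: (base: Codebase) => Codebase;
}

// Stand-in for CI: every called function must be defined.
function ciPasses(code: Codebase): boolean {
  return code.called.every((fn) => code.defined.includes(fn));
}

// Each PR is re-applied to the current tip and tested before it lands.
function runQueue(main: Codebase, queue: PullRequest[]) {
  let tip = main;
  const merged: number[] = [];
  const ejected: number[] = [];
  for (const pr of queue) {
    const candidate = pr.apply(tip);
    if (ciPasses(candidate)) {
      tip = candidate;
      merged.push(pr.id);
    } else {
      ejected.push(pr.id); // the rest of the queue keeps moving
    }
  }
  return { tip, merged, ejected };
}

const trunk: Codebase = { defined: ["charge"], called: ["charge"] };
const rename = (fn: string) => (fn === "charge" ? "chargeCard" : fn);

const prs: PullRequest[] = [
  // PR 1 renames charge to chargeCard everywhere it currently appears.
  { id: 1, apply: (b) => ({ defined: b.defined.map(rename), called: b.called.map(rename) }) },
  // PR 2 adds a new call to charge, written before the rename landed.
  { id: 2, apply: (b) => ({ defined: b.defined, called: [...b.called, "charge"] }) },
  // PR 3 is unrelated.
  { id: 3, apply: (b) => ({ defined: [...b.defined, "refund"], called: b.called }) },
];

const queueResult = runQueue(trunk, prs);
// PRs 1 and 3 merge; PR 2 is ejected because charge no longer exists at the tip.
console.log(queueResult.merged, queueResult.ejected);
```

Every PR here passes when tested against the original trunk on its own, which is exactly why testing against the moving tip matters.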

When main does break – it will – the team rule is stop the line: no new merges until trunk is green. Fix forward with a tiny PR or revert the offending commit. Because every merge was a small atomic squash, git revert of a single SHA cleanly removes one change:

git switch -c revert-<sha> origin/main
git revert --no-edit <sha>
git push -u origin HEAD
gh pr create --fill --base main

6. Handling database and API changes under trunk-based flow

Schema and contract changes are where naive TBD bites, because main ships continuously and you cannot ship a migration that the currently-running code cannot tolerate. The discipline is expand/contract (a.k.a. parallel change): never make a breaking change in one step.

Take renaming users.fullname to users.full_name:

Expand. Add the new column; keep the old one. Backfill. Deploys in this window run code that still reads the old column, and that is fine.

ALTER TABLE users ADD COLUMN full_name text;
UPDATE users SET full_name = fullname WHERE full_name IS NULL;

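Migrate. Ship code that writes both columns first, then re-run the backfill to catch rows that old code wrote in the meantime. Only then switch reads to the new column, gated by a flag so you can roll it back instantly. Several small PRs, all green on trunk.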

Contract. Once every running instance uses full_name and the flag is fully on, drop the old column in a later deploy.

ALTER TABLE users DROP COLUMN fullname;

Each phase is independently deployable and backward-compatible with the code running beside it, which is exactly the invariant TBD requires. The same pattern governs API evolution: add v2 fields additively, dual-write, migrate consumers, then remove v1 only after telemetry shows it is unused. Never couple a breaking schema change to the deploy that depends on it; the gap between deploy and full rollout is where rollbacks live.
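
In application code, the migrate phase might look like the following sketch. It uses an in-memory map in place of the database, and `UserStore` and its method names are hypothetical. The flag is passed in as a function so the read path can change without a deploy.

```typescript
interface UserRow {
  id: number;
  fullname: string | null;  // old column
  full_name: string | null; // new column
}

class UserStore {
  private rows = new Map<number, UserRow>();
  private readNewColumn: () => boolean;

  constructor(readNewColumn: () => boolean) {
    this.readNewColumn = readNewColumn;
  }

  // Migrate phase: every write populates both columns.
  saveName(id: number, name: string): void {
    this.rows.set(id, { id, fullname: name, full_name: name });
  }

  // Simulates a write from an instance still running pre-migration code.
  saveNameLegacy(id: number, name: string): void {
    this.rows.set(id, { id, fullname: name, full_name: null });
  }

  getName(id: number): string | null {
    const row = this.rows.get(id);
    if (!row) return null;
    if (this.readNewColumn()) {
      // Fall back to the old column for rows the backfill has not reached.
      return row.full_name ?? row.fullname;
    }
    return row.fullname;
  }
}
```

The fallback read is what makes the flag safe to flip while old instances are still writing; it goes away in the contract phase along with the old column.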

For long backfills on large tables, do them in batches outside the request path so a migration never holds a lock long enough to stall trunk’s deploys:

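-- run repeatedly until 0 rows affected
-- UPDATE ... LIMIT is MySQL/MariaDB syntax; on PostgreSQL, pick each batch
-- by primary key in a subquery instead, e.g. assuming an id column:
--   WHERE id IN (SELECT id FROM users WHERE full_name IS NULL LIMIT 5000)
UPDATE users SET full_name = fullname
WHERE full_name IS NULL
LIMIT 5000;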
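
The "run repeatedly" part is usually a small driver loop in a job or script. A minimal sketch, assuming a hypothetical `runBatch` callback that executes one batched statement and returns the number of rows it changed:

```typescript
// Hypothetical callback: runs one batch and returns the rows affected.
type RunBatch = (limit: number) => Promise<number>;

async function backfill(
  runBatch: RunBatch,
  batchSize = 5000,
  pauseMs = 0,
  maxBatches = 10000,
): Promise<number> {
  let total = 0;
  for (let i = 0; i < maxBatches; i++) {
    const affected = await runBatch(batchSize);
    total += affected;
    if (affected === 0) return total; // nothing left to backfill

    // Optional pause so replication and other writers can keep up.
    if (pauseMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, pauseMs));
    }
  }
  // Guard against a predicate that never converges (for example, rows
  // being inserted with NULL faster than they are backfilled).
  throw new Error(`backfill did not finish within ${maxBatches} batches`);
}
```

The batch cap is a safety net: if new NULL rows keep arriving, that usually means dual-writes are not deployed everywhere yet, and the loop should fail loudly rather than spin.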

7. Retiring stale flags and preventing flag debt

Feature flags are debt with a coupon. A release flag that has been at 100% for a month is now dead branches, untested fallback code, and conditional complexity that confuses every future reader. TBD without flag hygiene rots into a different mess than GitFlow, but a mess all the same.

Make flag debt visible and time-boxed. Give every temporary flag a removal date in its definition, and let a simple linter catch expired ones in CI:

# fail the build if any temporary flag is past its remove-by date
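EXPIRED=$(jq -r '.flags[]
  | select(.temporary == true)
  | select(.removeBy < (now | strftime("%Y-%m-%d")))
  | .key' flags.json)
# Collect the keys first: a variable set inside a piped `while` loop is lost
# when the loop's subshell exits, so the build would never fail.
for f in $EXPIRED; do
  echo "::error::Flag '$f' is past its removeBy date; remove it."
done
[ -z "$EXPIRED" ] || exit 1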

Removing a flag is itself a small trunk-based change: delete the if, delete the now-unreachable branch, delete the flag definition, ship. Treating cleanup as ordinary work, not a someday-project, is what keeps the flag count bounded.
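
If your flag definitions are already loaded in application code, the same expiry rule is easy to express as a pure function and unit-test. This sketch assumes the `temporary` and `removeBy` fields used above, with `removeBy` as a `YYYY-MM-DD` string; treating an undated temporary flag as expired is a choice made here, not something the article prescribes.

```typescript
interface FlagDefinition {
  key: string;
  temporary: boolean;
  removeBy?: string; // YYYY-MM-DD
}

// Returns the keys of temporary flags that are past their removal date.
function expiredFlags(flags: FlagDefinition[], today: string): string[] {
  return flags
    .filter((f) => f.temporary)
    // ISO dates in YYYY-MM-DD form sort correctly as plain strings.
    // A temporary flag with no date counts as expired, to force one to be set.
    .filter((f) => f.removeBy === undefined || f.removeBy < today)
    .map((f) => f.key);
}
```

Passing `today` in, rather than reading the clock inside the function, keeps the check deterministic in tests.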

8. Rollout sequencing and metrics to prove it worked

Do not flip the whole org to TBD on a Monday. Sequence it:

  1. Pilot one team that already ships frequently. Stand up the merge queue, flag SDK, and size gate for their repo only.
  2. Stabilize the trunk discipline – green-build culture, stop-the-line, flag hygiene – before scaling.
  3. Template the setup (branch protection, queue config, flag wrapper) so the next team adopts in an afternoon.
  4. Roll out team by team, retiring develop and release/* branches as each migrates.

Prove the migration with the DORA four, tracked before and after. Lead time for changes (commit to production) and deployment frequency should improve as branches shrink; change-failure rate and time-to-restore tell you it did not improve at the cost of stability. If your CI tags deploys, you can compute lead time straight from git and your deploy log – here, the p50 hours from authorship to deploy:

# assumes deploys.log lines look like "<sha> <ISO 8601 timestamp with UTC offset>"
git log --since="30 days ago" --pretty="%H %aI" main | while read -r sha authored; do
  # take the first deploy that contains this commit
  deployed=$(grep "$sha" deploys.log | head -n1 | awk '{print $2}')
  [ -z "$deployed" ] && continue
  python3 -c "import datetime as d; a=d.datetime.fromisoformat('$authored'); \
    p=d.datetime.fromisoformat('$deployed'); print((p-a).total_seconds()/3600)"
done | sort -n | awk '{v[NR]=$1} END{if (NR) print "p50 lead hours:", v[int((NR+1)/2)]}'
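
Once the numbers feed a dashboard, you will likely want more than the median. The following sketch computes lead times and an arbitrary percentile from records you have already joined; the `Change` shape is an assumption, and the percentile uses the nearest-rank method.

```typescript
interface Change {
  sha: string;
  authoredAt: string;  // ISO 8601
  deployedAt?: string; // ISO 8601; absent if not yet deployed
}

// Lead time in hours for every deployed change, sorted ascending.
function leadTimeHours(changes: Change[]): number[] {
  const hours: number[] = [];
  for (const c of changes) {
    if (!c.deployedAt) continue;
    const h = (Date.parse(c.deployedAt) - Date.parse(c.authoredAt)) / 3600000;
    // Skip unparseable dates and negative spans (clock skew, rewritten history).
    if (Number.isFinite(h) && h >= 0) hours.push(h);
  }
  return hours.sort((a, b) => a - b);
}

// Nearest-rank percentile over an ascending array; p is in (0, 100].
function percentile(sorted: number[], p: number): number | undefined {
  if (sorted.length === 0) return undefined;
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

Tracking p90 alongside p50 shows whether the slowest changes are improving too, which is often where leftover long-lived branches hide.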

Enterprise scenario

A payments platform team I worked with ran 40 microservices on GitFlow with a weekly release train. Their constraint was hard: PCI-DSS required a documented, auditable change approval on everything reaching the cardholder-data environment, and their security team read “auditable” as “long-lived release branch with a sign-off.” That belief was the real blocker, and it was making their lead time worse, not their compliance better.

We kept the audit requirement and dropped the long-lived branch. The merge queue became the control point. Branch protection mandated one approving review and all required checks, and GitHub’s API exposes the reviewer, the commit, and the merge timestamp – a complete, immutable approval record per change. We piped the pull_request review and merge events into the SIEM, so every change to the regulated service had a queryable approval trail without a release branch existing at all. The auditors accepted per-commit review records as stronger evidence than a batched branch sign-off, because each control mapped to exactly one change.

The seam that made it safe was deploy/release separation. New behavior shipped dark and was released by flipping a flag, and the flag-change API was itself logged as an audited control. A release became a flag flip with its own approval record, decoupled from the continuous deploys of dark code.

# enforce per-change approval the auditors accept, in the ruleset
required_pull_request_reviews:
  required_approving_review_count: 1
  dismiss_stale_reviews: true
  require_code_owner_reviews: true   # CODEOWNERS gates the regulated paths

Lead time for changes to the regulated services dropped from roughly seven days to under a day within a quarter, change-failure rate fell because batches shrank, and the audit posture got stronger because evidence moved from coarse branch sign-offs to fine-grained per-commit records.

Verify

Confirm the migration is real, not aspirational:

