DevOps Multi-Cloud

Instrumenting DORA Metrics: Building a Deployment Frequency and Lead-Time Pipeline

Most DORA dashboards lie. They count GitHub merges as deploys, treat every closed Jira bug as a failure, and aggregate forty services into one meaningless company-wide number. The result is a chart leadership stares at while nobody’s delivery actually improves. DORA metrics are only useful when each value maps to a real, timestamped event in a system you control, and when the aggregation matches how you actually ship.

This guide builds that pipeline. We collect deployment and change events from your VCS and CI/CD, derive lead time from first commit to production, compute change failure rate and MTTR from incident and rollback signals, store everything in a queryable table, and surface it in Grafana without creating metrics people game. The reference stack is GitHub plus a generic CI runner plus PostgreSQL plus Grafana, but every event boundary maps cleanly to GitLab, Azure DevOps, Jenkins, PagerDuty, or Opsgenie.

1. Define the four keys precisely (and the pitfalls)

The four DORA metrics, per the Accelerate research and the annual DORA report, split into throughput and stability:

Metric Definition Event boundary
Deployment frequency How often you deploy to production One row per successful prod deploy
Lead time for changes Time from code committed to code running in prod first_commit_ts to deployed_ts
Change failure rate Percentage of deployments causing a failure in prod failed deploys / total deploys
Mean time to restore (MTTR)* Time to recover from a prod failure incident_start_ts to incident_resolved_ts

*The 2024 DORA report renamed MTTR to “failed deployment recovery time” to clarify it measures recovery from a deployment-induced failure, not from any incident. The boundary matters: a failed disk in a datacenter is reliability, not delivery. Scope MTTR to incidents linked to a deployment.

The pitfalls that produce fiction:

2. Identify the source events

Map each metric to a concrete, timestamped signal before writing any code. For the reference stack:

Event Source Signal
Change created GitHub push / merge to default branch; first commit SHA + author date
Build / pipeline run CI pipeline start and finish with commit SHA
Deployment CI deploy job explicit “deploy succeeded/failed to prod” emission
Incident opened PagerDuty / Opsgenie incident triggered with a service field
Incident resolved PagerDuty / Opsgenie incident resolved timestamp
Rollback CI / GitOps a deploy whose is_rollback=true

The single most important design decision: the deployment event is emitted by the deploy job itself, not inferred. Inference (e.g. “a merge to main means a deploy happened”) breaks the moment you batch, gate, or have a failed promotion. Make the pipeline say “I deployed service X, version Y, to prod, at time T, and it succeeded.”

3. Collect deployment and change events

Two ingestion paths feed one normalized table: webhooks for things that happen in SaaS (merges, incidents) and an explicit CI annotation for deploys.

First, the schema. One wide event table is enough to start; you derive metrics with SQL.

CREATE TABLE delivery_events (
  id            BIGSERIAL PRIMARY KEY,
  event_type    TEXT NOT NULL,          -- change | deploy | incident_open | incident_resolved
  service       TEXT NOT NULL,
  team          TEXT,
  environment   TEXT,                   -- prod | staging | ...
  commit_sha    TEXT,
  first_commit_ts TIMESTAMPTZ,          -- earliest commit in the change set
  occurred_at   TIMESTAMPTZ NOT NULL,   -- when the event itself happened
  success       BOOLEAN,                -- deploy outcome
  is_rollback   BOOLEAN DEFAULT FALSE,
  incident_id   TEXT,                   -- correlate open/resolved
  source        TEXT NOT NULL,          -- github | ci | pagerduty
  payload       JSONB,                  -- raw event for auditing
  ingested_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_events_service_time ON delivery_events (service, occurred_at);
CREATE INDEX idx_events_type ON delivery_events (event_type, environment);

A minimal receiver. Verify the webhook signature before trusting anything; GitHub signs with HMAC-SHA256 over the raw body using your webhook secret.

import hashlib, hmac, os, json
from datetime import datetime, timezone
from flask import Flask, request, abort
import psycopg

app = Flask(__name__)
SECRET = os.environ["GH_WEBHOOK_SECRET"].encode()

def verify(req) -> bool:
    sig = req.headers.get("X-Hub-Signature-256", "")
    mac = hmac.new(SECRET, req.get_data(), hashlib.sha256)
    expected = "sha256=" + mac.hexdigest()
    return hmac.compare_digest(expected, sig)

@app.post("/hooks/github")
def github():
    if not verify(request):
        abort(401)
    if request.headers.get("X-GitHub-Event") != "push":
        return "", 204
    e = request.json
    if e.get("ref") != f"refs/heads/{e['repository']['default_branch']}":
        return "", 204  # only default-branch changes count

    commits = e.get("commits", [])
    if not commits:
        return "", 204
    # earliest commit timestamp in the pushed set = lead-time start
    first_ts = min(c["timestamp"] for c in commits)

    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        conn.execute(
            """INSERT INTO delivery_events
               (event_type, service, commit_sha, first_commit_ts,
                occurred_at, source, payload)
               VALUES ('change', %s, %s, %s, %s, 'github', %s)""",
            (e["repository"]["name"], e["after"], first_ts,
             datetime.now(timezone.utc), json.dumps(e)),
        )
    return "", 202

Now the deploy event. Emit it from the pipeline at the end of the prod deploy step. This is a plain authenticated POST; the CI job already knows the version, environment, and outcome.

# Runs as the last step of the prod deploy job.
# DEPLOY_OK is "true"/"false" set by the preceding deploy step.
FIRST_COMMIT_TS=$(git log --reverse --format=%cI "origin/main..HEAD" | head -1)
[ -z "$FIRST_COMMIT_TS" ] && FIRST_COMMIT_TS=$(git show -s --format=%cI HEAD)

curl -fsS -X POST "$METRICS_INGEST_URL/events/deploy" \
  -H "Authorization: Bearer $METRICS_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<JSON
{
  "service":        "$SERVICE_NAME",
  "team":           "$TEAM_NAME",
  "environment":    "prod",
  "commit_sha":     "$(git rev-parse HEAD)",
  "first_commit_ts":"$FIRST_COMMIT_TS",
  "occurred_at":    "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "success":        $DEPLOY_OK,
  "is_rollback":    ${IS_ROLLBACK:-false}
}
JSON

The git log origin/main..HEAD trick gives you the first commit unique to this change set, which is the correct lead-time start. For a merge-commit workflow, walk the merged commits instead via the GitHub compare API.

Make the metrics POST non-blocking for the deploy: wrap it so a metrics outage never fails a production release. Use curl -fsS ... || echo "metrics emit failed" and alert on the gap separately. Delivery instrumentation must never be in the critical path of delivery.

4. Compute lead time for changes

Lead time is the gap between first_commit_ts and the prod deploy that carried that commit. Because the deploy event records its own first_commit_ts (from the CI snippet above), the computation is local to the deploy row, no join required:

SELECT
  service,
  date_trunc('week', occurred_at)                       AS week,
  percentile_cont(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (occurred_at - first_commit_ts))
  ) / 3600.0                                             AS lead_time_p50_hours,
  percentile_cont(0.9) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (occurred_at - first_commit_ts))
  ) / 3600.0                                             AS lead_time_p90_hours,
  count(*)                                               AS deploys
FROM delivery_events
WHERE event_type = 'deploy'
  AND environment = 'prod'
  AND success = TRUE
  AND first_commit_ts IS NOT NULL
GROUP BY service, week
ORDER BY week DESC;

Two correctness notes. Report p50 and p90, not the mean, because lead-time distributions are heavily right-skewed. And guard against clock skew producing negative intervals: if a committer’s machine has a future clock, occurred_at - first_commit_ts goes negative. Filter WHERE occurred_at >= first_commit_ts or clamp at zero, and log the rejects so you can chase the bad runner.

Deployment frequency falls out of the same table trivially:

SELECT service,
       date_trunc('day', occurred_at) AS day,
       count(*) AS prod_deploys
FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod' AND success = TRUE
GROUP BY service, day
ORDER BY day DESC;

5. Derive change failure rate and MTTR

Change failure rate is failed prod deploys over all prod deploys. Two failure signals count: a deploy that reports success=false, and a subsequent is_rollback=true deploy (a rollback is evidence the prior release failed).

WITH prod AS (
  SELECT * FROM delivery_events
  WHERE event_type = 'deploy' AND environment = 'prod'
)
SELECT
  service,
  date_trunc('week', occurred_at) AS week,
  count(*) FILTER (WHERE success = FALSE OR is_rollback)::numeric
    / NULLIF(count(*), 0) AS change_failure_rate,
  count(*) AS total_deploys
FROM prod
GROUP BY service, week
ORDER BY week DESC;

MTTR comes from correlating incident_open and incident_resolved events on incident_id. Scope to incidents tied to a service that deploys, so you measure deployment recovery rather than general reliability:

WITH incidents AS (
  SELECT
    service,
    incident_id,
    min(occurred_at) FILTER (WHERE event_type = 'incident_open')     AS opened,
    min(occurred_at) FILTER (WHERE event_type = 'incident_resolved') AS resolved
  FROM delivery_events
  WHERE event_type IN ('incident_open', 'incident_resolved')
  GROUP BY service, incident_id
)
SELECT
  service,
  date_trunc('week', opened) AS week,
  percentile_cont(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (resolved - opened))
  ) / 60.0 AS mttr_p50_minutes
FROM incidents
WHERE resolved IS NOT NULL
GROUP BY service, week
ORDER BY week DESC;

For the incident source, normalize the PagerDuty webhook into the same table. PagerDuty v3 webhooks deliver an envelope with event.event_type values like incident.triggered and incident.resolved:

@app.post("/hooks/pagerduty")
def pagerduty():
    e = request.json["event"]
    etype = {"incident.triggered": "incident_open",
             "incident.resolved":  "incident_resolved"}.get(e["event_type"])
    if not etype:
        return "", 204
    data = e["data"]
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        conn.execute(
            """INSERT INTO delivery_events
               (event_type, service, environment, occurred_at,
                incident_id, source, payload)
               VALUES (%s, %s, 'prod', %s, %s, 'pagerduty', %s)""",
            (etype, data["service"]["summary"], e["occurred_at"],
             data["id"], json.dumps(e)),
        )
    return "", 202

6. Model per-service vs. per-team aggregation and bands

DORA bands are defined per delivery unit, and the right unit depends on your org. A team owning one service: roll up by team. A platform team owning twenty services: per-service is the unit that drives action, with team as a secondary slice. Keep both service and team on every row (as the schema does) so you can aggregate either way without re-ingesting.

The published DORA performance bands (2024 report) are a useful, if coarse, yardstick:

Metric Elite High Medium Low
Deployment frequency On-demand (multiple/day) Weekly to monthly Monthly to every 6 months Fewer than every 6 months
Lead time for changes Less than one day One day to one week One week to one month One to six months
Change failure rate 5 percent 10 percent 15 percent 64 percent (low cluster)
Failed deployment recovery Less than one hour Less than one day One day to one week More than six months

Encode the bands once so dashboards and reports agree:

CREATE OR REPLACE FUNCTION lead_time_band(hours numeric)
RETURNS text LANGUAGE sql IMMUTABLE AS $$
  SELECT CASE
    WHEN hours < 24            THEN 'Elite'
    WHEN hours < 24 * 7        THEN 'High'
    WHEN hours < 24 * 30       THEN 'Medium'
    ELSE 'Low'
  END;
$$;

Resist averaging metrics across services into one org number. A platform team’s true picture is “16 of 20 services are Elite on lead time, 4 are Medium and here’s why,” not “the org is High.” The per-service breakdown is what you act on.

7. Build dashboards in Grafana (and avoid gaming)

Point Grafana at the PostgreSQL data source and back each panel with the SQL above. Use Grafana’s macros so panels respect the dashboard time range and the $service template variable:

-- Grafana panel: weekly lead time p50 per selected service
SELECT
  $__timeGroup(occurred_at, '1w')                       AS time,
  percentile_cont(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (occurred_at - first_commit_ts))
  ) / 3600.0                                            AS "lead_time_p50_hours"
FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod' AND success
  AND service IN ($service)
  AND $__timeFilter(occurred_at)
GROUP BY 1
ORDER BY 1;

Define the $service variable as a query: SELECT DISTINCT service FROM delivery_events ORDER BY 1. The $__timeFilter and $__timeGroup macros bind to the dashboard controls automatically.

On gaming: every delivery metric has a perverse optimum. The defenses are structural, not motivational:

Verify

Confirm the pipeline produces correct, trustworthy numbers before anyone reads a dashboard.

# 1. Webhook signature path works (expect 202 for valid, 401 for tampered).
BODY='{"ref":"refs/heads/main","after":"abc123",
       "repository":{"name":"checkout","default_branch":"main"},
       "commits":[{"id":"abc123","timestamp":"2026-06-08T09:00:00Z"}]}'
SIG="sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$GH_WEBHOOK_SECRET" | awk '{print $2}')"
curl -i -X POST localhost:8080/hooks/github \
  -H "X-GitHub-Event: push" -H "X-Hub-Signature-256: $SIG" \
  -H "Content-Type: application/json" -d "$BODY"
-- 2. No negative lead times leaked in (clock-skew guard working).
SELECT count(*) AS bad_rows
FROM delivery_events
WHERE event_type = 'deploy' AND occurred_at < first_commit_ts;
-- expect 0

-- 3. CFR denominator includes successes (not just failures).
SELECT count(*) FILTER (WHERE success)      AS ok,
       count(*) FILTER (WHERE NOT success)  AS failed,
       count(*) FILTER (WHERE is_rollback)  AS rollbacks
FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod';

-- 4. Every resolved incident has a matching open (correlation intact).
SELECT incident_id FROM delivery_events
WHERE event_type = 'incident_resolved'
  AND incident_id NOT IN (
    SELECT incident_id FROM delivery_events WHERE event_type = 'incident_open');
-- expect 0 rows

Then sanity-check one known release by hand: pick a recent prod deploy, find its first commit in git log, and confirm the dashboard’s lead time for that week matches the wall-clock gap. If it doesn’t, the bug is in the event, not the query.

Enterprise scenario

A platform team at a payments company ran twenty-six microservices behind Argo CD. Their first DORA dashboard counted GitHub merges as deployments, and it reported the whole org as “Elite” with sub-hour lead times. Leadership loved it; the on-call engineers knew it was nonsense. The gap: a merge to main only updated a Helm values file in the GitOps repo. The actual rollout happened minutes-to-hours later when Argo CD synced and the canary analysis passed, and a non-trivial fraction of syncs were rolled back by the analysis gate. Merges massively under-counted lead time and completely missed failures.

The constraint was that the deploy outcome lived in Argo CD, not in CI, and the team couldn’t bolt a curl into a sync the way they could into a pipeline job. They solved it by treating Argo CD as the deployment event source: a Kubernetes controller watched Rollout and Application resources and emitted a deploy event only when a rollout reached Healthy in prod, with success=false when it reached Degraded and is_rollback=true when Argo CD aborted to the previous revision. The first_commit_ts came from the image tag’s commit SHA, resolved against the app repo.

# Argo CD notifications trigger -> POST a real deploy event on rollout health.
template.deploy-event: |
  webhook.metrics:
    method: POST
    path: /events/deploy
    body: |
      {
        "service": "{{.app.metadata.labels.service}}",
        "team": "{{.app.metadata.labels.team}}",
        "environment": "prod",
        "commit_sha": "{{.app.status.sync.revision}}",
        "occurred_at": "{{.app.status.operationState.finishedAt}}",
        "success": {{eq .app.status.health.status "Healthy"}}
      }
trigger.on-deployed: |
  - when: app.status.health.status in ['Healthy', 'Degraded']
    send: [deploy-event]

Once the event came from the system that actually performs the deploy, the dashboard told a true and far less flattering story: median lead time was nine hours, change failure rate was eleven percent, and two services dragged the rest. Those two became the next quarter’s focus, and lead time on them halved within two months. Same data warehouse, same Grafana; the fix was moving the event boundary to the thing that ships.

Checklist

dora-metricsrelease-engineeringobservabilityci-cdplatform-engineering

Comments

Keep Reading