Shell Lesson 27 of 42

Shell Idempotency Patterns: State Files, Reconciliation Loops, Dry-Run Flags & Idempotent Primitives

Why Idempotency Matters: The Five Failure Modes of Non-Idempotent Scripts

You wrote a setup script. It worked. You run it again to apply a small config tweak — and it fails because the user already exists, the cron entry got duplicated, the symlink threw File exists, or the database migration ran twice and corrupted a counter. You add 2>/dev/null || true to suppress the errors, ship it, and now the script “succeeds” whether or not it actually did anything.

This is the most common operational tax in shell automation. Idempotency is the property that running a script N times produces the same end state as running it once. It is not a coding style; it is a contract with the runtime: “I describe what should be true. The script makes it true if it isn’t, and does nothing if it already is.”

The five failure modes you eliminate by building idempotency in from the start:

Failure mode | Symptom | Real-world consequence
Duplicate appends | Cron has the same line three times after three runs | The job runs three times concurrently, locks fight, alerts fire
First-run-only logic | useradd succeeds once, then fails on every rerun | The deploy pipeline goes red on the second run; the team disables the check
Mid-run partial state | The script crashes between step 4 and step 5; rerunning fails because step 4's result is “already there” | Manual cleanup tickets every time CI flakes
Hidden drift | A config file matches the manifest “by accident”; no one knows what controls it | A security audit finds an /etc/sudoers.d/admin no one remembers writing
Re-run dread | Engineers refuse to re-run the script in prod because “last time it broke things” | Scripts become single-use; institutional knowledge dies; everything is manual
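The first row is easy to reproduce. This is a minimal sketch that assumes a scratch file stands in for a file in /etc/cron.d. After three runs, the raw >> append leaves three copies. A guard that reads first (grep -qxF) leaves exactly one.

```bash
#!/usr/bin/env bash
# Sketch: a raw append vs. a guarded append, each run three times.
# Assumption: a scratch file stands in for a real file in /etc/cron.d.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
line='*/5 * * * * deploy /opt/myapp/bin/heartbeat'

raw_append() {        # NOT idempotent: one more copy per run
  printf '%s\n' "$line" >>"$work/raw.cron"
}

guarded_append() {    # idempotent: append only if the exact line is absent
  grep -qxF -- "$line" "$work/guarded.cron" 2>/dev/null ||
    printf '%s\n' "$line" >>"$work/guarded.cron"
}

for _ in 1 2 3; do
  raw_append
  guarded_append
done

raw_count=$(grep -cxF -- "$line" "$work/raw.cron")
guarded_count=$(grep -cxF -- "$line" "$work/guarded.cron")
echo "raw: $raw_count copies, guarded: $guarded_count copy"   # raw: 3 copies, guarded: 1 copy
```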

The fix is structural, not cosmetic. Idempotency is built from four pillars that this lesson covers in depth:

  1. State-file markers — write a sentinel after every “did the work” branch so reruns see “already done.”
  2. Desired-vs-actual reconciliation — express what should be true, query what is true, apply only the delta.
  3. --dry-run discipline — every change-making function must support a no-op mode that prints what it would do.
  4. Idempotent primitives — use ln -sfn, install -m, mkdir -p, sed -i with sentinels, and atomic file writes instead of their non-idempotent equivalents.

By the end, you’ll have a lib/state.sh you can source into any script to make it convergent.

Ground Rules: What “Idempotent” Actually Means in Shell

Before we write code, let’s pin down the definitions, because “idempotent” gets used loosely.

Strict idempotency

f(state) = f(f(state)) = f(f(f(state)))

The script is a function from system state to system state. Applying it twice equals applying it once. The script reads the world, computes the delta, and applies the minimum set of changes.
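The definition translates directly into a test: apply, fingerprint the resulting state, apply again, and compare. The sketch below assumes a scratch directory stands in for the system. Its fingerprint covers only paths and file contents; modes, owners and timestamps are ignored. apply is a hypothetical function under test.

```bash
#!/usr/bin/env bash
# Sketch of a property check for f(state) = f(f(state)) = f(f(f(state))).
# Assumptions: "system state" is a scratch directory, and its fingerprint is
# the sorted list of paths plus a checksum of each regular file's contents
# (modes, owners and timestamps are ignored here).
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

# Hypothetical function under test.
apply() {
  mkdir -p "$root/etc/myapp"
  printf 'log_level = info\n' >"$root/etc/myapp/app.conf"
}

fingerprint() {
  (
    cd "$root"
    find . -print | LC_ALL=C sort | while IFS= read -r p; do
      if [[ -f "$p" ]]; then
        printf '%s %s\n' "$p" "$(cksum <"$p")"
      else
        printf '%s\n' "$p"
      fi
    done
  )
}

apply; once=$(fingerprint)
apply; twice=$(fingerprint)
apply; thrice=$(fingerprint)

if [[ "$once" == "$twice" && "$twice" == "$thrice" ]]; then
  echo "idempotent: 1, 2 and 3 runs leave the same state"
else
  echo "NOT idempotent" >&2
fi
```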

Convergent idempotency (what we usually build)

f(state) → desired_state, regardless of starting state

Slightly weaker but more practical: the script doesn’t care what state it started from; it reaches desired_state whether the system was empty, partially configured, or fully configured. Configuration management tools (Ansible, Puppet, Chef) are convergent.
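Convergence can be checked the same way. The difference is that the script starts from several different states instead of re-running on its own output. This sketch assumes the managed state is a single config file; converge is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of convergence: empty, partial and drifted starting states all end
# in the same desired state after one run. Assumption: the managed "system"
# is a single config file inside a scratch directory.
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
desired=$'listen_port = 8080\nlog_level = info'

converge() {
  local dir="$1" conf="$1/app.conf"
  mkdir -p "$dir"
  # Read and compare first; write only when the content differs.
  if [[ ! -f "$conf" || "$(cat "$conf")" != "$desired" ]]; then
    printf '%s\n' "$desired" >"$conf"
  fi
}

# Three starting states: "empty" does not exist at all.
mkdir -p "$root/partial" "$root/drifted"
printf 'listen_port = 8080\n' >"$root/partial/app.conf"
printf 'listen_port = 9999\nlog_level = debug\n' >"$root/drifted/app.conf"

results=()
for start in empty partial drifted; do
  converge "$root/$start"
  results+=("$(cat "$root/$start/app.conf")")
done

if [[ "${results[0]}" == "${results[1]}" && "${results[1]}" == "${results[2]}" ]]; then
  echo "all three starting states converged"
fi
```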

What idempotency is NOT

The mental model: every change-making function is a state transition

Every function that changes the system answers four questions:

  1. What is the desired state? (The arguments: ensure_user "deploy" "/home/deploy" "/bin/bash".)
  2. What is the current state? (Read it: id deploy 2>/dev/null.)
  3. Is current == desired? (Compare; if yes, return without changing anything.)
  4. If not, apply the minimal change to transition. (Run useradd or usermod.)

This four-step pattern is the heart of every idempotent function in this lesson.
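Here is the same four-step pattern applied to a resource that needs no root access: a single key = value line in a config file. This is a sketch. ensure_setting is a hypothetical helper, it assumes values never contain " = ", and it gives the rewritten file an assumed mode of 0644.

```bash
#!/usr/bin/env bash
# Sketch: the four steps applied to one "key = value" line in a config file.
# ensure_setting is a hypothetical helper; values are assumed not to contain
# " = ", and the rewritten file gets an assumed mode of 0644.
set -euo pipefail

ensure_setting() {
  local file="$1" key="$2" want="$3" src="$1" have tmp
  [[ -f "$file" ]] || src=/dev/null

  # 1. Desired state: the arguments ($key, $want).
  # 2. Current state: the value assigned to $key today (empty if absent).
  have=$(awk -F' = ' -v k="$key" '$1 == k { print $2; exit }' "$src")

  # 3. Compare; return without touching anything if they already match.
  if [[ "$have" == "$want" ]]; then
    echo "ok: $key = $want"
    return 0
  fi

  # 4. Minimal change: rewrite that one key (or append it), then rename into place.
  tmp=$(mktemp "$file.XXXXXX")
  awk -F' = ' -v k="$key" -v v="$want" '
    $1 == k { print k " = " v; found = 1; next }
            { print }
    END     { if (!found) print k " = " v }' "$src" >"$tmp"
  chmod 0644 "$tmp"
  mv -f "$tmp" "$file"
  echo "changed: $key '$have' -> '$want'"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
conf="$dir/app.conf"
printf 'listen_port = 9999\nlog_level = info\n' >"$conf"

ensure_setting "$conf" listen_port 8080   # changed: listen_port '9999' -> '8080'
ensure_setting "$conf" listen_port 8080   # ok: listen_port = 8080
```

The second call prints the ok: line and leaves the file untouched. That is the “delta empty” branch of the pattern.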

The Four Pillars: State Markers, Reconciliation, Dry-Run, Primitives

Pillar 1 — State-file markers (the “I already did this” sentinel)

The simplest idempotency primitive: write a marker file when you complete a one-shot, expensive, or non-queryable action. On rerun, check for the marker and skip if present.

#!/usr/bin/env bash
set -Eeuo pipefail

STATE_DIR="/var/lib/myapp/state"
mkdir -p "$STATE_DIR"

log() { printf '[%s] %s\n' "$(date -u +%FT%TZ)" "$*" >&2; }

# One-shot: download a 2 GB model file. We can check the file's existence,
# but we also want to record WHEN and WHICH version was installed, so a
# marker with metadata is more useful than just `[[ -f /path/to/file ]]`.

ensure_model_v2() {
  local marker="$STATE_DIR/model-v2.installed"
  local dest="/opt/myapp/model.bin"
  if [[ -f "$marker" ]]; then
    log "model-v2 already installed at $(cat "$marker")"
    return 0
  fi
  log "downloading model-v2..."
  mkdir -p "$(dirname "$dest")"
  # Download under a temporary name, then rename, so a crash mid-download
  # never leaves a truncated model.bin in place.
  curl -fsSL https://example.com/model-v2.bin -o "$dest.part"
  mv -f "$dest.part" "$dest"
  # Marker last: written only once the work above has succeeded.
  echo "$(date -u +%FT%TZ) sha256=$(sha256sum "$dest" | awk '{print $1}')" >"$marker"
  log "model-v2 installed"
}

ensure_model_v2

Why this works:

State-file rules:

  1. Write the marker last. Never write the marker before the work succeeds; otherwise a crash mid-work leaves a marker that lies about state.
  2. Include version/identity in the marker name. model-v2.installed, not model.installed — the v2 marker won’t satisfy a future “ensure v3” check.
  3. Keep markers under one well-known directory. /var/lib/myapp/state for system services, ~/.local/state/myapp for user scripts, $STATE_DIR for tests. Easy to clear, easy to inspect, easy to back up.
  4. Don’t put markers in /tmp. /tmp gets cleared on reboot on many distros (systemd-tmpfiles); your “already installed” claim evaporates.
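Rule 1 is easiest to see with a step that fails once. In this sketch, run_once and flaky_download are hypothetical, and STATE_DIR is a scratch directory. The first run fails and writes no marker. The second run retries and succeeds. The third run skips.

```bash
#!/usr/bin/env bash
# Sketch: why the marker must be written LAST. flaky_download fails on its
# first call (simulated), so no marker is written and the next run retries.
# run_once and flaky_download are hypothetical; STATE_DIR is a scratch dir.
set -euo pipefail

STATE_DIR=$(mktemp -d)
trap 'rm -rf "$STATE_DIR"' EXIT
attempts=0

flaky_download() {
  attempts=$((attempts + 1))
  (( attempts > 1 ))            # fail the first time, succeed afterwards
}

run_once() {
  local name="$1"; shift
  local marker="$STATE_DIR/$name.marker"
  if [[ -f "$marker" ]]; then
    echo "skip $name (done at $(cat "$marker"))"
    return 0
  fi
  if "$@"; then
    date -u +%FT%TZ >"$marker"  # marker only AFTER the work succeeded
    echo "did $name"
  else
    echo "failed $name: no marker written, will retry next run" >&2
    return 1
  fi
}

run_once model-v2 flaky_download || true   # run 1: fails, no marker
run_once model-v2 flaky_download           # run 2: succeeds, writes marker
run_once model-v2 flaky_download           # run 3: skipped
echo "download attempted $attempts times"   # download attempted 2 times
```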

Pillar 2 — Desired-vs-actual reconciliation (drift detection)

State markers work for one-shot operations. For ongoing config (a user, a file, a sysctl value, a cron entry), markers are wrong because the config could drift after the marker was written. Someone could delete the user, edit the file, override the sysctl. The script needs to read the current state on every run and reconcile.

# The reconciliation pattern, applied to a user account.

ensure_user() {
  local user="$1" home="$2" shell="${3:-/bin/bash}"
  local current_home current_shell

  if id -u "$user" &>/dev/null; then
    # Read actual state.
    current_home=$(getent passwd "$user" | cut -d: -f6)
    current_shell=$(getent passwd "$user" | cut -d: -f7)

    # Compare. If everything matches, no-op.
    if [[ "$current_home" == "$home" && "$current_shell" == "$shell" ]]; then
      log "user $user already correct"
      return 0
    fi

    # Apply the delta only. Use if-blocks rather than `[[ ... ]] && cmd`:
    # a false test on the last line would make the function return 1 and
    # abort a caller running under set -e.
    log "user $user exists but has drifted; applying $home, $shell"
    if [[ "$current_home" != "$home" ]]; then
      usermod --home "$home" --move-home "$user"
    fi
    if [[ "$current_shell" != "$shell" ]]; then
      usermod --shell "$shell" "$user"
    fi
  else
    log "creating user $user"
    useradd --home-dir "$home" --shell "$shell" --create-home "$user"
  fi
}
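To see why a marker alone is the wrong tool for ongoing config, compare the two styles against a hand edit. This is a minimal sketch that assumes a scratch file stands in for the managed config. The marker-based function never notices the edit. The function that reads actual state repairs it.

```bash
#!/usr/bin/env bash
# Sketch: a marker cannot see drift; reading the actual state can.
# Assumption: a scratch file stands in for managed config.
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
conf="$root/app.conf" marker="$root/app.conf.done"
want='log_level = info'

marker_style() {       # trusts the marker, never re-reads the file
  if [[ -f "$marker" ]]; then return 0; fi
  printf '%s\n' "$want" >"$conf"
  touch "$marker"
}

reconcile_style() {    # reads actual state on every run
  if [[ -f "$conf" && "$(cat "$conf")" == "$want" ]]; then return 0; fi
  printf '%s\n' "$want" >"$conf"
}

marker_style
printf 'log_level = debug\n' >"$conf"   # someone edits the file by hand
marker_style
after_marker=$(cat "$conf")             # still debug: drift missed

reconcile_style
after_reconcile=$(cat "$conf")          # back to info: drift repaired
echo "marker: $after_marker | reconcile: $after_reconcile"
```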

The four-step reconciliation pattern, made explicit:

┌─────────────────────────────────────────────┐
│ 1. Desired state from arguments             │  ensure_user "deploy" "/home/deploy" "/bin/bash"
└─────────────┬───────────────────────────────┘
              ▼
┌─────────────────────────────────────────────┐
│ 2. Read actual state from the system        │  getent passwd deploy
└─────────────┬───────────────────────────────┘
              ▼
┌─────────────────────────────────────────────┐
│ 3. Compute delta (desired - actual)         │  home differs? shell differs? exists?
└─────────────┬───────────────────────────────┘
              ▼
┌──────────────┴──────────────────────────────┐
▼ delta empty                  ▼ delta non-empty
"already correct"             apply minimal change
return 0                      (useradd or usermod)

This pattern scales. The same skeleton works for files (compare_file: checksum the desired vs actual content, atomic-write if different), services (ensure_service_running: systemctl is-active, then systemctl start if needed), packages (ensure_pkg_installed: dpkg-query, then apt-get install), and remote API state (ensure_dns_record: query the API, then PATCH only the changed fields).

Pillar 3 — Dry-run discipline (the no-op mode)

Every change-making function must support --dry-run. Two reasons:

  1. Operator safety. Before applying a script in production, the operator wants to see the diff: “what would this change?” A dry-run that prints “would create user deploy, would write /etc/cron.d/myjob, would not change /etc/hosts” is the audit log before the fact.
  2. CI integration. PR checks should run the script against a snapshot of prod state with --dry-run and fail if the diff is unexpected.

The pattern: a global DRY_RUN=0 flag, and a wrapper do_or_say that gates every mutating call.

DRY_RUN=0

# Parse args; $1 is "--dry-run" if present.
[[ "${1:-}" == "--dry-run" ]] && { DRY_RUN=1; shift; }

# Wrapper: in dry-run, print the command instead of running it.
do_or_say() {
  if (( DRY_RUN )); then
    printf 'WOULD: %s\n' "$*" >&2
  else
    "$@"
  fi
}

ensure_user_dry() {
  local user="$1" home="$2"
  if id -u "$user" &>/dev/null; then
    log "user $user exists"
  else
    do_or_say useradd --home-dir "$home" --create-home "$user"
  fi
}
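One way to keep the dry-run path honest is a smoke test that runs the same function in both modes against a scratch directory. This sketch assumes a hypothetical ensure_dir_demo. It also uses a variant of do_or_say that prints each argument with printf %q, so a WOULD line containing spaces can be pasted back into a shell. WOULD lines go to stderr, so the capture uses 2>&1.

```bash
#!/usr/bin/env bash
# Sketch of a dry-run smoke test: run a function once with DRY_RUN=1 and check
# that nothing changed, then run it live. ensure_dir_demo is hypothetical, and
# this do_or_say variant prints arguments with %q (an assumption, not the
# lesson's version) so the WOULD line can be pasted back into a shell.
set -euo pipefail

DRY_RUN=0
do_or_say() {
  if (( DRY_RUN )); then
    { printf 'WOULD:'; printf ' %q' "$@"; printf '\n'; } >&2
  else
    "$@"
  fi
}

ensure_dir_demo() {
  [[ -d "$1" ]] && return 0      # read: runs in both modes
  do_or_say mkdir -p "$1"        # write: gated
}

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
target="$root/new dir"

plan=$(DRY_RUN=1; ensure_dir_demo "$target" 2>&1)   # subshell: DRY_RUN=1 stays local
printf '%s\n' "$plan"
[[ ! -e "$target" ]] && echo "dry run changed nothing"

ensure_dir_demo "$target"                            # live run
[[ -d "$target" ]] && echo "live run created it"
```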

Dry-run rules:

Pillar 4 — Idempotent primitives (use the right tool)

Many shell tools have idempotent and non-idempotent variants. Use the idempotent one by default, even when you “know” the script will only run once.

Non-idempotent → Idempotent equivalent: Why
ln -s target link → ln -sfn target link: -f replaces an existing link; -n treats an existing symlink to a directory as a plain file, so ln replaces it instead of creating the new link inside that directory
mkdir /opt/myapp → mkdir -p /opt/myapp: -p succeeds if the directory already exists and creates missing parents
cp file /etc/myconf → install -m 0644 -o root -g root file /etc/myconf: install also sets mode and owner, so it corrects mode/ownership drift on every run
echo 'line' >> file → sentinel + sed -i (see below): >> appends on every run; the sentinel detects an existing line
useradd user → id -u user &>/dev/null || useradd user: useradd fails if the user already exists, so read first
groupadd grp → getent group grp >/dev/null || groupadd grp: same guard pattern
git clone url dir → [[ -d dir/.git ]] && git -C dir pull || git clone url dir: clone fails if the directory exists; reconcile with pull instead
rm /tmp/foo → rm -f /tmp/foo: -f doesn't fail if the file is missing, so removal is idempotent
kill $pid → kill $pid 2>/dev/null || true: tolerates a process that has already exited; better still, check whether it is running first
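The -n flag on ln is the least obvious row. This sketch uses a scratch directory; v1, v2 and current are made-up release names. It shows what happens when the link already points at a directory.

```bash
#!/usr/bin/env bash
# Sketch: ln -sf vs ln -sfn when the link already points at a directory.
# Scratch directory; v1, v2 and current are made-up release names.
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir "$root/v1" "$root/v2"
ln -s "$root/v1" "$root/current"

# Without -n, ln follows "current" into the v1 directory and creates v1/v2
# there; "current" itself still points at v1.
ln -sf "$root/v2" "$root/current"
echo "after ln -sf:  current -> $(readlink "$root/current")"
if [[ -L "$root/v1/v2" ]]; then echo "stray link created at v1/v2"; fi

# With -n, "current" is treated as a plain file and replaced.
ln -sfn "$root/v2" "$root/current"
echo "after ln -sfn: current -> $(readlink "$root/current")"
```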

The append-only-if-missing pattern (sentinel comments):

# Add a line to a file only if it's not already there. The sentinel comment
# uniquely identifies our line so we can find it on rerun.

ensure_line_in_file() {
  local file="$1" line="$2" sentinel="$3"
  # sentinel example: "# managed-by:myapp:cron-job"
  if grep -qF -- "$sentinel" "$file" 2>/dev/null; then
    # Line is already there. Update it in place if content differs.
    local current
    current=$(grep -m1 -F -- "$sentinel" "$file")
    if [[ "$current" != "$line" ]]; then
      # Escape regex metacharacters so the sentinel matches literally as a
      # sed address; sed's c\ then replaces the whole matching line.
      sed -i.bak "/$(printf '%s' "$sentinel" | sed 's:[][\\/.^$*]:\\&:g')/c\\
$line" "$file"
      rm -f "${file}.bak"   # don't leave a .bak file behind on every run
    fi
  else
    printf '%s\n' "$line" >>"$file"
  fi
}

# Usage:
ensure_line_in_file /etc/cron.d/myapp \
  "*/5 * * * * deploy /opt/myapp/bin/heartbeat # managed-by:myapp:heartbeat" \
  "managed-by:myapp:heartbeat"
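The sed c\ approach has two sharp edges. The sentinel must be regex-escaped, and sed treats backslashes in $line as escapes. One alternative, sketched here rather than taken from the lesson, rewrites the file with awk. index() matches the sentinel as a plain substring, ENVIRON passes the new line without escape processing, and cmp decides whether anything needs to be written. ensure_line_awk and the 0644 mode are assumptions.

```bash
#!/usr/bin/env bash
# Sketch of an awk-based alternative to the sed replacement: the sentinel is
# matched with index() (a plain substring test, so no regex escaping) and the
# new line is passed through ENVIRON (no backslash processing). The function
# name ensure_line_awk and the 0644 mode are assumptions for the demo.
set -euo pipefail

ensure_line_awk() {
  local file="$1" line="$2" sentinel="$3" src="$1" tmp
  [[ -f "$file" ]] || src=/dev/null
  tmp=$(mktemp "$file.XXXXXX")
  # Replace the first sentinel line, drop any duplicates, append if absent.
  LINE="$line" SENT="$sentinel" awk '
    index($0, ENVIRON["SENT"]) { if (!done) print ENVIRON["LINE"]; done = 1; next }
    { print }
    END { if (!done) print ENVIRON["LINE"] }' "$src" >"$tmp"
  # Reconcile: only rename into place when the result differs.
  if [[ -f "$file" ]] && cmp -s "$tmp" "$file"; then
    rm -f "$tmp"
    return 0
  fi
  chmod 0644 "$tmp"
  mv -f "$tmp" "$file"
  echo "updated $file"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cron="$dir/myapp"
tag='managed-by:myapp:heartbeat'

ensure_line_awk "$cron" "*/5 * * * * deploy /opt/myapp/bin/heartbeat # $tag" "$tag"    # creates
ensure_line_awk "$cron" "*/5 * * * * deploy /opt/myapp/bin/heartbeat # $tag" "$tag"    # no-op
ensure_line_awk "$cron" "*/10 * * * * deploy /opt/myapp/bin/heartbeat # $tag" "$tag"   # replaces
```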

Atomic file writes (the mv-into-place pattern):

# Idempotent file generation. Compare desired content to current, write only on diff.

ensure_file_content() {
  local path="$1" mode="$2" owner="$3" group="$4"
  local desired_content tmp
  # Read stdin. $(...) strips trailing newlines, so append a sentinel "x"
  # and strip it again to keep the content byte-for-byte.
  desired_content=$(cat; printf x)
  desired_content=${desired_content%x}

  # Compare.
  if [[ -f "$path" ]] && cmp -s <(printf '%s' "$desired_content") "$path"; then
    log "$path: content unchanged"
    return 0
  fi

  # Write atomically: temp file in the same directory (same filesystem),
  # set mode and ownership on the temp file, then rename over the target.
  tmp=$(mktemp "${path}.XXXXXX")
  printf '%s' "$desired_content" >"$tmp"
  chmod "$mode" "$tmp"
  chown "$owner:$group" "$tmp"
  mv -f "$tmp" "$path"   # rename(2): readers see the old file or the new one
  log "$path: updated"
}

# Usage:
ensure_file_content /etc/myapp/server.conf 0644 root root <<'EOF'
listen_port = 8080
log_level = info
EOF

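Why atomic writes: if the script crashes mid-write with >"$path", the file is left truncated or half-written, and the service that reads it sees garbage. Writing to ${path}.XXXXXX in the same directory and then renaming it over $path with mv is atomic on POSIX filesystems, because rename(2) within one filesystem swaps the directory entry in a single step. Readers see either the old file or the new one, never a partial. The temp file must live on the same filesystem as the target; a cross-device mv falls back to copy-and-delete and loses the guarantee.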
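The guarantee can be observed directly. This sketch keeps a file descriptor open on the old file, playing the part of a running reader. It then renames a new version into place and reads both. The open descriptor still sees the old inode, and a fresh open sees the new one. The paths are scratch files, not real config.

```bash
#!/usr/bin/env bash
# Sketch: rename-into-place seen from two readers. fd 3 plays a reader that
# opened the file before the swap; a fresh cat plays one that opens after.
# Scratch paths only; nothing here touches real config.
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
path="$dir/server.conf"
printf 'listen_port = 8080\n' >"$path"

exec 3<"$path"                      # reader opens the current file

tmp=$(mktemp "$path.XXXXXX")        # same directory, so same filesystem
printf 'listen_port = 9090\n' >"$tmp"
mv -f "$tmp" "$path"                # rename(2) swaps the directory entry

old_view=$(cat <&3)                 # open descriptor: still the old inode
new_view=$(cat "$path")             # fresh open: the new inode
exec 3<&-

echo "already-open reader: $old_view"   # already-open reader: listen_port = 8080
echo "new reader:          $new_view"   # new reader:          listen_port = 9090
```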

A Drop-In Library: lib/state.sh

Compose the patterns above into a reusable library. Source it from any idempotent script.

# lib/state.sh — drop-in idempotency helpers.
# Source from any script: . /opt/myapp/lib/state.sh

# ─── Configuration (override before sourcing if needed) ────────────────────
: "${STATE_DIR:=/var/lib/$(basename "${0##*/}" .sh)/state}"
: "${DRY_RUN:=0}"
: "${LOG_PREFIX:=$(basename "${0##*/}" .sh)}"

# ─── Logging (timestamped, structured-ish) ─────────────────────────────────
log()  { printf '[%s] [%s] [INFO]  %s\n'  "$(date -u +%FT%TZ)" "$LOG_PREFIX" "$*" >&2; }
warn() { printf '[%s] [%s] [WARN]  %s\n'  "$(date -u +%FT%TZ)" "$LOG_PREFIX" "$*" >&2; }
err()  { printf '[%s] [%s] [ERROR] %s\n'  "$(date -u +%FT%TZ)" "$LOG_PREFIX" "$*" >&2; }
chg()  { printf '[%s] [%s] [CHANGE] %s\n' "$(date -u +%FT%TZ)" "$LOG_PREFIX" "$*" >&2; }

# ─── Dry-run wrapper ───────────────────────────────────────────────────────
# Run the command, or print "WOULD: ..." in dry-run mode.
do_or_say() {
  if (( DRY_RUN )); then
    printf 'WOULD: %s\n' "$*" >&2
    return 0
  fi
  "$@"
}

# ─── Init ──────────────────────────────────────────────────────────────────
state_init() {
  if (( ! DRY_RUN )); then
    mkdir -p "$STATE_DIR"
    chmod 0700 "$STATE_DIR"
  fi
}

# ─── State markers ─────────────────────────────────────────────────────────
state_marker_path() { printf '%s/%s.marker' "$STATE_DIR" "$1"; }

state_marker_exists() { [[ -f "$(state_marker_path "$1")" ]]; }

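state_marker_set() {
  local name="$1" content="${2:-$(date -u +%FT%TZ)}" path
  path=$(state_marker_path "$name")
  if (( DRY_RUN )); then
    printf 'WOULD: write marker %s\n' "$path" >&2
    return 0
  fi
  # Subshell keeps the umask change local. No `bash -c "..."` string, so a
  # quote in $content can't break (or inject into) the command.
  ( umask 077; printf '%s\n' "$content" >"$path" )
}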

state_marker_clear() {
  do_or_say rm -f "$(state_marker_path "$1")"
}

# Run $@ once-and-only-once across reruns; mark with $name.
state_once() {
  local name="$1"; shift
  if state_marker_exists "$name"; then
    log "skip ($name): already done at $(cat "$(state_marker_path "$name")" 2>/dev/null || echo unknown)"
    return 0
  fi
  log "running ($name): $*"
  # Gate through do_or_say so --dry-run never performs the one-shot action.
  if do_or_say "$@"; then
    state_marker_set "$name"
    chg "completed: $name"
  else
    err "failed: $name (no marker written; will retry next run)"
    return 1
  fi
}

# ─── Idempotent primitives ─────────────────────────────────────────────────

# Ensure a directory exists with mode/owner/group.
ensure_dir() {
  local path="$1" mode="${2:-0755}" owner="${3:-root}" group="${4:-root}"
  if [[ ! -d "$path" ]]; then
    do_or_say install -d -m "$mode" -o "$owner" -g "$group" "$path"
    chg "created dir $path ($mode $owner:$group)"
  else
    # Reconcile mode/owner/group.
    local cur_mode cur_owner cur_group
    cur_mode=$(stat -c %a "$path" 2>/dev/null || stat -f %Lp "$path")
    cur_owner=$(stat -c %U "$path" 2>/dev/null || stat -f %Su "$path")
    cur_group=$(stat -c %G "$path" 2>/dev/null || stat -f %Sg "$path")
    # if-blocks, not `[[ ]] && cmd`: a false test on the last line would make
    # the function return 1 and abort callers running under set -e.
    if [[ "$cur_mode"  != "${mode#0}" ]]; then do_or_say chmod "$mode"  "$path"; chg "chmod $path $mode"; fi
    if [[ "$cur_owner" != "$owner"   ]]; then do_or_say chown "$owner" "$path"; chg "chown $path $owner"; fi
    if [[ "$cur_group" != "$group"   ]]; then do_or_say chgrp "$group" "$path"; chg "chgrp $path $group"; fi
  fi
}

# Ensure a symlink target -> link, idempotent and no-prompt.
ensure_symlink() {
  local target="$1" link="$2"
  if [[ -L "$link" ]]; then
    local current
    current=$(readlink "$link")
    [[ "$current" == "$target" ]] && return 0
    chg "symlink $link drifted ($current -> $target)"
  elif [[ -e "$link" ]]; then
    err "$link exists and is not a symlink; refusing to clobber"
    return 1
  fi
  do_or_say ln -sfn "$target" "$link"
  chg "symlink $link -> $target"
}

# Ensure file content matches stdin, then reconcile mode/owner/group.
ensure_file() {
  local path="$1" mode="${2:-0644}" owner="${3:-root}" group="${4:-root}"
  local desired tmp
  desired=$(cat; printf x)   # trailing x stops $(...) from eating final newlines
  desired=${desired%x}
  if [[ -f "$path" ]] && cmp -s <(printf '%s' "$desired") "$path"; then
    # Content matches; reconcile mode/owner/group only.
    local cur_mode cur_owner cur_group
    cur_mode=$(stat -c %a "$path" 2>/dev/null || stat -f %Lp "$path")
    cur_owner=$(stat -c %U "$path" 2>/dev/null || stat -f %Su "$path")
    cur_group=$(stat -c %G "$path" 2>/dev/null || stat -f %Sg "$path")
    if [[ "$cur_mode"  != "${mode#0}" ]]; then do_or_say chmod "$mode"  "$path"; chg "chmod $path $mode"; fi
    if [[ "$cur_owner" != "$owner"   ]]; then do_or_say chown "$owner" "$path"; chg "chown $path $owner"; fi
    if [[ "$cur_group" != "$group"   ]]; then do_or_say chgrp "$group" "$path"; chg "chgrp $path $group"; fi
    return 0
  fi
  if (( DRY_RUN )); then
    printf 'WOULD: write %s (mode %s owner %s:%s, %d bytes)\n' \
      "$path" "$mode" "$owner" "$group" "${#desired}" >&2
    return 0
  fi
  # Atomic write: temp file in the same directory, perms set on the temp
  # file, then rename(2) over the target. No EXIT trap here, because a
  # library must not clobber (or clear) the caller's own traps.
  tmp=$(mktemp "${path}.XXXXXX") || return 1
  if ! { printf '%s' "$desired" >"$tmp" &&
         chmod "$mode" "$tmp" &&
         chown "$owner:$group" "$tmp" &&
         mv -f "$tmp" "$path"; }; then
    rm -f "$tmp"
    err "failed to write $path"
    return 1
  fi
  chg "wrote $path ($mode $owner:$group, ${#desired} bytes)"
}

# Ensure a line is in a file, identified by a sentinel comment.
ensure_line() {
  local file="$1" line="$2" sentinel="$3"
  # Writes use printf directly (not `bash -c "..."`), so quotes or $ in $line
  # can't break the command. Note: sed's c\ treats backslashes in $line as escapes.
  if [[ ! -f "$file" ]]; then
    if (( DRY_RUN )); then
      printf 'WOULD: create %s with "%s"\n' "$file" "$line" >&2
    else
      printf '%s\n' "$line" >"$file"
    fi
    chg "created $file with $sentinel"
    return 0
  fi
  if grep -qF -- "$sentinel" "$file"; then
    local current
    current=$(grep -m1 -F -- "$sentinel" "$file")
    if [[ "$current" == "$line" ]]; then
      return 0  # already correct
    fi
    # Replace the line containing the sentinel.
    if (( DRY_RUN )); then
      printf 'WOULD: replace line in %s containing "%s" with "%s"\n' "$file" "$sentinel" "$line" >&2
      return 0
    fi
    local escaped_sentinel
    escaped_sentinel=$(printf '%s' "$sentinel" | sed 's:[][\\/.^$*]:\\&:g')
    sed -i.bak "/$escaped_sentinel/c\\
$line" "$file"
    rm -f "${file}.bak"
    chg "updated line in $file ($sentinel)"
  else
    if (( DRY_RUN )); then
      printf 'WOULD: append to %s: "%s"\n' "$file" "$line" >&2
    else
      printf '%s\n' "$line" >>"$file"
    fi
    chg "appended line to $file ($sentinel)"
  fi
}

# Ensure a user exists with home and shell.
ensure_user() {
  local user="$1" home="$2" shell="${3:-/bin/bash}"
  if id -u "$user" &>/dev/null; then
    local cur_home cur_shell
    cur_home=$(getent passwd "$user" | cut -d: -f6)
    cur_shell=$(getent passwd "$user" | cut -d: -f7)
    # if-blocks, not `[[ ]] && cmd`, so a matching shell doesn't make the function return 1.
    if [[ "$cur_home"  != "$home"  ]]; then do_or_say usermod --home "$home" --move-home "$user"; chg "user $user home -> $home"; fi
    if [[ "$cur_shell" != "$shell" ]]; then do_or_say usermod --shell "$shell" "$user"; chg "user $user shell -> $shell"; fi
  else
    do_or_say useradd --home-dir "$home" --shell "$shell" --create-home "$user"
    chg "created user $user"
  fi
}

# Ensure a systemd service is enabled and running.
ensure_service() {
  local svc="$1" state="${2:-running}"  # running | stopped
  local enabled="${3:-enabled}"          # enabled | disabled

  case "$enabled" in
    enabled)
      systemctl is-enabled "$svc" &>/dev/null || \
        { do_or_say systemctl enable "$svc"; chg "enabled $svc"; }
      ;;
    disabled)
      systemctl is-enabled "$svc" &>/dev/null && \
        { do_or_say systemctl disable "$svc"; chg "disabled $svc"; } || true
      ;;
  esac

  case "$state" in
    running)
      systemctl is-active "$svc" &>/dev/null || \
        { do_or_say systemctl start "$svc"; chg "started $svc"; }
      ;;
    stopped)
      systemctl is-active "$svc" &>/dev/null && \
        { do_or_say systemctl stop "$svc"; chg "stopped $svc"; } || true
      ;;
  esac
}

Real-World Recipes

Recipe 1: Idempotent package installation across distros

# Source lib/state.sh first.

ensure_pkg() {
  local pkg="$1"
  if command -v dpkg-query &>/dev/null; then
    if dpkg-query -W -f='${Status}\n' "$pkg" 2>/dev/null | grep -q "ok installed"; then
      return 0
    fi
    do_or_say apt-get install -y "$pkg"
    chg "installed $pkg (apt)"
  elif command -v rpm &>/dev/null; then
    if rpm -q "$pkg" &>/dev/null; then
      return 0
    fi
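    # Pick the tool explicitly; `dnf ... || yum ...` would retry a real
    # install failure with yum and hide dnf's error message.
    if command -v dnf &>/dev/null; then
      do_or_say dnf install -y "$pkg"
    else
      do_or_say yum install -y "$pkg"
    fi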
    chg "installed $pkg (rpm)"
  else
    err "no supported package manager"; return 1
  fi
}

ensure_pkg curl
ensure_pkg jq
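# Debian/Ubuntu name the package iproute2; RHEL/Fedora name it iproute.
# (uname -s is "Linux" on both, so it can't tell them apart.)
ensure_pkg "$(command -v dpkg-query &>/dev/null && echo iproute2 || echo iproute)"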

Recipe 2: Idempotent cron entry with sentinel

ensure_dir /etc/cron.d 0755 root root

ensure_line /etc/cron.d/myapp \
  '*/5 * * * * deploy /opt/myapp/bin/heartbeat >/dev/null 2>&1 # managed-by:myapp:heartbeat' \
  'managed-by:myapp:heartbeat'

# Removing a managed cron entry is also idempotent:
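remove_line() {
  local file="$1" sentinel="$2" escaped
  [[ -f "$file" ]] || return 0
  grep -qF -- "$sentinel" "$file" || return 0
  if (( DRY_RUN )); then
    printf 'WOULD: delete lines in %s containing "%s"\n' "$file" "$sentinel" >&2
    return 0
  fi
  # Escape regex metacharacters so the sentinel is matched literally.
  escaped=$(printf '%s' "$sentinel" | sed 's:[][\\/.^$*]:\\&:g')
  sed -i.bak "/$escaped/d" "$file"
  rm -f "${file}.bak"
  chg "removed line from $file ($sentinel)"
}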
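Here is a variant of remove_line that avoids sed's regex syntax entirely. It keeps every line that does not contain the sentinel, using grep -vF for a fixed-string match, and renames the result into place. remove_line_fixed and the 0644 mode are assumptions for the sketch.

```bash
#!/usr/bin/env bash
# Sketch of a removal that sidesteps sed regex escaping: keep every line that
# does NOT contain the sentinel (grep -vF is a fixed-string match) and rename
# the result into place. remove_line_fixed and the 0644 mode are assumptions.
set -euo pipefail

remove_line_fixed() {
  local file="$1" sentinel="$2" tmp
  [[ -f "$file" ]] || return 0                   # no file: nothing to remove
  grep -qF -- "$sentinel" "$file" || return 0    # already absent: no-op
  tmp=$(mktemp "$file.XXXXXX")
  # grep -v exits 1 when no lines survive (the file held only our line);
  # treat that as success, but still fail on real errors (exit 2).
  grep -vF -- "$sentinel" "$file" >"$tmp" || [[ $? -eq 1 ]]
  chmod 0644 "$tmp"
  mv -f "$tmp" "$file"
  echo "removed $sentinel from $file"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cron="$dir/myapp"
printf '%s\n' 'MAILTO=""' \
  '*/5 * * * * deploy /opt/myapp/bin/heartbeat # managed-by:myapp:heartbeat' >"$cron"

remove_line_fixed "$cron" 'managed-by:myapp:heartbeat'   # removes it
remove_line_fixed "$cron" 'managed-by:myapp:heartbeat'   # second run: no-op
```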

Recipe 3: Idempotent remote API state (DNS record)

# Reconcile a DNS A record via Cloudflare API.
ensure_dns_a_record() {
  local zone_id="$1" name="$2" desired_ip="$3" cf_token="${CLOUDFLARE_API_TOKEN:?}"

  # 1. Read current state.
  local current_json current_id current_ip
  current_json=$(curl -fsS \
    -H "Authorization: Bearer $cf_token" \
    "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records?name=$name&type=A")

  current_id=$(jq -r '.result[0].id // empty' <<<"$current_json")
  current_ip=$(jq -r '.result[0].content // empty' <<<"$current_json")

  # 2. Compare.
  if [[ -n "$current_id" && "$current_ip" == "$desired_ip" ]]; then
    log "$name -> $desired_ip already correct"
    return 0
  fi

  # 3. Apply minimal delta.
  if [[ -z "$current_id" ]]; then
    # Create.
    if (( DRY_RUN )); then
      printf 'WOULD: POST DNS record %s A %s\n' "$name" "$desired_ip" >&2
    else
      curl -fsS -X POST \
        -H "Authorization: Bearer $cf_token" -H "Content-Type: application/json" \
        "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records" \
        --data "$(jq -nc --arg n "$name" --arg ip "$desired_ip" \
          '{type:"A",name:$n,content:$ip,ttl:300}')" >/dev/null
      chg "created DNS $name A $desired_ip"
    fi
  else
    # Update.
    if (( DRY_RUN )); then
      printf 'WOULD: PATCH DNS %s A from %s to %s\n' "$name" "$current_ip" "$desired_ip" >&2
    else
      curl -fsS -X PATCH \
        -H "Authorization: Bearer $cf_token" -H "Content-Type: application/json" \
        "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records/$current_id" \
        --data "$(jq -nc --arg ip "$desired_ip" '{content:$ip}')" >/dev/null
      chg "updated DNS $name A: $current_ip -> $desired_ip"
    fi
  fi
}

ensure_dns_a_record "$ZONE_ID" "api.example.com" "203.0.113.42"

The four-step pattern is identical to local file/user reconciliation: read, compare, compute delta, apply minimum. The only difference is that the reads and writes go through an HTTP API instead of local commands.
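The “compute delta” step is where the API version earns its keep, because only the fields that differ go into the PATCH body. This is a sketch of that comparison in plain bash. It assumes bash 4 or later for associative arrays. The field names mirror the DNS example, and the values are made up.

```bash
#!/usr/bin/env bash
# Sketch of the "compute delta" step for API reconciliation: compare the
# desired record with what the API returned and keep only differing fields,
# i.e. what a PATCH body would need. Needs bash 4+ (associative arrays); the
# field names mirror the DNS example, the values are made up.
set -euo pipefail

declare -A desired=([type]=A [name]=api.example.com [content]=203.0.113.42 [ttl]=300)
declare -A actual=([type]=A [name]=api.example.com [content]=198.51.100.7 [ttl]=300)

# Print "field=value" for every desired field whose actual value differs
# (or is missing), sorted so the output is stable.
delta_fields() {
  local k
  for k in "${!desired[@]}"; do
    if [[ "${actual[$k]-}" != "${desired[$k]}" ]]; then
      printf '%s=%s\n' "$k" "${desired[$k]}"
    fi
  done | LC_ALL=C sort
}

delta=$(delta_fields)
if [[ -z "$delta" ]]; then
  echo "already correct: no API call"
else
  printf 'would PATCH:\n%s\n' "$delta"
fi
```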

Recipe 4: Database migration with idempotent runner

# Run SQL migration files only once each, tracked in a migrations table.
# Uses Postgres; the same shape works for any DB.

PGCONN="${PGCONN:-postgresql://app:secret@localhost/app_prod}"

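ensure_migrations_table() {
  # CREATE TABLE IF NOT EXISTS is already idempotent, so run it every time
  # instead of trusting a marker (a fresh database would make a marker lie).
  do_or_say psql "$PGCONN" -v ON_ERROR_STOP=1 -c "
    CREATE TABLE IF NOT EXISTS schema_migrations (
      id          TEXT PRIMARY KEY,
      applied_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
      checksum    TEXT NOT NULL
    );"
}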

apply_migration() {
  local file="$1"
  local id checksum
  id=$(basename "$file" .sql)
  checksum=$(sha256sum "$file" | awk '{print $1}')

  # Already applied?
  local applied
  applied=$(psql "$PGCONN" -tAc \
    "SELECT checksum FROM schema_migrations WHERE id='$id';" 2>/dev/null || true)

  if [[ -n "$applied" ]]; then
    if [[ "$applied" != "$checksum" ]]; then
      err "migration $id already applied with different checksum ($applied vs $checksum)"
      return 1
    fi
    log "migration $id already applied"
    return 0
  fi

  # Apply in a transaction; record on success.
  if (( DRY_RUN )); then
    printf 'WOULD: apply migration %s (%s)\n' "$id" "$checksum" >&2
    return 0
  fi

  # -f - reads the script from stdin; --single-transaction applies to -f/-c input.
  psql "$PGCONN" -v ON_ERROR_STOP=1 --single-transaction -f - <<EOF
\i '$file'
INSERT INTO schema_migrations (id, checksum) VALUES ('$id', '$checksum');
EOF
  chg "applied migration $id"
}

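ensure_migrations_table
# The glob expands in sorted order, so name files 001_..., 002_..., and so on.
for f in /opt/myapp/migrations/*.sql; do
  [[ -e "$f" ]] || continue   # no matches: skip the unexpanded pattern
  apply_migration "$f"
done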

Why this is idempotent:
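The ledger does the work. Each migration is looked up by id before it runs. An entry with the same checksum means “skip”, and a different checksum is a hard error. The ledger row is inserted in the same psql session, right after the migration succeeds. Below is a minimal sketch of that logic. It assumes a flat file stands in for the schema_migrations table, so it runs without Postgres.

```bash
#!/usr/bin/env bash
# Sketch of the migration runner's ledger logic with a flat file standing in
# for the schema_migrations table (an assumption so it runs without Postgres).
# "Applying" a migration is simulated by appending its id to applied.log; in
# the real runner the SQL and the ledger INSERT share one transaction.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
ledger="$work/schema_migrations" applied_log="$work/applied.log"
mkdir "$work/migrations"
: >"$ledger"
: >"$applied_log"
echo 'CREATE TABLE a (id int);' >"$work/migrations/001_a.sql"
echo 'CREATE TABLE b (id int);' >"$work/migrations/002_b.sql"

apply_migration() {
  local file="$1" id sum recorded
  id=$(basename "$file" .sql)
  sum=$(cksum <"$file" | awk '{print $1}')
  recorded=$(awk -v id="$id" '$1 == id { print $2 }' "$ledger")
  if [[ -n "$recorded" ]]; then
    # Already applied: a no-op if unchanged, an error if edited since.
    [[ "$recorded" == "$sum" ]] && return 0
    echo "migration $id changed after it was applied" >&2
    return 1
  fi
  echo "$id" >>"$applied_log"                  # stand-in for running the SQL
  printf '%s %s\n' "$id" "$sum" >>"$ledger"    # record only after success
}

for _ in 1 2 3; do                             # three "deploys"
  for f in "$work"/migrations/*.sql; do
    [[ -e "$f" ]] || continue
    apply_migration "$f"
  done
done
echo "applied $(wc -l <"$applied_log") migrations across 3 runs"
```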

Visual: The Reconciliation Lifecycle

        ┌────────────────────────────────────────────────────┐
        │              SCRIPT INVOCATION                     │
        │   ./apply.sh    or    ./apply.sh --dry-run         │
        └────────────────────┬───────────────────────────────┘
                             │
                             ▼
              ┌──────────────────────────────┐
              │ Source lib/state.sh          │
              │ Init STATE_DIR, parse flags  │
              └─────────────┬────────────────┘
                            │
                            ▼
        ┌───────────────────┴───────────────────┐
        │  For each ensure_*  (the desired      │
        │  state declarations):                 │
        │                                       │
        │  ┌─────────────────────────────────┐  │
        │  │ 1. Define desired (arguments)   │  │
        │  └─────────────┬───────────────────┘  │
        │                ▼                      │
        │  ┌─────────────────────────────────┐  │
        │  │ 2. Read actual (system query)   │  │
        │  └─────────────┬───────────────────┘  │
        │                ▼                      │
        │  ┌─────────────────────────────────┐  │
        │  │ 3. Compute delta                │  │
        │  └─────────────┬───────────────────┘  │
        │                ▼                      │
        │     ┌──────────┴──────────┐           │
        │     ▼ delta empty         ▼ non-empty │
        │   no-op                 do_or_say     │
        │   (silent)              CHANGE log    │
        │                                       │
        └───────────────────────────────────────┘
                            │
                            ▼
              ┌──────────────────────────────┐
              │ Summary:                     │
              │  N actions taken             │
              │  M no-ops (already correct)  │
              │  K dry-run announcements     │
              └──────────────────────────────┘
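The summary box at the bottom of the diagram needs counters, which lib/state.sh as listed does not keep. The sketch below is a hypothetical extension. do_or_say counts changes and dry-run announcements, a noop helper counts the already-correct cases, and summary prints the totals to stderr.

```bash
#!/usr/bin/env bash
# Sketch of the end-of-run summary from the lifecycle diagram. The counters,
# noop() and summary() are a hypothetical extension; lib/state.sh as listed
# does not track these numbers.
set -euo pipefail

DRY_RUN=0
CHANGES=0 NOOPS=0 WOULDS=0

do_or_say() {
  if (( DRY_RUN )); then
    WOULDS=$((WOULDS + 1))
    printf 'WOULD: %s\n' "$*" >&2
    return 0
  fi
  "$@" || return                # propagate failure without counting it
  CHANGES=$((CHANGES + 1))
}

noop() { NOOPS=$((NOOPS + 1)); }

ensure_dir_counted() {          # hypothetical ensure_* using the counters
  if [[ -d "$1" ]]; then noop; return 0; fi
  do_or_say mkdir -p "$1"
}

summary() {
  printf 'Summary: %d actions taken, %d no-ops, %d dry-run announcements\n' \
    "$CHANGES" "$NOOPS" "$WOULDS" >&2
}

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
ensure_dir_counted "$root/a"    # creates it
ensure_dir_counted "$root/a"    # already correct
DRY_RUN=1
ensure_dir_counted "$root/b"    # announced, not created
summary                         # Summary: 1 actions taken, 1 no-ops, 1 dry-run announcements
```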

The 12-Item Idempotency Footgun List

After reviewing dozens of “almost-idempotent” scripts in production, these are the patterns that fail in practice:

  1. The “marker before work” bug. Writing the state file before the operation succeeds, so a crash leaves a marker that lies. Always write the marker as the last step.

  2. mkdir without -p. First run succeeds, second run fails with “File exists.” Always mkdir -p, even when you “know” the dir doesn’t exist.

  3. echo "line" >> file without sentinel. Each rerun appends another copy. Always pair appends with a sentinel-comment guard.

  4. useradd without an id -u guard. Fails on rerun with “user already exists.” Always read first.

  5. ln -s without -fn. First run creates the link; second run fails with “File exists.” Worse, if the link name already exists as a symlink to a directory and -n is missing, ln -sf follows it and creates the new link inside that directory instead of replacing it. Always ln -sfn.

  6. set -e masking idempotency holes. useradd user fails because user exists; set -e aborts the script. The author “fixes” it with useradd user || true, hiding the failure. The right fix is to read first.

  7. sed -i without backup file disposal. sed -i.bak leaves .bak files everywhere on every run. Either sed -i (no backup, GNU only) or sed -i.bak followed by rm -f file.bak. Note: sed -i syntax differs between GNU sed and BSD sed; on macOS use sed -i '' or sed -i.bak with cleanup.

  8. Dry-run that doesn’t reach all code paths. If if (( DRY_RUN )); then return; fi is at the top of every function, you’re testing nothing. Use a do_or_say wrapper around mutating commands; reads always run.

  9. State stored in /tmp. systemd-tmpfiles may clean /tmp on reboot or schedule. Your “already done” marker disappears, the script re-runs the work, and you get duplicate side effects. Use /var/lib/myapp/state for system state.

  10. Markers in different namespaces collide. Two scripts both write /var/lib/state/done; one clobbers the other’s state. Always namespace with the script name: $STATE_DIR/myapp/v2-installed.

  11. Reconciliation that compares wrong fields. “Does the user exist?” → id -u says yes. “Is the home dir right?” → never checked. The user drifted. Reconcile every field you declare desired state for.

  12. Idempotency that depends on shared mutable state. Two scripts both manage /etc/sysctl.d/99-tuning.conf. They each write their own line, but on conflict, the last one wins. The reconciliation pattern only protects what one script owns. For multi-owner files, use /etc/sysctl.d/<owner>-<name>.conf per owner.

Quick-Reference Card

┌─ THE FOUR-STEP RECONCILIATION ────────────────────────────────────────┐
│  1. desired = arguments              "I want X to be true"            │
│  2. actual  = read system            "What is true now?"              │
│  3. delta   = desired - actual       "What's different?"              │
│  4. apply minimal change             "Move only the delta"            │
└────────────────────────────────────────────────────────────────────────┘

┌─ IDEMPOTENT PRIMITIVES (use these by default) ────────────────────────┐
│  mkdir -p DIR                        idempotent dir create            │
│  ln -sfn TARGET LINK                 idempotent symlink               │
│  install -m MODE -o U -g G S D       idempotent file install w/ perms │
│  rm -f FILE                          idempotent removal               │
│  id -u USER || useradd USER          guarded user create              │
│  getent group GRP || groupadd GRP    guarded group create             │
│  systemctl is-active SVC || start    guarded service start            │
└────────────────────────────────────────────────────────────────────────┘

┌─ STATE MARKERS ───────────────────────────────────────────────────────┐
│  Where:    /var/lib/$APP/state/   (NOT /tmp)                         │
│  Name:     <action>-<version>.marker  (versioned)                    │
│  Content:  ISO 8601 timestamp + identifying metadata                 │
│  When to write:  AFTER the work succeeds, never before               │
│  When to use:  one-shot/expensive ops; NOT for ongoing config        │
└────────────────────────────────────────────────────────────────────────┘

┌─ DRY-RUN DISCIPLINE ──────────────────────────────────────────────────┐
│  Reads run live (id, getent, stat, curl GET).                        │
│  Writes gated by do_or_say: in DRY_RUN=1, prints "WOULD: cmd".       │
│  Same code paths as live mode; only side effects differ.             │
│  Output is greppable: `script --dry-run 2>&1 | grep ^WOULD`.         │
└────────────────────────────────────────────────────────────────────────┘

┌─ APPEND-ONLY-IF-MISSING ──────────────────────────────────────────────┐
│  grep -qF "$sentinel" "$file" && update_in_place || append          │
│  Sentinel = unique comment: "# managed-by:$APP:$RULE"                │
│  Survives reformatting; greppable in audit                           │
└────────────────────────────────────────────────────────────────────────┘

┌─ ATOMIC FILE WRITE ───────────────────────────────────────────────────┐
│  tmp=$(mktemp "$path.XXXXXX")                                         │
│  printf '%s' "$content" >"$tmp"                                       │
│  chmod MODE "$tmp"; chown U:G "$tmp"                                 │
│  mv -f "$tmp" "$path"   (rename within one filesystem)               │
│  Reader sees old or new file, never partial.                         │
└────────────────────────────────────────────────────────────────────────┘

What’s Next

Idempotency is the discipline; filesystem semantics is what your idempotent operations rest on. The next lesson, Filesystem Semantics: Hard Links, Symlinks, Mount Namespaces & fsync Discipline, drills into the mechanics that determine whether your atomic writes are actually atomic, why ln -sfn behaves differently across filesystems, when mv is atomic and when it isn’t (cross-device renames!), and how fsync discipline turns “I wrote it to disk” into a guarantee instead of a hope. Together with this lesson, those primitives let you build scripts that converge correctly even on flaky filesystems and crash-prone hosts.

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
