Defensive Scripting: set -Eeuo pipefail, IFS Hardening, ShellCheck & Error Propagation — Turning Quick Hacks Into Production Code

This is the first lesson of Wave 2. Wave 1 (L1–L12) gave you the mental model and the toolkit. From here onwards, the focus shifts: we’re not learning what shell can do — we’re learning how to write shell you’d be willing to ship to production and trust to run unattended at 03:00.

The first and biggest discipline is defensive scripting: writing scripts that, when something goes wrong, fail loudly and immediately rather than silently corrupting state and continuing. Every senior engineer has a horror story about a shell script that “ran fine” but actually skipped half its work because of an unquoted variable or a missing pipefail.

By the end of this lesson you will:

Know every flag in set -Eeuo pipefail, what it does, and (critically) where it doesn’t fire.
Apply IFS hardening as a routine reflex, not a special case.
Run ShellCheck on every script and read its findings fluently — including knowing which rules to disable and how to suppress them safely.
Write a reusable lib/errors.sh with die, warn, assert_* helpers that you can drop into any project.
Use the ERR trap to print line, function, and command context when something fails.
Propagate errors across function boundaries (a non-trivial bash quirk most engineers get wrong).

This is the most important lesson in Tier 3. Internalise it now and the next nine lessons will land with much less friction.

1. The strict-mode preamble in full

Every production shell script you write should start with the same two lines (after the shebang):

#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'

That’s the canonical “strict mode” preamble. Let’s go through it flag by flag. The order of E, e, and u is conventional and doesn’t matter; bash treats them independently. The o must come last, though, because it takes the next word (pipefail) as its argument.

`-e` (errexit) — exit on any unhandled error

set -e

Causes the shell to exit immediately if any simple command exits with non-zero status. Without -e, scripts blunder through errors:

#!/usr/bin/env bash
# Without -e
cp /nonexistent /tmp/somewhere       # FAILS — but we keep going
echo "Done!"                          # prints "Done!" — but the cp failed

With -e, the script exits as soon as cp fails. This is what you want 99% of the time.

The gotcha: -e does NOT trigger in many cases that surprise people:

Inside if, while, until conditions: that’s the whole point of those constructs — they test exit codes.
```
if grep -q PATTERN file; then ...   # grep returning 1 (no match) doesn't exit
```

Any command in an && or || chain except the last one:

cmd1 && cmd2          # cmd1 failing doesn't exit the script; the list just returns cmd1's status

Negated commands (! cmd):

! cmd                 # never exits; the `!` *expects* failure

Inside command substitution ($(…)): by default, bash turns -e off inside $(...), so a failing command there does not stop the substitution. (Bash 4.4 added the inherit_errexit option to change this.)
```
X=$(cmd1; cmd2)       # cmd1 failing doesn't exit if cmd2 succeeds
shopt -s inherit_errexit   # bash 4.4+ — propagates errexit into $(...)
```
In a pipeline, $? is the status of the last command; failures of earlier commands are silently swallowed unless pipefail is set.
Functions called in any of the contexts above: set -e is suspended for the entire function body, not just for the call itself. A failing command inside a function invoked as if myfn or myfn || … does not stop that function.

So -e is a coarse safety net, not a fine-grained correctness check. It catches “I forgot to handle the failure of cp” but won’t help with subtler bugs.
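The exemptions above are easy to check for yourself. Here is a minimal sketch, assuming bash is on your PATH; the helper name `survives` is made up for this demo. Each snippet runs in a child bash under `set -e`, and the helper reports whether the script got past it.

```bash
#!/usr/bin/env bash
# Run a snippet under `set -e` in a child bash and report whether the
# script survived past it. `survives` is a hypothetical helper name.
survives() {
  if bash -c "set -e; $1; echo reached" 2>/dev/null | grep -q reached; then
    echo "survived"
  else
    echo "exited"
  fi
}

plain=$(survives 'false')                     # bare failing command
in_if=$(survives 'if false; then :; fi')      # failure inside an if condition
and_list=$(survives 'false && true')          # failure left of &&
negated=$(survives '! true')                  # negated command
in_subst=$(survives 'x=$(false; echo ok)')    # failure inside $(...), not last

printf '%-10s %s\n' plain "$plain" in_if "$in_if" and_list "$and_list" \
  negated "$negated" in_subst "$in_subst"
```

Only the bare `false` should report `exited`; the other four report `survived`, which is exactly the set of blind spots listed above.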

`-E` (errtrace) — extend ERR trap to functions and subshells

set -E

Without -E, an ERR trap defined at the top level does not fire inside functions, subshells, or command substitutions. With -E, traps inherit. This is essential if you want the ERR trap pattern (covered below) to be useful.

If you don’t use ERR traps, -E is harmless. But you should use ERR traps. So set -E always.

`-u` (nounset) — fail on unset variables

set -u

Causes the shell to exit when you reference an unset variable:

set -u
echo "$UNDEFINED_VAR"           # error: UNDEFINED_VAR: unbound variable

This catches typos:

NAME="alice"
echo "$NAEM"                    # without -u, prints empty string. with -u, fails immediately.

The gotcha: things you might think are “unset” actually aren’t. The empty string "" is a set variable with empty value:

NAME=""
echo "$NAME"                    # works fine — empty, but set

If you genuinely want “default if unset,” use parameter expansion:

echo "${NAME:-default}"         # default if unset OR empty
echo "${NAME-default}"          # default if unset only (empty stays empty)

For arrays, "${ARR[@]}" on an empty array triggers -u in bash versions before 4.4. The safe pattern:

ARR=()
for item in "${ARR[@]+"${ARR[@]}"}"; do …; done    # works on bash 3.2 with -u

For modern bash (4.4+), "${ARR[@]}" on an empty array is fine.
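To see the difference between the two default forms and the array guard side by side, here is a small sketch. It runs in a subshell so that `set -u` doesn't leak into your session; the variable names are just placeholders.

```bash
#!/usr/bin/env bash
# Runs in a subshell so `set -u` does not leak into the calling shell.
result=$(
  set -u
  unset -v NAME
  a="${NAME:-default}"   # unset -> default
  b="${NAME-default}"    # unset -> default
  NAME=""
  c="${NAME:-default}"   # empty -> default
  d="${NAME-default}"    # empty -> stays empty

  # The array guard: loops zero times on an empty array without tripping -u.
  ARR=()
  n=0
  for item in "${ARR[@]+"${ARR[@]}"}"; do n=$((n + 1)); done

  printf '%s|%s|%s|%s|%s\n' "$a" "$b" "$c" "$d" "$n"
)
echo "$result"   # default|default|default||0
```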

`-o pipefail` — pipeline returns the rightmost non-zero status

We covered this in L8. Without pipefail, the exit status of a | b | c is just c’s status. With pipefail, the pipeline returns the status of the rightmost command that failed, or zero if every command succeeded. So:

set -o pipefail
cat /nonexistent | sort             # cat fails with 1, sort succeeds with 0 (empty input)
                                     # without pipefail: $? = 0 (the cat failure is invisible)
                                     # with pipefail: $? = 1 (because cat failed)

Combined with -e, this means the script also exits.
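A quick way to convince yourself is to run the same pipeline in two child shells. This sketch also prints `PIPESTATUS`, a bash array (not mentioned above) that holds the status of every stage regardless of pipefail.

```bash
#!/usr/bin/env bash
# Same pipeline, run in child shells with and without pipefail.
# `false | true`: the first stage fails, the last stage succeeds.
without=$(bash -c 'false | true; echo $?')
with=$(bash -c 'set -o pipefail; false | true; echo $?')
# PIPESTATUS records every stage's status, whether or not pipefail is set.
stages=$(bash -c 'false | true; echo "${PIPESTATUS[*]}"')

echo "without pipefail: $without"   # 0
echo "with pipefail:    $with"      # 1
echo "per-stage:        $stages"    # 1 0
```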

`IFS=$'\n\t'` — restrict word splitting to newline + tab

We covered this in L2. The default IFS includes space, which means for f in $UNQUOTED_VAR splits on spaces — disastrous if $UNQUOTED_VAR contains filenames with spaces. Setting IFS to just newline and tab means word-splitting only happens on those, so spaces in filenames are preserved.

IFS=$'\n\t'
NAMES="alice bob carol"
for n in $NAMES; do echo "[$n]"; done   # one element: [alice bob carol] — no space-split

This doesn’t remove the need to quote; it limits the damage when you forget. for n in "${ARR[@]}" (quoted, array) remains the canonical correct way, and the loose for n in $VAR no longer breaks silently on spaces.
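The effect on filenames can be sketched by counting the words the shell produces under each IFS. `count_words` and the sample filenames are invented for this demo.

```bash
#!/usr/bin/env bash
# count_words is a hypothetical helper: it reports how many words the
# shell produced after splitting its (deliberately unquoted) arguments.
count_words() { echo "$#"; }

FILES="my report.txt
notes.txt"                       # two names, one per line, one with a space

# shellcheck disable=SC2086  # unquoted on purpose: we are measuring the split
default_count=$(IFS=$' \t\n'; count_words $FILES)   # splits on the space too
# shellcheck disable=SC2086  # unquoted on purpose: we are measuring the split
strict_count=$(IFS=$'\n\t'; count_words $FILES)     # splits on newline only

echo "default IFS: $default_count words"   # 3
echo "strict IFS:  $strict_count words"    # 2
```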

Checking strict mode is active

# These lines should appear right after the shebang:
set -Eeuo pipefail
IFS=$'\n\t'

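# To verify in a running script:
echo "flags: $-"
# 'e', 'u', and 'E' should appear in $- (next to defaults such as 'h' and 'B')
[[ -o pipefail ]] && echo "pipefail: on"

The $- variable shows the active single-letter flags. pipefail has no letter of its own, so it is queried with [[ -o pipefail ]] instead.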
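If you'd rather have one line per setting, a small helper can query each option by name. This is a sketch: `strict_mode_report` is a made-up name, and it uses bash's `[[ -o name ]]` test rather than parsing `$-`.

```bash
#!/usr/bin/env bash
# strict_mode_report is a hypothetical helper: it prints one line per
# strict-mode setting, using `[[ -o name ]]` to query each option.
strict_mode_report() {
  local opt
  for opt in errexit errtrace nounset pipefail; do
    if [[ -o $opt ]]; then
      echo "$opt=on"
    else
      echo "$opt=off"
    fi
  done
}

# Run the check in subshells so the calling shell's options stay untouched.
before=$(set +Eeu; set +o pipefail; strict_mode_report | tr '\n' ' ')
after=$(set -Eeuo pipefail; strict_mode_report | tr '\n' ' ')

echo "before: $before"
echo "after:  $after"
```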

2. The ERR trap — line-precise diagnostics

The minimum-viable error handler is just set -e: exit on failure. But that gives you no information about what failed. The ERR trap is the next level:

#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'

trap 'on_error $? $LINENO' ERR

on_error() {
  local exit_code=$1 line=$2   # $? is passed in; reading it after `local` would give 0
  echo "[ERROR] $0: line $line: command failed (exit $exit_code)" >&2
  # cleanup if needed
  exit "$exit_code"
}

# Now any uncaught failure prints which line failed
cp /nonexistent /tmp/x       # ERR trap fires before exit

Use single quotes for the trap argument so $LINENO expands at trap-time, not at registration-time. (Same as L10’s signal trap rule.)

For richer diagnostics, capture more context:

trap 'on_error $? $LINENO "$BASH_COMMAND"' ERR

on_error() {
  local exit_code=$1 line=$2 cmd=$3
  printf '[ERROR] %s:%d: command "%s" exited with %d\n' "$0" "$line" "$cmd" "$exit_code" >&2
  exit "$exit_code"
}

$BASH_COMMAND is the command that just failed. Combined with $LINENO, this gives you a full diagnostic line:

[ERROR] ./deploy.sh:42: command "kubectl apply -f manifest.yaml" exited with 1

This is the level of diagnostics you want in production.

Why `-E` matters

Without -E, ERR traps registered at the top level do NOT fire inside functions:

#!/usr/bin/env bash
set -eo pipefail        # NOTE: no -E
trap 'echo "TRAPPED at $LINENO"' ERR

myfn() {
  cp /nonexistent /tmp/x   # this fails but ERR trap does NOT fire
}

myfn                       # script exits silently

With -E:

#!/usr/bin/env bash
set -Eeo pipefail
trap 'echo "TRAPPED at $LINENO"' ERR

myfn() {
  cp /nonexistent /tmp/x
}

myfn                       # ERR trap fires, prints line number

This is why -E is part of the canonical preamble.

Multi-line stack traces

For really verbose error reporting, walk the bash call stack:

on_error() {
  local exit_code=$? line=${BASH_LINENO[0]}
  printf '\n[FATAL] script %s exited with %d at line %d\n' "$0" "$exit_code" "$line" >&2
  printf 'Call stack:\n' >&2
  local i=0
  while caller $i >/dev/null 2>&1; do
    printf '  %s\n' "$(caller $i)" >&2
    i=$((i + 1))   # not ((i++)): that returns status 1 when i is 0 and would trip set -e
  done
  exit "$exit_code"
}
trap on_error ERR

caller N prints LINE FUNCTION FILE for the Nth frame in the call stack. Combined with set -E, this gives full traces:

[FATAL] script ./deploy.sh exited with 1 at line 42
Call stack:
  42 do_deploy ./deploy.sh
  78 main ./deploy.sh
  103 main ./deploy.sh

Use this in any script complex enough to have nested functions.
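An alternative to `caller` is to walk bash's `FUNCNAME` and `BASH_LINENO` arrays directly. Here is a minimal sketch, with made-up function names `inner` and `outer`. The demo script is written to a temporary file and run in a child bash so the failure doesn't end your session.

```bash
#!/usr/bin/env bash
script=$(mktemp)
cat >"$script" <<'EOF'
set -Eeuo pipefail
on_error() {
  local exit_code=$? i
  echo "exit=$exit_code" >&2
  # Index 0 is on_error itself, so start at 1.
  for ((i = 1; i < ${#FUNCNAME[@]}; i++)); do
    echo "frame: ${FUNCNAME[i]} (called from line ${BASH_LINENO[i]})" >&2
  done
  exit "$exit_code"
}
trap on_error ERR
inner() { false; }     # the failing command
outer() { inner; }
outer
EOF

trace=$(bash "$script" 2>&1)
rc=$?
rm -f "$script"

echo "$trace"
echo "child exit code: $rc"
```

The trace should list `inner`, `outer`, and `main`, innermost first, and the child exits with the failing command's status.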

3. The `die` / `warn` / `info` family

Every production script should have a small set of helpers for consistent error reporting and structured output. The canonical names are die (fatal), warn (non-fatal), info (informational). Here’s the minimum set:

#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'

readonly SCRIPT_NAME="${0##*/}"

die()  { printf '[%s] FATAL: %s\n' "$SCRIPT_NAME" "$*" >&2; exit 1; }
warn() { printf '[%s] WARN:  %s\n' "$SCRIPT_NAME" "$*" >&2; }
info() { printf '[%s] INFO:  %s\n' "$SCRIPT_NAME" "$*" >&2; }
debug() { [[ "${DEBUG:-0}" == 1 ]] && printf '[%s] DEBUG: %s\n' "$SCRIPT_NAME" "$*" >&2 || true; }

# Usage:
[[ -f "$CONFIG" ]] || die "config file $CONFIG not found"
warn "skipping $f — not a regular file"
info "starting deploy of $TAG"
DEBUG=1 ./script    # turn on debug

Notes:

All output goes to stderr (>&2). Stdout is reserved for the script’s actual data output. This is how Unix tools have always worked, and it lets the script be used in pipelines: output_data=$(./myscript) works even if it logs heavily.
die must exit — it never returns. Convention: exit code 1 for general errors, 2 for “usage” errors, anything else as needed.
${0##*/} strips the path so we just print the script’s basename.
debug is a no-op unless DEBUG=1 is set in the environment.
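The exit-code convention in the notes above (1 for general errors, 2 for usage errors) needs a die that accepts a code. One possible sketch follows; `die_with` is a hypothetical name, not part of the helper set above.

```bash
#!/usr/bin/env bash
# die_with is a hypothetical variant of die that takes the exit code first.
die_with() {
  local code=$1; shift
  printf '[%s] FATAL: %s\n' "${0##*/}" "$*" >&2
  exit "$code"
}

# Demo in subshells so the exit does not end the calling shell.
( die_with 2 "usage: deploy.sh <env> <tag>" ) 2>/dev/null
usage_rc=$?
( die_with 1 "config file not found" ) 2>/dev/null
general_rc=$?

echo "usage error exit code:   $usage_rc"     # 2
echo "general error exit code: $general_rc"   # 1
```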

Argument-validation idioms with `die`

[[ $# -ge 2 ]] || die "usage: $0 <env> <tag>"
[[ -d "$WORK_DIR" ]] || die "WORK_DIR ($WORK_DIR) is not a directory"
command -v kubectl >/dev/null 2>&1 || die "kubectl not found in PATH"
[[ "$ENV" =~ ^(dev|staging|prod)$ ]] || die "invalid env: $ENV"

Read those left-to-right: “this condition must be true OR die.” It’s the most idiomatic shell error-checking pattern.

`assert_*` helpers

For more elaborate validation, build up an assert family:

assert_command()  { command -v "$1" >/dev/null 2>&1 || die "command not found: $1"; }
assert_file()     { [[ -f "$1" ]] || die "file not found: $1"; }
assert_dir()      { [[ -d "$1" ]] || die "directory not found: $1"; }
assert_var()      { [[ -n "${!1:-}" ]] || die "required variable is unset or empty: $1"; }
assert_readable() { [[ -r "$1" ]] || die "file not readable: $1"; }

# Usage:
assert_command kubectl
assert_command jq
assert_var KUBE_NAMESPACE       # checks that $KUBE_NAMESPACE is set & non-empty
assert_dir "$DEPLOY_DIR"

${!1:-} is indirect variable expansion — the value of the variable whose name is $1. Combined with :- (default empty if unset), it lets us check arbitrary variable names.

Helpers like these turn 30-line “validate inputs” blocks into a tight 5-line preamble.
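Here is a short, self-contained demo of the indirect expansion behind assert_var, covering the set, empty, and unset cases. The variable names `EMPTY_VAR` and `MISSING_VAR` are placeholders for this sketch.

```bash
#!/usr/bin/env bash
# Minimal copies of die and assert_var so the demo is self-contained.
die() { printf 'FATAL: %s\n' "$*" >&2; exit 1; }
assert_var() { [[ -n "${!1:-}" ]] || die "required variable is unset or empty: $1"; }

KUBE_NAMESPACE="staging"
name="KUBE_NAMESPACE"
indirect="${!name}"              # value of the variable *named by* $name

# Each check runs in a subshell because a failing assert exits.
( assert_var KUBE_NAMESPACE ) 2>/dev/null; set_rc=$?
( EMPTY_VAR=""; assert_var EMPTY_VAR ) 2>/dev/null; empty_rc=$?
( unset -v MISSING_VAR; assert_var MISSING_VAR ) 2>/dev/null; unset_rc=$?

echo "indirect=$indirect set_rc=$set_rc empty_rc=$empty_rc unset_rc=$unset_rc"
# indirect=staging set_rc=0 empty_rc=1 unset_rc=1
```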

4. ShellCheck — your script linter

ShellCheck is to shell what TypeScript is to JavaScript: it catches an enormous fraction of real bugs before runtime. Run it on every script. Always.

Install

# macOS
brew install shellcheck

# Debian / Ubuntu
sudo apt install shellcheck

# Alpine
apk add shellcheck

# Other: download from https://github.com/koalaman/shellcheck/releases

Run

shellcheck myscript.sh
shellcheck -x myscript.sh    # follow `source` statements (-x = external sources)

The most common findings (and what to do)

ShellCheck assigns each rule a number: SC2086, SC2155, etc. The most common in real scripts:

SC2086 — “Double quote to prevent globbing and word splitting”

NAME="alice smith"
cp $NAME /tmp/         # SC2086: cp alice smith /tmp/ — looks for two files
cp "$NAME" /tmp/       # CORRECT

The single most-common shell bug. The fix is always to quote.

SC2046 — “Quote this to prevent word splitting”

ls $(find . -name '*.txt')        # SC2046 — output of find may have spaces
ls "$(find . -name '*.txt')"      # NOPE — collapses all into single arg
mapfile -d '' FILES < <(find . -name '*.txt' -print0)
ls "${FILES[@]}"                  # CORRECT

This one is harder to fix mechanically — sometimes you really do want word-splitting. ShellCheck warns regardless; you have to think about which case applies.
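To see why the array form matters, this sketch creates a scratch directory with one awkward filename and compares the two approaches. It assumes bash 4.4+ (for `mapfile -d`) and a temp directory path without spaces.

```bash
#!/usr/bin/env bash
# Build a scratch directory with one awkward filename.
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/with space.txt"

# NUL-delimited find output read into an array: one element per file.
mapfile -d '' FILES < <(find "$workdir" -name '*.txt' -print0)
array_count=${#FILES[@]}

# Unquoted $(find ...) splits on the space and yields a bogus third word.
# shellcheck disable=SC2046  # deliberate: demonstrating the bug
set -- $(find "$workdir" -name '*.txt')
split_count=$#

echo "array elements: $array_count"   # 2
echo "split words:    $split_count"   # 3
rm -rf "$workdir"
```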

SC2155 — “Declare and assign separately to avoid masking return values”

local var="$(get_value)"          # SC2155
# Why? `local` always succeeds, so $? after this is local's status, NOT get_value's.
# If get_value fails, you don't notice.

# CORRECT:
local var
var="$(get_value)"

Fix: declare first, assign on the next line. Otherwise set -e won’t catch failure of the assigned expression.

SC2034 — “var appears unused”

NAME="alice"      # never used elsewhere — typo? dead code?

Remove it, use it, or export it if another program reads it (export NAME=...).

SC2154 — “var is referenced but not assigned”

echo "$VAARS"     # typo — meant $VARS?

SC2126 — “Consider using grep -c instead of grep | wc -l”

grep PATTERN file | wc -l       # SC2126 — slower
grep -c PATTERN file            # better

SC2068 — “Double quote array expansions”

for arg in $@; do …; done       # SC2068
for arg in "$@"; do …; done     # CORRECT

SC2002 — “Useless cat”

cat file | grep PATTERN         # SC2002
grep PATTERN < file             # better
grep PATTERN file               # best

SC2164 — “Use cd … || exit”

cd /tmp                         # SC2164 — what if /tmp doesn't exist?
cd /tmp || die "cannot cd"      # better

Suppressing rules

When ShellCheck is wrong (rare, but it happens), suppress the specific rule on the line directly above the command it applies to:

COMMAND="ls -la"
# shellcheck disable=SC2086
$COMMAND        # we WANT word-splitting here (it only splits on spaces if IFS contains a space)

# Or for an entire file (top of file, after shebang):
# shellcheck disable=SC2086,SC2046

Always add a comment explaining why you suppressed:

# shellcheck disable=SC2086  # we deliberately split — kubectl args from build
$KUBECTL_ARGS

CI integration

Add a step to your pipeline that fails the build on any ShellCheck warning:

# .github/workflows/lint.yml (snippet)
- name: ShellCheck
  run: shellcheck scripts/*.sh

For pre-commit:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/koalaman/shellcheck-precommit
    rev: v0.10.0
    hooks:
      - id: shellcheck

There’s no excuse for not running ShellCheck. Install it on day one.

5. Error propagation across function boundaries

This is where most engineers get tripped up. Bash has subtle rules about how errors propagate, especially with command substitution and pipelines.

The problem: command-substitution exit codes

#!/usr/bin/env bash
set -Eeuo pipefail

get_count() {
  cat /nonexistent          # this fails
  echo "10"
}

# What does $COUNT become?
COUNT=$(get_count)           # $COUNT = "10"; the cat failure was swallowed!
echo "got: $COUNT"           # "got: 10" — function appeared to succeed!

The function get_count internally failed at cat /nonexistent, but because it ran inside $(...), set -e doesn’t propagate the failure out by default.

The fix is inherit_errexit (bash 4.4+):

shopt -s inherit_errexit     # add to your preamble
COUNT=$(get_count)            # NOW the cat failure does propagate; script exits

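For older bash, the workaround is to check explicitly. An explicit check only sees the function’s return status, so the function must itself return non-zero when a step fails (for example, cat /nonexistent || return 1 inside get_count). As written above, get_count returns 0 because its last command, echo, succeeds.

if ! COUNT=$(get_count); then
  die "get_count failed"
fi

Or capture stderr as well, so the failure message can be included in the error:

if ! OUTPUT=$(get_count 2>&1); then
  die "get_count failed: $OUTPUT"
fi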
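
The effect of inherit_errexit is easy to reproduce. This sketch runs the same script body in two child shells, assuming bash 4.4+; the `body` variable is just a convenience for the demo.

```bash
#!/usr/bin/env bash
# The same script body, run with and without inherit_errexit (bash 4.4+).
body='
get_count() { cat /nonexistent; echo 10; }
COUNT=$(get_count)
echo "got: $COUNT"
'
without=$(bash -c "set -Eeuo pipefail; $body" 2>/dev/null); without_rc=$?
with=$(bash -c "set -Eeuo pipefail; shopt -s inherit_errexit; $body" 2>/dev/null); with_rc=$?

echo "without inherit_errexit: output='$without' rc=$without_rc"   # 'got: 10' rc=0
echo "with inherit_errexit:    output='$with' rc=$with_rc"         # '' rc=1
```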

The pipefail dance with `set -e`

A subtle case:

set -eo pipefail
cmd1 | cmd2            # if cmd1 fails, $? becomes cmd1's status (pipefail), then -e fires

That’s the desired behaviour. But:

set -eo pipefail
output=$(cmd1 | cmd2)  # $? = the right-most failure ... but does -e fire?

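Yes, even without inherit_errexit: pipefail is inherited by the command substitution, and the pipeline is its last command, so the substitution returns the failure and the assignment trips -e. inherit_errexit matters when the failing pipeline is followed by other commands inside the $(...), because without it the substitution carries on and returns the status of its last command. Same lesson: enable inherit_errexit when on bash 4.4+.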

Functions and `local` masking exit codes

We saw SC2155 above — but the symptom is worth re-stating:

process() {
  local result="$(some_failing_command)"   # SC2155
  echo "$result"
}

Here, local always succeeds. Even though some_failing_command failed, the value passed to local is its empty stdout, and the function continues. set -e does not fire. The fix:

process() {
  local result
  result="$(some_failing_command)"     # NOW set -e can fire on failure
  echo "$result"
}

This is one of the strongest reasons to lint with ShellCheck.
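The two versions of process can be compared directly. In this sketch `false` stands in for some_failing_command, and each version runs in a child bash under `set -e`.

```bash
#!/usr/bin/env bash
# `false` stands in for some_failing_command.
masked_body='process() { local result="$(false)"; echo "continued"; }; process'
split_body='process() { local result; result="$(false)"; echo "continued"; }; process'

masked=$(bash -c "set -e; $masked_body"); masked_rc=$?
split=$(bash -c "set -e; $split_body"); split_rc=$?

echo "local x=\$(...): output='$masked' rc=$masked_rc"   # 'continued' rc=0
echo "split form:     output='$split' rc=$split_rc"      # '' rc=1
```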

`if !` and `until` swallow failures

if ! some_check; then
  …
fi

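Inside an if (or while, until) condition, set -e is suspended for that command and, if the command is a function, for everything the function runs. Don’t call a function from an if condition and expect set -e to stop it at its first failing step. Moving the call to the top level and testing $? afterwards doesn’t work either:

some_check
status=$?
if [[ $status -ne 0 ]]; then …; fi   # never reached on failure: set -e has already exited

If you only need some_check’s final status, an explicit else branch is easier:

if some_check; then
  …
else
  die "some_check failed"
fi

The “explicit if/else with explicit error path” is sometimes the cleanest pattern. Keep in mind that set -e is still suspended inside some_check here, so the function must return non-zero itself when one of its steps fails.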
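The suspension inside a function body is worth seeing once. In this sketch some_check is given a deliberately failing first step, then called both as an if condition and as a plain command.

```bash
#!/usr/bin/env bash
# some_check has a failing first step; the question is whether set -e stops it.
define_check='some_check() { false; echo "kept going past the failure"; }'

# Called as an if condition: set -e is suspended for the whole function body.
as_condition=$(bash -c "set -e; $define_check; if some_check; then echo 'condition was true'; fi")

# Called as a plain command: set -e exits at the failing step.
as_command=$(bash -c "set -e; $define_check; some_check; echo 'after the call'")
as_command_rc=$?

printf 'as condition:\n%s\n' "$as_condition"
echo "as command: output='$as_command' rc=$as_command_rc"   # '' rc=1
```

In the first case the function runs to completion and the condition is even reported as true, because the function's last command succeeded.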

6. The `lib/errors.sh` framework

Let’s assemble everything into a reusable library. Save this as lib/errors.sh and source it from every script:

# lib/errors.sh — defensive shell helpers
# Source this from every script:
#     source "$(dirname "${BASH_SOURCE[0]}")/lib/errors.sh"

# strict mode
set -Eeuo pipefail
shopt -s nullglob
shopt -s inherit_errexit 2>/dev/null || true   # bash 4.4+ only; ignored on older bash
IFS=$'\n\t'

readonly SCRIPT_NAME="${0##*/}"

# Logging
_log() { local lvl=$1; shift; printf '[%s] %s: %s\n' "$SCRIPT_NAME" "$lvl" "$*" >&2; }
die()   { _log "FATAL" "$@"; exit 1; }
warn()  { _log "WARN " "$@"; }
info()  { _log "INFO " "$@"; }
debug() { [[ "${DEBUG:-0}" == 1 ]] && _log "DEBUG" "$@" || true; }

# Validation
assert_command()  { command -v "$1" >/dev/null 2>&1 || die "command not found: $1"; }
assert_file()     { [[ -f "$1" ]] || die "file not found: $1"; }
assert_dir()      { [[ -d "$1" ]] || die "directory not found: $1"; }
assert_var()      { [[ -n "${!1:-}" ]] || die "required variable is unset or empty: \$$1"; }
assert_readable() { [[ -r "$1" ]] || die "file not readable: $1"; }
assert_writable() { [[ -w "$1" ]] || die "file not writable: $1"; }
assert_in()       {
  local val=$1; shift
  local choice
  for choice in "$@"; do [[ "$val" == "$choice" ]] && return 0; done
  die "value '$val' is not in: $*"
}

# ERR trap — run on any uncaught failure
_on_err() {
  local exit_code=$? line=${BASH_LINENO[0]} cmd=${BASH_COMMAND}
  printf '\n[%s] FATAL: %s:%d: "%s" exited %d\n' "$SCRIPT_NAME" "${BASH_SOURCE[1]:-$0}" "$line" "$cmd" "$exit_code" >&2
  if [[ ${#FUNCNAME[@]} -gt 1 ]]; then
    printf 'Call stack:\n' >&2
    local i=0
    while caller $i >/dev/null 2>&1; do
      printf '  %s\n' "$(caller $i)" >&2
      i=$((i + 1))   # not ((i++)): that returns status 1 when i is 0 and would trip set -e
    done
  fi
  exit "$exit_code"
}
trap _on_err ERR

Now any script that sources this gets:

#!/usr/bin/env bash
source "$(dirname "${BASH_SOURCE[0]}")/lib/errors.sh"

# strict mode active, ERR trap active, helpers available
assert_command kubectl
assert_command jq
assert_var TARGET_ENV
assert_in "$TARGET_ENV" dev staging prod

info "deploying to $TARGET_ENV"
kubectl apply -f manifests/

info "deploy complete"

A failure in kubectl apply produces output along these lines:

[deploy.sh] FATAL: ./deploy.sh:11: "kubectl apply -f manifests/" exited 1
Call stack:
  11 main ./deploy.sh

This is what production-grade shell looks like.

7. Patterns for partial success

Strict mode is the default. But sometimes you genuinely want “do this, ignore failure”:

# Pattern 1: Explicit `|| true` to suppress one command
rm -rf /tmp/cache 2>/dev/null || true

# Pattern 2: Branch on success without exiting
if some_optional_check; then
  info "optional check passed"
fi
# (no else — failure is fine)

# Pattern 3: Set a flag, branch on it
if has_optional_dep; then
  USE_OPTIONAL=1
else
  USE_OPTIONAL=0
fi

# Pattern 4: Run a series with set +e locally
do_optional_steps() {
  set +e
  cmd1
  cmd2
  cmd3
  set -e
}

Pattern 4 is sometimes necessary for “best effort” cleanup or batch operations — but use it sparingly and document why. For most production scripts, fail-fast is right.
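Two variations on these patterns are sketched below: capturing a status into a variable with `|| rc=$?`, and scoping `set +e` to a subshell so strict mode can't be left off by an early return. Both are illustrations under strict mode, run in a child bash.

```bash
#!/usr/bin/env bash
out=$(bash -c '
set -Eeuo pipefail

# Capture a status without tripping errexit: the || list exempts the command.
rc=0
grep -q needle /dev/null || rc=$?
echo "grep rc=$rc"

# Scope set +e to a subshell so strict mode cannot be left switched off.
(
  set +e
  false                       # ignored: errexit is off in this subshell only
  echo "best-effort step ran"
)

# Back in the parent, errexit is still on.
[[ $- == *e* ]] && echo "errexit still on"
')
echo "$out"
```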

`try`/`catch` the bash way

Bash has no try/catch. The closest pattern:

{
  some_thing
  another_thing
} || {
  warn "best-effort batch failed; continuing"
}

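The || after the closing brace gives you the “catch” branch, which runs when the group’s last command fails. Be aware that set -e is suspended inside a group on the left of ||, so a failure in some_thing does not stop another_thing from running. Use this for genuinely optional work; don’t normalize it.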
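One caveat is worth checking by hand: errexit does not apply inside a group that sits on the left of `||`. The sketch below shows that behaviour, followed by one possible workaround that runs the "try" body in a subshell as a plain command.

```bash
#!/usr/bin/env bash
# Naive form: errexit is suspended inside the group because it sits left of ||.
naive=$(bash -c 'set -e; { false; echo "second step ran"; } || echo "caught"')

# Workaround sketch: run the "try" body in a subshell as a plain command,
# with errexit switched off in the parent just long enough to read the status.
guarded=$(bash -c '
set -e
set +e
( set -e; false; echo "second step ran" )
rc=$?
set -e
if [[ $rc -ne 0 ]]; then echo "caught rc=$rc"; fi
')

echo "naive:   $naive"     # second step ran
echo "guarded: $guarded"   # caught rc=1
```

In the naive form the second step runs and the catch branch never fires; in the guarded form the body stops at the first failure.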

8. Complete example: a real defensive script

Putting it all together — a deploy script with full defensive engineering:

#!/usr/bin/env bash
# deploy.sh — push image to a Kubernetes namespace
# Usage: deploy.sh <env> <image-tag>

# 1. Strict-mode preamble
set -Eeuo pipefail
shopt -s inherit_errexit nullglob
IFS=$'\n\t'

# 2. Source our helpers
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib/errors.sh
source "$SCRIPT_DIR/lib/errors.sh"

# 3. Argument validation
[[ $# -eq 2 ]] || die "usage: $0 <env> <image-tag>"

readonly ENV="$1"
readonly TAG="$2"

assert_in "$ENV" dev staging prod
[[ "$TAG" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]] || die "tag must be vMAJOR.MINOR.PATCH; got: $TAG"

assert_command kubectl
assert_command jq

# 4. Look up cluster context
KUBE_CONTEXT="kloudvin-${ENV}"
kubectl config get-contexts -o name | grep -qx "$KUBE_CONTEXT" || die "no kubectl context: $KUBE_CONTEXT"

# 5. Atomic deploy — set, then poll for rollout
info "switching to context $KUBE_CONTEXT"
kubectl config use-context "$KUBE_CONTEXT"

NAMESPACE="kloudvin"
DEPLOYMENT="api"

info "setting image to $TAG"
kubectl -n "$NAMESPACE" set image "deployment/$DEPLOYMENT" "$DEPLOYMENT=ghcr.io/kloudvin/api:$TAG"

info "waiting for rollout (timeout 5m)"
if ! kubectl -n "$NAMESPACE" rollout status "deployment/$DEPLOYMENT" --timeout=5m; then
  warn "rollout failed; rolling back"
  kubectl -n "$NAMESPACE" rollout undo "deployment/$DEPLOYMENT"
  die "rollout failed for $DEPLOYMENT in $NAMESPACE; rolled back"
fi

info "rollout complete: $DEPLOYMENT @ $TAG in $NAMESPACE/$ENV"

Breakdown:

Section 1: strict-mode preamble.
Section 2: import helpers via sourcing. The # shellcheck source=... directive tells ShellCheck where to find the sourced file, so it can lint cross-file.
Section 3: every input validated up-front. We fail before doing any actual work.
Section 4: validate cluster context exists.
Section 5: do the actual work; on rollout failure, roll back and die. We want to die, so the CI pipeline marks the deploy as failed.

This kind of script — strict, validated, atomic, fail-loud — is what you ship to a team you’ll work with for years.

9. Common pitfalls

“I added `set -e` but it’s still not catching X”

set -e has many exemptions (covered above). When something isn’t being caught, check:

Is it inside if/while/until?
Is it in $(...) without inherit_errexit?
Is it after && or ||?
Is it a local var=$(failing_cmd)?

The fix is usually to restructure or add explicit checking.

Trap fires twice on script exit

If you have both an EXIT trap and an ERR trap, they both fire. Either:

# Pattern A: ERR exits, EXIT does cleanup. ERR triggers EXIT.
trap '_on_err' ERR
trap '_cleanup' EXIT

Or:

# Pattern B: One unified handler that knows the difference
_on_exit() {
  local exit_code=$?
  if [[ $exit_code -ne 0 ]]; then
    warn "script exiting with $exit_code"
  fi
  _cleanup
}
trap _on_exit EXIT

We covered this in L10.

`pipefail` makes intentional `head` short-circuit fail

set -o pipefail
some_long_output | head -n 10        # head closes stdin early; some_long_output gets SIGPIPE; pipefail trips

The fix:

set -o pipefail
some_long_output | { head -n 10; cat >/dev/null; }   # absorb the rest
# Or:
some_long_output | head -n 10 || [[ $? -eq 141 ]]    # tolerate SIGPIPE (128+13)

Pick one; document the choice in a comment.
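You can reproduce the SIGPIPE status with `yes`, which writes forever until its reader goes away. This is a sketch: the exact status assumes SIGPIPE has its default disposition in your environment.

```bash
#!/usr/bin/env bash
# yes never stops on its own, so head closing the pipe is what ends it.
raw=$(bash -c 'set -o pipefail; yes | head -n 1; echo "status=$?"' 2>/dev/null)

# The tolerant form keeps a set -e script alive past the same pipeline.
tolerated=$(bash -c 'set -eo pipefail; yes | head -n 1 || [[ $? -eq 141 ]]; echo "still running"' 2>/dev/null)

echo "$raw"         # y, then typically status=141 (128 + 13 for SIGPIPE)
echo "$tolerated"
```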

Sourcing `lib/errors.sh` from a non-standard relative path

source ./lib/errors.sh               # WRONG — depends on cwd at invocation
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/lib/errors.sh"   # CORRECT — always relative to the script's location

Always use BASH_SOURCE[0] + dirname for reliable sourcing.

ShellCheck false positives

Sometimes you’ll really want word-splitting (SC2086) or really want to use unquoted $@. The right answer is to suppress the rule on that line with an explanatory comment:

# shellcheck disable=SC2086  # OPTS is built up as a string of safe shell tokens
some_command $OPTS

Don’t suppress globally to silence noise. Diagnose each case.

10. Twelve idioms for daily use

# 1. Strict-mode preamble (every script)
set -Eeuo pipefail
shopt -s inherit_errexit nullglob
IFS=$'\n\t'

# 2. die / warn / info — minimum logging
die()  { printf '[%s] FATAL: %s\n' "${0##*/}" "$*" >&2; exit 1; }
warn() { printf '[%s] WARN:  %s\n' "${0##*/}" "$*" >&2; }
info() { printf '[%s] INFO:  %s\n' "${0##*/}" "$*" >&2; }

# 3. Basic ERR trap with line number
trap 'die "line $LINENO: $BASH_COMMAND failed (exit $?)"' ERR

# 4. assert_command
assert_command() { command -v "$1" >/dev/null 2>&1 || die "missing command: $1"; }

# 5. assert_var (checks variable is set and non-empty by name)
assert_var() { [[ -n "${!1:-}" ]] || die "required var unset: \$$1"; }

# 6. assert_in (value must be in a list)
assert_in() { local v=$1; shift; for c; do [[ "$v" == "$c" ]] && return 0; done; die "$v not in: $*"; }

# 7. Argument-count validation
[[ $# -ge 2 ]] || die "usage: $0 <env> <tag>"

# 8. Regex validation
[[ "$TAG" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]] || die "bad tag: $TAG"

# 9. Robust source path
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/lib/errors.sh"

# 10. Suppress ShellCheck rule with reason
# shellcheck disable=SC2086  # intentional split

# 11. local + assign on separate lines (avoid SC2155)
local result
result="$(some_command)"

# 12. Atomic write + rename (temp file next to the target, so mv is a same-filesystem rename)
TMP=$(mktemp ./target.XXXXXX); some_command > "$TMP" && mv "$TMP" target

11. What you must internalise before lesson 14

What does set -E do, and why is it necessary? (Propagates ERR trap into functions / subshells / command substitution.)
What’s inherit_errexit? (Propagates set -e into command substitution $(...).)
Where does set -e not fire? (Inside if/while/until conditions; in $(...) without inherit_errexit; after &&/||; on commands prefixed with !.)
What’s the fix for SC2155 (local x="$(...)")? (Declare and assign on separate lines.)
Why must trap commands be in single quotes? (To prevent expansion at registration; expansion happens at fire time.)
What’s the canonical ERR trap line? (trap 'die "line $LINENO: $BASH_COMMAND failed (exit $?)"' ERR.)
What goes to stdout vs stderr? (Data → stdout. Logs and errors → stderr.)
Why call ShellCheck on every script? (Catches an enormous fraction of common shell bugs at lint time.)
How do you suppress a ShellCheck rule with explanation? (# shellcheck disable=SC2086 # reason.)
What’s assert_var "VAR_NAME" checking? (That the variable named “VAR_NAME” is set and non-empty, via indirect expansion ${!1:-}.)

If anything felt fuzzy, re-read. The next nine lessons assume you internalised these.

What’s next

Lesson 14: Argument Parsing — getopts, getopt, manual parsing & long options. Most real scripts take options (-v, --verbose, --config FILE). Bash gives you getopts for short options out of the box, but long options (--verbose) need extra work. We cover getopts deeply, the GNU getopt command (different! easy to confuse with getopts), the canonical manual long-option parsing pattern, optional vs required arguments, and how to integrate with usage() and the strict-mode preamble. After L14 your scripts will accept arguments the way kubectl, git, and other production CLIs do.

See you there.