Concurrency: Backgrounding, GNU parallel, xargs -P, FIFOs & Lock Files (flock) — Using All Your Cores Without Races

Most shell scripts run sequentially: one command, then the next. That’s fine until you’re processing 1,000 files, hitting 50 endpoints, or fanning out across 30 nodes. At that point, serial execution means waiting hours while the CPU sits at 8% utilisation.

Real concurrency in shell isn’t hard, but it has sharp edges. This lesson covers five tools:

Backgrounding (&) with wait — the bare-metal primitive.
xargs -P N — the simplest job pool; one command per input line, N concurrent.
GNU parallel — declarative parallelism with progress, retry, and structured output.
FIFOs (mkfifo) — named pipes for IPC between long-running processes.
flock — kernel-level mutual exclusion to serialise access to shared resources.

We covered the basics of &/wait in L9. This lesson goes deep, builds patterns you’ll actually use in production, and covers the race conditions to avoid.

By the end you’ll be able to run 100 deploys in parallel safely, monitor them, recover from partial failures, and never accidentally clobber a shared file.

1. Backgrounding refresher: `&` and `wait`

cmd &              # start cmd, return immediately, $! is its PID
wait               # wait for ALL background jobs
wait $PID          # wait for one specific PID
wait -n            # wait for ANY background job to finish (bash 4.3+)

#!/usr/bin/env bash
# Run three jobs in parallel, wait for all
do_thing 1 &
do_thing 2 &
do_thing 3 &
wait
echo "all done"

With no arguments, wait returns 0 once all background jobs have finished, regardless of how they exited. With a PID argument, it returns that job’s exit code. To capture per-job:

do_thing 1 & PID1=$!
do_thing 2 & PID2=$!
do_thing 3 & PID3=$!

wait $PID1; RC1=$?
wait $PID2; RC2=$?
wait $PID3; RC3=$?

echo "results: $RC1 $RC2 $RC3"

This works for a known small number of jobs. For arbitrary counts, you need a job pool.
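
Before reaching for a pool, note that the three-variable version above generalises to any number of jobs by keeping the PIDs in an array. The following is a minimal sketch, assuming bash 4+ (for associative arrays); do_thing here is a hypothetical stand-in that fails for input 2 so there is something to report.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for real work: fails for input 2 only.
do_thing() { sleep 0.1; [[ $1 != 2 ]]; }

declare -A PID_TO_INPUT=()        # pid -> the input that job is processing
for input in 1 2 3 4; do
  do_thing "$input" &
  PID_TO_INPUT[$!]=$input
done

FAILED=()
for pid in "${!PID_TO_INPUT[@]}"; do
  # wait <pid> returns that job's exit code, so each failure maps to its input
  if ! wait "$pid"; then
    FAILED+=("${PID_TO_INPUT[$pid]}")
  fi
done
echo "failed inputs: ${FAILED[*]:-none}"
```

This still starts every job at once; it only solves the bookkeeping, not the concurrency cap.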

Job pool: bounded parallelism with `wait -n`

#!/usr/bin/env bash
set -Eeuo pipefail

MAX_JOBS=${MAX_JOBS:-4}
JOBS=()

for input in input1 input2 input3 input4 input5 input6 input7 input8 input9 input10; do
  # If we've reached the cap, wait for any one to finish first
  while (( ${#JOBS[@]} >= MAX_JOBS )); do
    wait -n || true   # wait for ANY job; "|| true" stops set -e aborting when that job failed
    # Rebuild JOBS with only the PIDs still alive
    NEW=()
    for pid in "${JOBS[@]}"; do
      if kill -0 "$pid" 2>/dev/null; then NEW+=("$pid"); fi
    done
    JOBS=("${NEW[@]}")
  done

  do_thing "$input" &
  JOBS+=($!)
done

wait    # wait for the last batch

This caps concurrency at MAX_JOBS. For most use cases, xargs -P does this more cleanly.

`wait -n` exit code (bash 4.3+)

After wait -n returns, $? is the exit code of the job that finished. To loop until all are done while monitoring:

JOBS=()
for input in "${INPUTS[@]}"; do
  worker "$input" &
  JOBS+=($!)
done

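FAILURES=0
REMAINING=${#JOBS[@]}
while (( REMAINING > 0 )); do
  rc=0
  wait -n || rc=$?
  if (( rc != 0 )); then FAILURES=$((FAILURES + 1)); fi
  REMAINING=$((REMAINING - 1))   # one job reaped; without this the loop never ends
  # Note: wait -n alone doesn't say WHICH job finished.
done
echo "$FAILURES failures"

If you need per-job exit codes, track PIDs and wait $PID individually (bash 5.1+ can also report the PID with wait -n -p VAR). A bare wait always returns 0, so “all-or-nothing” semantics still require collecting each job’s status, for example by counting failures as above and exiting non-zero if the count is not 0.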

2. `xargs -P N` — the simplest job pool

The cleanest way to run a bounded-parallel set of commands over a list:

# Run gzip on every .log file, 8 in parallel
find /var/log -name '*.log' -print0 | xargs -0 -P 8 -n 1 gzip

Flags:

-P N — run N processes concurrently.
-n 1 — pass 1 argument per command (so each gzip handles one file).
-0 — input is NUL-separated (paired with find -print0).
-I {} — substitute {} in the command line (lets you put the arg somewhere other than the end).

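# Custom command shape: pass each filename as $1 to a function call
export -f process       # functions must be exported to be visible in the child bash
printf '%s\n' "${FILES[@]}" | xargs -I {} -P 4 bash -c 'process "$@"' _ {}

The trick bash -c '...' _ {} is: bash -c runs the script, the _ is $0, and {} becomes $1. Then process "$@" calls your function with the filename. -I already implies one input line per command, so -n 1 is dropped (GNU xargs treats the two options as mutually exclusive).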
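
To see the bash -c '...' _ {} shape end to end, here is a small self-contained sketch. The process function and the sample FILES entries are hypothetical; the point is that an exported function is reachable from the child bash and that a filename with a space survives as a single $1.

```bash
#!/usr/bin/env bash
# Hypothetical worker function; a real one would do the actual processing.
process() { printf 'processed:%s\n' "$1"; }
export -f process            # child bash processes only see exported functions

FILES=("a b.txt" "c.txt")    # sample names, one containing a space

# NUL-delimited input keeps the space intact; {} is passed as $1 and is
# never spliced into the script text itself.
OUT=$(printf '%s\0' "${FILES[@]}" \
  | xargs -0 -P 2 -I {} bash -c 'process "$@"' _ {} \
  | sort)
printf '%s\n' "$OUT"
```

The output is sorted only to make it deterministic; with -P the completion order is otherwise arbitrary.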

Number of cores

# Use all available cores
NPROC=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)

# Then:
xargs -P "$NPROC" -n 1 ...

For I/O-bound work (network calls, disk I/O), you can usefully set this to 2-4x cores. For CPU-bound (compression, encryption), stay at nproc.

Capturing output safely

When parallel commands write to stdout, lines can interleave. Best practice: have each one write to its own file, merge at the end.

mkdir -p /tmp/joblogs
# Pass the filename as $1; splicing {} into the script text breaks on quotes or $(...) in names
find . -name '*.log' -print0 | xargs -0 -P 8 -I {} \
  bash -c 'gzip --keep "$1" 2>"/tmp/joblogs/$(basename "$1").err"' _ {}
cat /tmp/joblogs/*.err          # merge afterwards

Or use xargs --process-slot-var:

xargs -P 4 -I {} --process-slot-var=SLOT \
  bash -c 'echo "slot=$SLOT processing $1"' _ {} \
  < input.txt

SLOT becomes 0…3, letting each worker write to its own log file or use its own port etc. This is GNU-only but useful.

`xargs -P` exit code semantics

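GNU xargs exits with:

0 if everything succeeded
123 if any command exited with status 1-125
124 if a command exited with status 255 (xargs stops immediately)
125 if a command was killed by a signal
126 if a command couldn’t be executed
127 if the command was not found
1 for any other error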

So xargs -P correctly fails if any subprocess fails. Good for use with set -e.
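
The exit statuses are easy to check for yourself. The sketch below assumes GNU xargs (BSD xargs reports failures differently); run_with_statuses is a hypothetical helper that turns each argument into the exit status of one child process.

```bash
#!/usr/bin/env bash
# Each argument becomes the exit status of one child process.
run_with_statuses() {
  printf '%s\n' "$@" | xargs -P 2 -n 1 sh -c 'exit "$1"' _ 2>/dev/null
}

run_with_statuses 0 0 0; ALL_OK=$?     # every child succeeded
run_with_statuses 0 3 0; ONE_FAIL=$?   # one child exited with status 3
run_with_statuses 255;   ABORT=$?      # a child exited 255, so xargs stops early

echo "all_ok=$ALL_OK one_fail=$ONE_FAIL abort=$ABORT"
# With GNU xargs: all_ok=0 one_fail=123 abort=124
```

Note that 123 tells you that something failed, not what or how many; keep per-job logs if you need that detail.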

3. GNU `parallel` — declarative parallelism

xargs -P is fine for “run this command on every input.” parallel is for everything more elaborate: progress bars, retries, ETA, structured output, multi-input combinations.

brew install parallel        # macOS
sudo apt install parallel    # Debian/Ubuntu

Basic equivalent to `xargs -P`

# All three are equivalent for filenames without whitespace
ls *.log | xargs -P 8 -n 1 gzip
ls *.log | parallel -j 8 gzip
parallel -j 8 gzip ::: *.log

::: is parallel’s syntax for inline argument lists. -j N is the parallelism degree.

Templated commands

parallel -j 4 'curl -sL https://api.example.com/{} -o {}.json' ::: alice bob carol dave

The {} is each input. parallel also supports:

{1}, {2}, … — positional from multiple input lists
{.} — input with extension stripped
{/} — basename only
{//} — dirname only
{#} — job number

# Compress, naming output by job number
parallel -j 4 'gzip -c {} > {.}.{#}.gz' ::: file*.log

Multiple input lists

parallel -j 8 'echo {1} {2}' ::: a b c ::: 1 2 3
# a 1
# a 2
# a 3
# b 1
# ...

Cartesian product of inputs by default. Use --link (spelled --xapply in older releases) or :::+ for paired inputs:

parallel -j 8 --link 'echo {1} {2}' ::: a b c ::: 1 2 3
# a 1
# b 2
# c 3
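
A quick way to convince yourself how input lists and replacement strings expand is to run them through echo. This is a sketch under two assumptions: that the installed parallel is GNU parallel (the moreutils tool of the same name behaves differently, so the script checks and skips otherwise) and that it is recent enough to know --link. The path /tmp/logs/app.log is only a sample string and does not need to exist.

```bash
#!/usr/bin/env bash
# Skip quietly unless GNU parallel (not the moreutils tool) is installed.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # -k keeps output in input order so the results are deterministic.
  CART=$(parallel --will-cite -k echo '{1}{2}' ::: a b ::: 1 2 | xargs)
  LINKED=$(parallel --will-cite -k --link echo '{1}{2}' ::: a b ::: 1 2 | xargs)
  # {/} basename, {.} extension stripped, {//} dirname
  PARTS=$(parallel --will-cite echo '{/} {.} {//}' ::: /tmp/logs/app.log)
  echo "cartesian: $CART"     # a1 a2 b1 b2
  echo "linked:    $LINKED"   # a1 b2
  echo "parts:     $PARTS"    # app.log /tmp/logs/app /tmp/logs
else
  SKIPPED=1
  echo "GNU parallel not found; skipping"
fi
```

Swapping a real command for echo (or adding --dry-run) is a cheap way to preview a templated command line before running it against real inputs.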

Progress bar and ETA

parallel --bar 'process {}' ::: input1 input2 input3 ...
parallel --eta 'process {}' ::: input1 input2 input3 ...

Both show live progress. Useful for long-running batches.

Retry on failure

parallel --retries 3 'curl https://api.example.com/{}' ::: $(seq 1 1000)

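With --retries 3, each failing command is attempted up to 3 times in total. Combined with --joblog:

parallel --joblog jobs.log --retries 3 'curl ...' ::: ...

You get a tab-separated log with one line per job: sequence number, host, start time, runtime, bytes sent and received, exit value, signal, and the command.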

Result aggregation with `--results`

parallel --results /tmp/jobout 'curl https://api.example.com/{}' ::: 1 2 3

Each job’s stdout and stderr go to separate files under /tmp/jobout/1/<argument>/ (for example /tmp/jobout/1/2/stdout), organized by argument. No interleaving.

Limit memory and CPU

parallel --memfree 1G 'big_cmd {}' ::: ...      # only run new jobs while >1G free
parallel --load 80% 'big_cmd {}' ::: ...        # only while CPU load <80%

Distribute across machines

parallel can SSH to remote machines and run jobs there:

parallel -S host1,host2,host3 -j 4 'process {}' ::: ...

-j 4 is per-host concurrency. This is genuinely impressive for distributed work without any framework — but you need passwordless SSH set up.

`parallel` vs `xargs -P` summary

Use xargs -P when:

The task is “run command X on each line of input.”
You don’t need progress, retry, or per-job logs.
You want maximum portability (xargs is on every system, and both the GNU and BSD versions support -P).

Use parallel when:

You need progress, retry, joblog, ETA.
You have multiple input lists to combine.
You’re doing distributed work via SSH.
Output needs to be aggregated cleanly.

Both have their place.

4. FIFOs — named pipes for IPC

A FIFO is a “named pipe” — a filesystem entry that acts as a pipe. Two unrelated processes can communicate through it.

Basics

mkfifo /tmp/myfifo

# Process A: writes
echo "hello from A" > /tmp/myfifo &

# Process B: reads
cat /tmp/myfifo
# hello from A

Opening a FIFO for writing blocks until another process opens it for reading, and vice versa; once both ends are open, a read blocks until data arrives. This is synchronous IPC.

Use case: background producer + consumer

#!/usr/bin/env bash
set -Eeuo pipefail

FIFO_DIR=$(mktemp -d /tmp/myfifo.XXXXXX)   # private dir: avoids the name race of mktemp -u
FIFO=$FIFO_DIR/fifo
mkfifo "$FIFO"
trap 'rm -rf "$FIFO_DIR"' EXIT

# Producer in background
(
  for i in {1..10}; do
    sleep 0.1
    echo "msg $i"
  done > "$FIFO"
) &

# Consumer in foreground
while IFS= read -r line; do
  echo "got: $line"
done < "$FIFO"

wait

This pattern lets you set up a producer/consumer pipeline where the producer’s output is processed line-by-line by the consumer in the same shell context. With anonymous pipes (|), the right side runs in a subshell and can’t easily update parent variables.
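
The difference is easiest to see side by side. The following sketch counts three lines twice, once through an anonymous pipe and once through a FIFO; the variable names are illustrative, and it assumes bash’s default settings (lastpipe off).

```bash
#!/usr/bin/env bash
DIR=$(mktemp -d)                 # private directory, so the FIFO name cannot collide
FIFO=$DIR/fifo
mkfifo "$FIFO"
trap 'rm -rf "$DIR"' EXIT

# Anonymous pipe: the while loop runs in a subshell, so the count is lost.
COUNT_PIPE=0
printf '%s\n' a b c | while IFS= read -r line; do
  COUNT_PIPE=$((COUNT_PIPE + 1))
done

# FIFO: the loop runs in the current shell, so the count survives.
COUNT_FIFO=0
printf '%s\n' a b c > "$FIFO" &
while IFS= read -r line; do
  COUNT_FIFO=$((COUNT_FIFO + 1))
done < "$FIFO"
wait

echo "pipe=$COUNT_PIPE fifo=$COUNT_FIFO"   # pipe=0 fifo=3
```

Process substitution (done < <(producer)) gives the same effect without a filesystem entry; the FIFO earns its keep when the producer is a separate, unrelated process.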

A worker pool with FIFOs

#!/usr/bin/env bash
set -Eeuo pipefail

NUM_WORKERS=${NUM_WORKERS:-4}
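FIFO_DIR=$(mktemp -d /tmp/workers.XXXXXX)  # private dir: avoids the name race of mktemp -u
FIFO=$FIFO_DIR/fifo
mkfifo "$FIFO"
trap 'rm -rf "$FIFO_DIR"' EXIT

# Open the FIFO read-write on FD 3 so neither open blocks and the pipe never sees EOF
exec 3<>"$FIFO"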

# Pre-fill with NUM_WORKERS tokens
for ((i=0; i<NUM_WORKERS; i++)); do
  echo >&3
done

worker() {
  local item=$1 rc=0
  process "$item" || rc=$?   # the actual work; a failure must not skip the token return
  echo >&3                   # return token after we're done
  return "$rc"
}

for item in "${ITEMS[@]}"; do
  read -u 3                 # consume a token (blocks if none)
  worker "$item" &
done
wait
exec 3>&-                   # close FD 3

This is a classic “semaphore via FIFO” pattern. The FIFO acts as a counting semaphore: tokens limit concurrency to NUM_WORKERS.

xargs -P does this internally and more cleanly. The FIFO version is useful when you need finer control (e.g., variable-cost jobs, weighted slots).

5. `flock` — cross-process mutual exclusion

When multiple invocations of a script (cron jobs, signal handlers, manual runs) might collide, you need a lock. We saw flock briefly in L10. Here’s the full pattern.

Single-instance script

#!/usr/bin/env bash
set -Eeuo pipefail

LOCKFILE=/var/run/myscript.lock

# Acquire exclusive lock on FD 200; fail if already locked
exec 200>"$LOCKFILE"
flock -n 200 || { echo "another instance is running" >&2; exit 1; }

# ... rest of script ...

flock -n is non-blocking — it fails immediately if the lock is held. flock (no -n) blocks until acquired.

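The lock auto-releases when the process exits (even on SIGKILL), because the kernel closes all of its FDs. No explicit unlock needed. One caveat: background children inherit FD 200 and keep the lock alive until they exit too, so close it for them (cmd 200>&- &) if they should not hold it.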
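
You can watch the acquire, conflict, and release cycle from a single script. This is a minimal sketch, assuming util-linux flock on Linux; the throwaway lock file from mktemp and FD 201 are illustrative choices, with the subshell playing the part of a second script instance.

```bash
#!/usr/bin/env bash
LOCKFILE=$(mktemp)          # throwaway lock file for the demonstration
exec 200>"$LOCKFILE"
flock -n 200; FIRST=$?      # this shell now holds the lock

# A fresh open of the same file is a separate open file description, so it
# conflicts with the lock held on FD 200, just as a second instance would.
( exec 201>"$LOCKFILE"; flock -n 201 ); SECOND=$?

exec 200>&-                 # closing the last FD that holds the lock releases it
( exec 201>"$LOCKFILE"; flock -n 201 ); THIRD=$?

echo "first=$FIRST second=$SECOND third=$THIRD"   # first=0 second=1 third=0
rm -f "$LOCKFILE"
```

The lock belongs to the open file, not to the path, which is why deleting or recreating the lock file while a script is running can let a second instance in.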

Locking a region of work

{
  flock 200
  critical_section
} 200>"$LOCKFILE"

The block in { ... } 200>FILE opens FD 200 and runs flock 200 to acquire. When the block exits, FD 200 is closed and the lock released. Useful when only part of a script is sensitive.

Self-locking script (one-liner)

#!/usr/bin/env bash
set -Eeuo pipefail
exec 200>"/var/run/${0##*/}.lock"
flock -n 200 || exit 0       # silently exit if locked

# ... rest ...

Combined with cron, this gives you “run every minute, but skip if previous run is still going” semantics.

`flock` shared vs exclusive

flock -s 200    # shared (multiple readers OK)
flock -x 200    # exclusive (default; only one)

For database-style “many readers, one writer” patterns. Most scripts just want exclusive.

Timeout on flock acquisition

flock -w 30 200 || { echo "could not acquire lock after 30s" >&2; exit 1; }

Useful when “wait but don’t wait forever” is the right behaviour.
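
Here is a small timing sketch of that behaviour, assuming util-linux flock. A background holder keeps the lock for about 3 seconds; one waiter gives up after 1 second and another is willing to wait up to 10. The durations and the temporary lock file are arbitrary choices for the demonstration.

```bash
#!/usr/bin/env bash
LOCK=$(mktemp)

# Holder: takes the lock and keeps it for about 3 seconds.
( flock 200; sleep 3 ) 200>"$LOCK" &
sleep 0.5                                   # let the holder acquire first

( flock -w 1 200 )  200>"$LOCK"; SHORT=$?   # times out while the holder is busy
( flock -w 10 200 ) 200>"$LOCK"; LONG=$?    # acquired once the holder exits
wait

echo "short=$SHORT long=$LONG"              # short=1 long=0
rm -f "$LOCK"
```

Pick the timeout from how long the critical section normally takes: long enough to ride out a normal run, short enough that a stuck holder is noticed.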

6. Race conditions to avoid

TOCTOU — Time-Of-Check, Time-Of-Use

The classic shell race:

if [[ ! -f "$FILE" ]]; then
  touch "$FILE"
fi

Between the [[ -f ]] test and the touch, another process can create the file. Then both processes proceed, possibly clobbering. The fix is to use atomic operations:

# Atomic create-if-not-exists
( set -C; echo "$$" > "$FILE" ) 2>/dev/null && IS_CREATOR=1 || IS_CREATOR=0

set -C (noclobber) makes > fail if the file exists. The whole thing is atomic at the kernel level: the file either exists or is created by this process. No race.

Or use flock — acquire the lock before checking, so no one else can race.
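
The noclobber approach can be exercised with a deliberate race. In this sketch ten background subshells all try to create the same file; try_create and the won.N marker files are hypothetical names used only to count how many of them succeeded.

```bash
#!/usr/bin/env bash
DIR=$(mktemp -d)
FILE=$DIR/owner

# Succeeds only for the one process that actually creates the file.
try_create() { ( set -C; echo "$1" > "$FILE" ) 2>/dev/null; }

for i in {1..10}; do
  ( try_create "$i" && : > "$DIR/won.$i" ) &   # leave a marker only on success
done
wait

WON=("$DIR"/won.*)              # markers left by successful creators
WINNERS=${#WON[@]}
OWNER=$(< "$FILE")
echo "winners: $WINNERS, owner: $OWNER"
rm -rf "$DIR"
```

However the ten racers are scheduled, exactly one marker appears, and its number matches the ID recorded in the file.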

Concurrent writes to the same file

# Three jobs in parallel all writing to log.txt — lines interleave at byte level
job1 >> log.txt &
job2 >> log.txt &
job3 >> log.txt &

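Each >> write lands at the current end of the file, so a short line emitted in a single write() usually arrives intact on a local filesystem. That is not a guarantee to build on: PIPE_BUF (4096 bytes on Linux) is an atomicity limit for pipes and FIFOs, not regular files, and a program that flushes a line in several writes, or writes over NFS, can still interleave with others. Prefer: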

# Each job writes to its own file
job1 > log.1 &
job2 > log.2 &
job3 > log.3 &
wait
cat log.1 log.2 log.3 > log.txt

Or use flock to serialise:

log() { ( flock 200; printf '%s\n' "$*" >> log.txt ) 200>log.lock; }

Stale lock files (the cleanup problem)

If a process crashes without cleanup, its lock file may remain:

LOCKFILE=/var/run/myscript.lock
[[ -f "$LOCKFILE" ]] && exit 1     # WRONG — stale lock blocks forever

The right answer is flock: kernel-managed locks auto-release on process death. No PID files, no staleness, no manual cleanup.

exec 200>"$LOCKFILE"
flock -n 200 || exit 1
# lock is held by THIS process; releases when this process exits, no matter how

This is why every modern shell script that needs single-instance uses flock, not “PID file checks.”

Subshell variable scoping

COUNT=0
{ for i in {1..100}; do ((COUNT++)); done; } &
wait
echo "$COUNT"          # still 0 — subshell ran in its own COUNT

We covered this in L4. Subshells don’t propagate variables back. For accumulation across parallel work, use a file:

echo 0 > /tmp/count
for i in {1..100}; do
  ( count=$(< /tmp/count); echo $((count + 1)) > /tmp/count ) &      # WRONG — race!
done
wait

Even this has a TOCTOU race. Use flock to serialise the read-modify-write:

LOCK=/tmp/count.lock
COUNTFILE=/tmp/count
echo 0 > "$COUNTFILE"

increment() {
  ( flock 200; echo $(( $(< "$COUNTFILE") + 1 )) > "$COUNTFILE" ) 200>"$LOCK"
}

for i in {1..100}; do
  increment &
done
wait
echo "count: $(< "$COUNTFILE")"

Or accept the limitation and accumulate after-the-fact:

{ for i in {1..100}; do echo "$i"; done > items; }
parallel -j 4 do_thing :::: items > results
total=$(wc -l < results)

For most scripts, “do work in parallel, write results to per-job files, aggregate after” is the simplest and safest pattern.
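
As a concrete instance of that pattern, the sketch below fans four inputs out through xargs -P, has each job write to its own file, and sums the files afterwards. The work function (which just squares its input) and the OUT directory are hypothetical placeholders for real jobs and a real results location.

```bash
#!/usr/bin/env bash
OUT=$(mktemp -d)

# Hypothetical job: squares its input and writes the result to its own file.
work() { echo $(( $1 * $1 )) > "$OUT/$1.out"; }
export -f work
export OUT

printf '%s\n' 1 2 3 4 | xargs -P 4 -I {} bash -c 'work "$1"' _ {}

# Aggregate only after every job has finished; no locking needed.
TOTAL=0
for f in "$OUT"/*.out; do
  TOTAL=$(( TOTAL + $(< "$f") ))
done
echo "sum of squares: $TOTAL"   # 1 + 4 + 9 + 16 = 30
rm -rf "$OUT"
```

Because no two jobs ever touch the same file, there is nothing to serialise, and a failed job is visible as a missing or empty result file.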

7. A complete parallel deploy script

Tying everything together — deploy 50 services in parallel, with bounded concurrency, retry, locking, and structured logging:

#!/usr/bin/env bash
# parallel-deploy.sh — deploy a list of services in parallel
set -Eeuo pipefail

source "$(dirname "${BASH_SOURCE[0]}")/lib/log.sh"

LOCKFILE=/var/run/parallel-deploy.lock
exec 200>"$LOCKFILE"
flock -n 200 || { error "another deploy is running"; exit 1; }

[[ $# -ge 1 ]] || { error "usage: $0 <services-file> [tag]"; exit 2; }
SVC_FILE=$1
TAG="${2:-latest}"
JOBS="${JOBS:-8}"

[[ -r "$SVC_FILE" ]] || { error "cannot read $SVC_FILE"; exit 1; }

readarray -t SERVICES < "$SVC_FILE"
info "deploying ${#SERVICES[@]} services with concurrency=$JOBS, tag=$TAG"

RESULTS_DIR=$(mktemp -d /tmp/deploy.XXXXXX)
# RESULTS_DIR is deliberately kept after exit: the per-service logs and the
# joblog are what you read when a deploy fails. Clean up old runs separately.

deploy_one() {
  local svc=$1
  local logfile="$RESULTS_DIR/$svc.log"

  # Up to 3 attempts with linear backoff (5s, 10s, 15s)
  local attempt
  for attempt in 1 2 3; do
    info "deploy $svc (attempt $attempt)"
    # Group both kubectl calls so the log captures each one; append so retries keep history
    if {
         kubectl set image "deployment/$svc" "$svc=ghcr.io/myorg/$svc:$TAG" &&
         kubectl rollout status "deployment/$svc" --timeout=2m
       } >> "$logfile" 2>&1; then
      info "deploy $svc OK"
      return 0
    fi
    warn "deploy $svc attempt $attempt failed; sleeping $((attempt * 5))s"
    sleep $((attempt * 5))
  done

  error "deploy $svc FAILED after 3 attempts"
  return 1
}

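# parallel runs each job in a fresh shell, so export everything deploy_one calls
# (assumes info/warn/error from lib/log.sh are shell functions)
export -f deploy_one info warn error
export TAG RESULTS_DIR

# "|| RC=$?" records the failure without letting set -e end the script before the summary
RC=0
printf '%s\n' "${SERVICES[@]}" | parallel -j "$JOBS" --joblog "$RESULTS_DIR/joblog" \
  --halt soon,fail=10% deploy_one {} || RC=$?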

# Summarise results
SUCCESS=$(awk -F'\t' 'NR>1 && $7==0 { c++ } END { print c+0 }' "$RESULTS_DIR/joblog")
FAILED=$(awk -F'\t' 'NR>1 && $7!=0 { c++ } END { print c+0 }' "$RESULTS_DIR/joblog")

info "deploy summary" total=${#SERVICES[@]} success=$SUCCESS failed=$FAILED

[[ $RC -eq 0 ]] || error "some deploys failed; see $RESULTS_DIR for details"
exit $RC

Notes:

flock ensures only one deploy can run at a time across the whole machine.
parallel -j $JOBS caps concurrency.
--halt soon,fail=10% stops starting new jobs once 10% of jobs have failed, while jobs already running are allowed to finish (don’t keep deploying after a clear pattern of failure).
Each job’s stdout/stderr goes to its own file under RESULTS_DIR.
The joblog (--joblog) gives us machine-parseable results we summarise via awk.
export -f deploy_one is needed for parallel to find the function in the child shells.
lib/log.sh from L15 provides info/warn/error.

This is what shipping shell scripts looks like at scale.

8. Common pitfalls

`wait` returning 127

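If you call wait $PID for a PID the shell does not know as one of its children (never started by this shell, or already reaped and forgotten), wait returns 127. Avoid by wait’ing exactly once per spawned PID, from the shell that spawned it.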

`xargs -P` and signals

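Ctrl-C in a terminal sends SIGINT to the whole foreground process group, so xargs and its children normally all receive it. The problem case is a signal aimed at a single PID (kill from another terminal, a supervisor stopping your script): xargs or the script dies, but the children keep running. One way to propagate the signal from a wrapper script is to signal the script’s own process group:

trap 'trap - INT TERM; kill 0' INT TERM

kill 0 signals every process in the current process group; resetting the trap first stops the handler from re-triggering itself. Check which processes share that group before relying on this under cron or a process supervisor.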

`parallel` lecture on first run

The first time you run parallel, it asks you to “cite” via parallel --citation. To skip in scripts:

parallel --will-cite ...

Or run parallel --citation manually once to suppress the prompt forever.

Background jobs killed when parent exits

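A script’s background jobs are not killed automatically when it exits; they are orphaned and keep running. What does kill them is SIGHUP from the terminal: when the controlling terminal closes (or an interactive shell with huponexit set exits), jobs still attached to it are hung up. Use nohup or disown for jobs you want to outlive the terminal session: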

nohup long_thing &           # ignores SIGHUP, redirects stdout/err
disown $!                    # remove from shell's job table

`&` inside a function vs at script top level

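my_func() {
  cmd &           # sets $! in the CURRENT shell; functions do not get their own $!
}
my_func
echo "$!"         # PID of cmd, as long as nothing else was backgrounded since

The trap is calling the function in a subshell. PID=$(my_func) runs cmd as a child of the command substitution, so the parent shell cannot wait on it (and the substitution itself blocks until cmd closes its stdout). Store the PID in a variable instead:

my_func() {
  cmd &
  LAST_PID=$!     # a global; do not capture with $(my_func)
}
my_func
wait "$LAST_PID"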

FIFO blocking forever

A FIFO write blocks until a reader opens it. If your reader exited or never started, the writer hangs. Fix: ensure both ends are open, or use exec 3<>FIFO to keep an FD open in the script itself so neither end “closes.”
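
Both behaviours can be reproduced in a few lines. The sketch below assumes Linux (where opening a FIFO read-write does not block) and coreutils timeout; the 1-second limit is an arbitrary choice to keep the demonstration short.

```bash
#!/usr/bin/env bash
DIR=$(mktemp -d)
FIFO=$DIR/fifo
mkfifo "$FIFO"

# No reader exists, so the child blocks opening the FIFO until timeout kills it.
timeout 1 bash -c 'echo hi > "$1"' _ "$FIFO"; NO_READER=$?   # 124 from timeout

# Holding the FIFO open read-write in this shell means neither side blocks.
exec 3<>"$FIFO"
echo hi >&3
IFS= read -r LINE <&3
exec 3>&-

echo "no_reader=$NO_READER line=$LINE"   # no_reader=124 line=hi
rm -rf "$DIR"
```

Wrapping a FIFO write in timeout is also a reasonable defensive measure in production scripts when you cannot be sure the reader is alive.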

`flock` on NFS

flock doesn’t work reliably across NFS: behaviour depends on the kernel and server versions involved. For NFS, use lockfile-create (from lockfile-progs), procmail’s lockfile, or rely on atomic ln (hard links are atomic on most filesystems).

9. Twelve idioms for daily use

# 1. Run three commands in parallel, wait for all
cmd1 & cmd2 & cmd3 & wait

# 2. xargs job pool over a list
find . -name '*.log' -print0 | xargs -0 -P 8 -n 1 gzip

# 3. Number of cores cross-platform
NPROC=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)

# 4. parallel basic
parallel -j 8 'curl -s https://api.example.com/{}' ::: 1 2 3 4 5

# 5. parallel with retry + joblog
parallel --joblog jobs.log --retries 3 'cmd {}' ::: ...

# 6. parallel with progress
parallel --bar 'cmd {}' ::: input*

# 7. Single-instance via flock (non-blocking)
exec 200>/var/run/myscript.lock; flock -n 200 || exit 0

# 8. Atomic create-if-not-exists (no race)
( set -C; echo "$$" > "$LOCK" ) 2>/dev/null

# 9. FIFO-based semaphore worker pool
mkfifo "$FIFO"; exec 3<>"$FIFO"
for i in $(seq 1 4); do echo >&3; done
for item in "${ITEMS[@]}"; do
  read -u 3
  ( do_work "$item"; echo >&3 ) &
done
wait

# 10. Per-job result files (no interleaving)
job() { local id=$1; do_work > "results/$id.out"; }
for id in 1 2 3; do job $id & done; wait

# 11. wait -n for any-finishes (bash 4.3+): one wait per job started
for _ in "${JOBS[@]}"; do wait -n || echo "a job failed" >&2; done

# 12. Disown a long-running job from the shell
nohup long_job & disown $!

10. What you must internalise before lesson 17

What’s wait -n for? (Wait for ANY background job to finish — bash 4.3+.)
What’s xargs -P 4 -n 1? (Run 4 in parallel, 1 input arg per command.)
What’s the -0 flag’s purpose? (NUL-separated input — paired with find -print0 for filename safety.)
What does parallel --joblog give you? (A tab-separated file with one line per job: sequence number, host, start time, runtime, exit value, signal, command. Machine-parseable results.)
What does parallel --halt soon,fail=10% do? (Stop launching new jobs as soon as 10% of jobs have failed.)
What’s a FIFO and how do you create one? (mkfifo /tmp/fifo — a filesystem-named pipe.)
What’s flock -n? (Non-blocking lock acquisition — fails immediately if already held.)
Why use flock instead of [[ -f $LOCKFILE ]]? (flock uses kernel-managed locks that auto-release on process death; no staleness.)
What’s a TOCTOU race? (Time-Of-Check, Time-Of-Use — the gap between checking a condition and acting on it allows another process to change state in between.)
What’s the safest pattern for accumulating results from parallel jobs? (Each job writes to its own file; aggregate after wait.)

What’s next

Lesson 17: Network Operations — curl/wget Mastery, /dev/tcp Sockets, Retry-with-Backoff & Idempotent HTTP. Almost every modern script makes HTTP calls — to APIs, to artifact registries, to webhooks. We’ll cover curl (every flag worth knowing), wget (when and why), bash’s built-in /dev/tcp socket support (no curl needed!), retry-with-exponential-backoff patterns, idempotency keys for safe API calls, and the canonical “wait for service to be up” pattern. After L17, your scripts will hit the network reliably.

See you there.