Shell Cloud CLIs Mastery: AWS, Azure, GCP — Auth Chains, Pagination, Parallel Calls, Output Discipline & Rate Limits

Why Cloud CLI Automation From Shell Has a Specific Set of Failure Modes

Your CI script lists S3 buckets across 12 regions, tags each one, runs nightly. It worked for six months. Then one night it ran 47 minutes instead of 4, you got a ThrottlingException from AWS, the script half-finished, and the next morning half your buckets are tagged and the other half aren’t.

Or: a deploy script reads gcloud auth list to confirm the right service account is active, then calls gcloud compute instances list. CI changed its environment, the active account is now default-runner@.. instead of deploy@.., and the script lists the wrong project’s instances and continues happily.

Or: an az vm list returns 2000 VMs across subscriptions, you forget pagination, the response is 80 MB, your jq pipeline OOMs the runner.

Cloud CLI automation has six specific failure modes:

Failure mode	Symptom	Cost
Wrong credential resolved	Script operates on wrong account/project	Wrong resource modified, or auth fails confusingly
Unpaginated listing	Truncated results, “missing” resources	Half-tagged buckets, half-deleted instances
Rate limit hit	`Throttling` / `429` errors mid-run	Partial state, retries that re-do work
Wrong output format	Brittle parsers break on text/table output	Script fails with no clear error
Long-running session timeout	Token expires mid-paginate	“Unauthorized” 30 minutes into a 60-minute job
Region/zone defaulting	Script runs in default region; your resource is elsewhere	“Resource not found” in a region that has 0 of them

This lesson is the cross-cloud pattern set. We treat AWS, Azure, and GCP as variations on the same problem, with a lib/cloud.sh that abstracts away the dialect differences.

The Credential Resolution Chain (All Three Clouds)

Every cloud CLI resolves credentials via a chain: it tries each source in order, stops at the first that yields valid credentials. Knowing the chain lets you reason about what credential is actually being used.

AWS credential chain

1. Environment vars: AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY (+ AWS_SESSION_TOKEN for STS)
2. AWS_PROFILE → ~/.aws/credentials [profile] section, plus ~/.aws/config
3. AWS_PROFILE with sso_session in ~/.aws/config → SSO cache in ~/.aws/sso/cache/*.json
4. AWS_WEB_IDENTITY_TOKEN_FILE + AWS_ROLE_ARN → STS AssumeRoleWithWebIdentity (IRSA, GitHub OIDC)
5. ECS task role (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI)
6. EC2 IMDSv2 (instance metadata at http://169.254.169.254/latest/meta-data/iam/...)

The chain stops at the first source that yields credentials. If AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set in your env, the profile named by AWS_PROFILE is never consulted, even if ~/.aws/credentials has different keys for that profile. This is the most common confusion.
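To make the “first source wins” rule concrete, here is a minimal sketch that guesses the likely winner from environment variables alone, in the order of the list above. The function name aws_guess_cred_source is hypothetical, and the result is only a guess: it cannot see profile contents or command-line flags, so treat aws configure list (which reports where the CLI actually found its credentials) as the authority.

```shell
# Hypothetical helper: guess the winning credential source from env vars only.
# Mirrors the order of the list above; it never calls AWS.
aws_guess_cred_source() {
  if [[ -n "${AWS_ACCESS_KEY_ID:-}" && -n "${AWS_SECRET_ACCESS_KEY:-}" ]]; then
    echo "env-keys"
  elif [[ -n "${AWS_PROFILE:-}" ]]; then
    echo "profile:${AWS_PROFILE}"
  elif [[ -n "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" && -n "${AWS_ROLE_ARN:-}" ]]; then
    echo "web-identity"
  elif [[ -n "${AWS_CONTAINER_CREDENTIALS_RELATIVE_URI:-}" ]]; then
    echo "ecs-task-role"
  else
    echo "default-profile-or-imds"
  fi
}
```

Logging the guess next to the sts get-caller-identity result makes a surprising identity much quicker to explain when a CI environment changes under you.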

Azure credential chain (`az` CLI)

1. AZURE_CLIENT_ID + AZURE_CLIENT_SECRET + AZURE_TENANT_ID (service principal)
2. AZURE_CLIENT_ID + AZURE_USERNAME + AZURE_PASSWORD (resource owner password — discouraged)
3. Managed identity (when running in Azure VM/Functions/AKS)
4. ~/.azure/azureProfile.json (persisted login from `az login`)

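az login writes a token cache to ~/.azure/. Access tokens expire (typically after 60–90 minutes); long CI jobs must refresh or log in as a service principal. Note that the az CLI itself does not pick up the AZURE_* variables automatically: they are what the Azure SDKs’ environment credential reads, so a script still has to pass them to az login --service-principal (or run az login --identity for a managed identity) before calling other az commands.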
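A service-principal login for CI might look like the sketch below. It assumes the three AZURE_* variables are already injected by your CI system, and az_ci_login is a hypothetical helper name. It also points AZURE_CONFIG_DIR at a fresh temporary directory so the job does not share a token cache with anything else on the runner.

```shell
# Sketch: log in as a service principal inside a per-job config directory.
# az_ci_login is a hypothetical name; the AZURE_* values are assumed to come from CI secrets.
az_ci_login() {
  : "${AZURE_CLIENT_ID:?set AZURE_CLIENT_ID}"
  : "${AZURE_CLIENT_SECRET:?set AZURE_CLIENT_SECRET}"
  : "${AZURE_TENANT_ID:?set AZURE_TENANT_ID}"

  # Isolate this job's token cache from ~/.azure and from other jobs.
  AZURE_CONFIG_DIR=$(mktemp -d) || return 1
  export AZURE_CONFIG_DIR

  az login --service-principal \
    --username "$AZURE_CLIENT_ID" \
    --password "$AZURE_CLIENT_SECRET" \
    --tenant "$AZURE_TENANT_ID" \
    --output none
}
```

Call az_ci_login once at the top of the job, then run the subscription assertion before any other az command. Passing the secret as an argument exposes it in the process list, so this fits single-tenant runners better than shared hosts.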

GCP credential chain (`gcloud`)

1. GOOGLE_APPLICATION_CREDENTIALS → JSON key file path
2. gcloud config config-helper / active gcloud account
3. Compute metadata service (running on GCE/GKE)

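GOOGLE_APPLICATION_CREDENTIALS feeds ADC (Application Default Credentials), which is what the SDKs read; gcloud itself authenticates with its own active account. For humans, gcloud auth login sets the gcloud account and gcloud auth application-default login sets ADC. For automation, gcloud auth activate-service-account --key-file activates a service account for gcloud, and a service-account key file supplies ADC.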

The “is the right credential active?” check

Always run a “who am I?” probe at the top of any script that uses cloud APIs:

# AWS
aws sts get-caller-identity --output json
# {
#   "UserId": "AIDAEXAMPLE",
#   "Account": "123456789012",
#   "Arn": "arn:aws:iam::123456789012:user/build-bot"
# }

# Azure
az account show -o json
# {
#   "id": "00000000-0000-0000-0000-000000000000",
#   "user": { "name": "build-bot@example.onmicrosoft.com", "type": "servicePrincipal" }
# }

# GCP
gcloud config list --format=json
# {
#   "core": {
#     "account": "build-bot@my-project.iam.gserviceaccount.com",
#     "project": "my-project"
#   }
# }

A short preflight at the top of every cloud script:

preflight_aws() {
  local got_account got_arn want_account="${1:-}"
  # --output text prints "Account<TAB>Arn" on one line; read splits on the tab.
  read -r got_account got_arn < <(aws sts get-caller-identity --query '[Account,Arn]' --output text)
  if [[ -z "$got_account" ]]; then
    echo "ERROR: could not resolve an AWS identity (no valid credentials?)" >&2
    exit 1
  fi
  echo "AWS account: $got_account, identity: $got_arn"
  if [[ -n "$want_account" && "$got_account" != "$want_account" ]]; then
    echo "ERROR: expected account $want_account, got $got_account" >&2
    exit 1
  fi
}

preflight_aws 123456789012   # fail-fast if wrong account is active

This is the single highest-leverage pattern in this lesson. Production catastrophes from “wrong account active” are entirely preventable.

Output Format Discipline: Always JSON for Automation

Each CLI has its own default output format, and user or profile configuration can override it:

CLI	Default	JSON	Set globally
`aws`	JSON (unless the profile sets `output`)	`--output json`	`export AWS_DEFAULT_OUTPUT=json` or `output = json` in `~/.aws/config`
`az`	JSON	`-o json` (default)	`az config set core.output=json`
`gcloud`	Command-specific (table for `list`, YAML for `describe`)	`--format=json`	`gcloud config set core/format json`

Rule: always pass --output json (or equivalent) explicitly. Even if it’s the default, future versions or different operator profiles can change the default and your script will silently start emitting tables that break parsers.

# WRONG: relies on default; breaks if user has 'output=table' set.
aws ec2 describe-instances | grep i-

# RIGHT: explicit format, parse with jq.
aws ec2 describe-instances --output json \
  | jq -r '.Reservations[].Instances[].InstanceId'

Use server-side filtering when available

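AWS and Azure both accept JMESPath via --query, but the CLI applies it client-side, after the full response has arrived. True server-side filtering on AWS is --filters (and similar per-command options); gcloud’s --filter is evaluated by the CLI and pushed to the server only where the API supports it. Filter server-side first to avoid paginating data you’ll throw away, then use --query or --format to project just the fields you need.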

# AWS: server-side filter for running instances in a VPC, project just IDs.
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" "Name=vpc-id,Values=vpc-12345" \
  --query 'Reservations[].Instances[].InstanceId' \
  --output json

# Azure: --query runs client-side; powerState is only populated with -d (--show-details).
az vm list -d --query "[?powerState=='VM running'].name" -o json

# GCP: --filter is GCE-style filter syntax; --format=value(...) for tab output.
gcloud compute instances list \
  --filter='status:RUNNING AND zone:us-central1-a' \
  --format='value(name)'

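--filters reduces the API response size (often dramatically), which cuts pagination, throttling risk, and parse time. --query and --format only reduce what your script has to parse; the full response is still fetched.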

Pagination: The Single Biggest Source of Silent Truncation

By default, all three CLIs paginate, but they paginate differently:

AWS pagination

aws CLI v2 automatically paginates and concatenates results into a single response — unless you pass --max-items or set pagination config. So aws ec2 describe-instances on a 5,000-instance account returns all 5,000 in one response (which can be 30+ MB). Two failure modes:

Memory blowup in subsequent jq pipelines.
Timeout if the API call takes >60s due to size.

To process incrementally:

# Paginate manually: --max-items caps each invocation; the CLI emits NextToken for resuming.
list_all_instances() {
  local token="" out result_file=/tmp/instances.jsonl
  : > "$result_file"

  while :; do
    if [[ -z "$token" ]]; then
      out=$(aws ec2 describe-instances --max-items 500 --output json) || return 1
    else
      out=$(aws ec2 describe-instances --max-items 500 --starting-token "$token" --output json) || return 1
    fi

    # Append IDs to the result file (process incrementally).
    jq -r '.Reservations[].Instances[].InstanceId' <<<"$out" >> "$result_file"

    # NextToken here is the CLI's own resume token; --starting-token is rejected together with --no-paginate.
    token=$(jq -r '.NextToken // empty' <<<"$out")
    [[ -z "$token" ]] && break
  done
}

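Or let the CLI auto-paginate and set only the per-request page size with --page-size. Note that --max-items caps the total number of items returned, so on its own it truncates the listing:

# Smaller pages mean smaller, faster individual API calls; the CLI still returns everything.
aws ec2 describe-instances --page-size 100 --output json \
  | jq -r '.Reservations[].Instances[].InstanceId'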

Azure pagination

az CLI auto-paginates by default and returns every result. To control:

# On commands that support it, --max-items caps the total and prints a token to pass to --next-token.
az vm list --max-items 1000 -o json

Some az commands also accept --top for “max results”, but the default is “all results.” Check az <command> --help, since support varies by command.

GCP pagination

gcloud paginates differently per command. gcloud compute instances list paginates by default; --limit N caps results; --page-size N controls page size.

gcloud compute instances list --limit 5000 --page-size 500 --format=json

For very large lists, gcloud also supports --uri for “just give me the URLs” — light and fast.

Retry and Backoff: The Must-Have Wrapper

Every cloud API rate-limits. AWS surfaces ThrottlingException, Throttling, RequestLimitExceeded. Azure uses HTTP 429 with Retry-After. GCP uses 429 with quota errors.

Always wrap cloud CLI calls in retry-with-backoff. The CLIs sometimes have built-in retry but the defaults are conservative; explicit retry gives you control and visibility.

# Generic retry with exponential backoff.
cloud_retry() {
  local max=${CLOUD_RETRY_MAX:-5}
  local base=${CLOUD_RETRY_BASE_MS:-500}
  local attempt=1 wait_ms

  while (( attempt <= max )); do
    local rc=0
    # Capture the exit code in the else branch: after an "if" that took no branch, $? is 0.
    if "$@"; then
      return 0
    else
      rc=$?
    fi
    if (( attempt == max )); then
      echo "cloud_retry: giving up after $max attempts" >&2
      return "$rc"
    fi
    # Exponential backoff with jitter: base * 2^(attempt-1) ± 25%.
    wait_ms=$(( base * (2 ** (attempt - 1)) ))
    wait_ms=$(( wait_ms + (RANDOM % (wait_ms / 2)) - (wait_ms / 4) ))
    echo "attempt $attempt failed (rc=$rc); retrying in ${wait_ms}ms" >&2
    sleep "$(awk "BEGIN { printf \"%.3f\", $wait_ms / 1000 }")"
    attempt=$(( attempt + 1 ))
  done
}

# Usage:
cloud_retry aws s3api put-bucket-tagging --bucket my-bucket --tagging file://tags.json

For AWS, you can also leverage built-in retry config:

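# In ~/.aws/config (retry_mode, max_attempts) or env:
export AWS_RETRY_MODE=adaptive       # legacy | standard (CLI v2 default) | adaptive (experimental)
export AWS_MAX_ATTEMPTS=10

adaptive mode adds client-side rate limiting on top of standard mode’s backoff, slowing requests down as throttling is observed. That suits steady-state bulk jobs well, but AWS documents the mode as experimental, so test it before making it your default.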

Bounded Parallelism: GNU parallel + xargs Patterns

You have 200 buckets to tag. Sequential = 200 × 200ms = 40s. Parallel with concurrency 10 = 4s. Parallel with no limit = throttled.

# WRONG: unbounded parallel; trips throttling.
aws s3api list-buckets --query 'Buckets[].Name' --output text \
  | tr '\t' '\n' \
  | xargs -P 0 -I {} aws s3api put-bucket-tagging --bucket {} --tagging file://tags.json

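# RIGHT: bounded concurrency. -P 10 = 10 parallel workers.
# xargs cannot call a shell function directly, so export it and run it via bash -c.
export -f cloud_retry
aws s3api list-buckets --query 'Buckets[].Name' --output text \
  | tr '\t' '\n' \
  | xargs -P 10 -I {} bash -c 'cloud_retry aws s3api put-bucket-tagging --bucket "$1" --tagging file://tags.json' _ {}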

For more sophisticated patterns, GNU parallel:

# Process per-region with up to 5 parallel jobs, retry on failure.
# (EC2 is regional; s3api list-buckets would return the same account-wide list for every region.)
aws ec2 describe-regions --query 'Regions[].RegionName' --output text \
  | tr '\t' '\n' \
  | parallel -j 5 --retries 3 'aws --region {} ec2 describe-instances --query "Reservations[].Instances[].InstanceId" --output json'

--retries 3 retries on non-zero exit (basic; not throttle-aware). For throttle-aware retry, wrap your own retry function in the parallel command.
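A throttle-aware wrapper could look like the sketch below: it captures stderr, retries only when the message looks like throttling, and fails fast on everything else (an AccessDenied will not get better on the fifth attempt). The function names and the error patterns are illustrative assumptions, not an official list from any provider.

```shell
# Sketch of a throttle-aware retry. Names and patterns are assumptions for illustration.
is_throttle_error() {
  # $1: captured stderr text from the failed command.
  grep -Eq 'Throttling|RequestLimitExceeded|TooManyRequests|rateLimitExceeded|(^|[^0-9])429([^0-9]|$)' <<<"$1"
}

retry_if_throttled() {
  local max=${CLOUD_RETRY_MAX:-5} base_ms=${CLOUD_RETRY_BASE_MS:-500}
  local attempt=1 rc err_file err wait_ms
  err_file=$(mktemp) || return 1
  while :; do
    if "$@" 2>"$err_file"; then
      rm -f "$err_file"
      return 0
    else
      rc=$?
    fi
    err=$(<"$err_file")
    [[ -n "$err" ]] && printf '%s\n' "$err" >&2   # keep the CLI's message visible
    # Give up immediately on non-throttle errors, or when attempts run out.
    if ! is_throttle_error "$err" || (( attempt >= max )); then
      rm -f "$err_file"
      return "$rc"
    fi
    wait_ms=$(( base_ms * (2 ** (attempt - 1)) ))
    sleep "$(awk -v ms="$wait_ms" 'BEGIN { printf "%.3f", ms / 1000 }')"
    attempt=$(( attempt + 1 ))
  done
}
```

To use it from parallel, export both functions first (export -f is_throttle_error retry_if_throttled) and put retry_if_throttled in front of the aws command inside the quoted job string. Jitter is left out here to keep the sketch short.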

Concurrency limit by API quota

AWS rate limits are per-region and per-API, and they vary by service. As rough orders of magnitude: list APIs around 100 req/s, mutating APIs 5–20 req/s. -P 10 is usually safe for read-only listings; for mutating calls, use -P 4 and add retries.

Azure rate limits per-subscription with reads at ~12,000/hour and writes at ~1,200/hour. Bursts of 100+ in seconds will hit throttling.

GCP rate limits per-project per-API; gcloud compute project-info describe shows the project’s Compute Engine resource quotas, while API request quotas are listed on the console’s quotas page. For most CLI operations, -P 8 with retries is a safe baseline.
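One rough way to turn a rate budget into a -P value is Little’s law: requests in flight ≈ request rate × seconds per call. This is a sizing heuristic, not something the providers publish, and the 200 ms latency below is simply the figure from the bucket-tagging arithmetic earlier. workers_for_rate is a hypothetical helper name.

```shell
# Hypothetical helper: workers ≈ target requests/second × seconds per call.
# Integer math, rounded down, never below 1.
workers_for_rate() {
  local rps="$1" latency_ms="$2"
  local workers=$(( rps * latency_ms / 1000 ))
  (( workers < 1 )) && workers=1
  echo "$workers"
}

workers_for_rate 20 200    # prints 4
workers_for_rate 100 200   # prints 20
workers_for_rate 2 100     # prints 1
```

At 20 req/s and 200 ms per call this lands on the same -P 4 suggested for mutating operations. The 100 req/s case computes 20, so -P 10 for listings leaves roughly half the budget as headroom.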

A Drop-In Library: `lib/cloud.sh`

# lib/cloud.sh — cross-cloud helpers. Detects which CLI to use; wraps retry.

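# ─── Configuration ─────────────────────────────────────────────────────────
: "${CLOUD_RETRY_MAX:=5}"
: "${CLOUD_RETRY_BASE_MS:=500}"
: "${CLOUD_PARALLEL:=8}"
# Exported so child shells started by xargs/parallel see the same settings.
export CLOUD_RETRY_MAX CLOUD_RETRY_BASE_MS CLOUD_PARALLEL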

# ─── Retry with exponential backoff + jitter ──────────────────────────────
cloud_retry() {
  local max="$CLOUD_RETRY_MAX" base="$CLOUD_RETRY_BASE_MS" attempt=1 wait_ms rc
  while (( attempt <= max )); do
    "$@" && return 0
    rc=$?
    (( attempt == max )) && return "$rc"
    wait_ms=$(( base * (2 ** (attempt - 1)) ))
    wait_ms=$(( wait_ms + (RANDOM % (wait_ms / 2)) - (wait_ms / 4) ))
    echo "[cloud_retry] attempt $attempt failed (rc=$rc); sleep ${wait_ms}ms" >&2
    sleep "$(awk "BEGIN { printf \"%.3f\", $wait_ms / 1000 }")"
    attempt=$(( attempt + 1 ))
  done
}

# ─── Identity preflight (call at top of every cloud script) ────────────────
aws_whoami() {
  aws sts get-caller-identity --query '[Account,Arn]' --output text
}

aws_assert_account() {
  local want="$1" got
  got=$(aws sts get-caller-identity --query 'Account' --output text)
  if [[ "$got" != "$want" ]]; then
    echo "AWS account mismatch: want=$want got=$got" >&2
    return 1
  fi
}

az_assert_subscription() {
  local want="$1" got
  got=$(az account show --query 'id' -o tsv)
  if [[ "$got" != "$want" ]]; then
    echo "Azure subscription mismatch: want=$want got=$got" >&2
    return 1
  fi
}

gcp_assert_project() {
  local want="$1" got
  got=$(gcloud config get-value project 2>/dev/null)
  if [[ "$got" != "$want" ]]; then
    echo "GCP project mismatch: want=$want got=$got" >&2
    return 1
  fi
}

# ─── Pagination wrappers ───────────────────────────────────────────────────

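# AWS: paginated foreach. Calls $cmd_func once per page with the page JSON as $1.
# Uses --max-items/--starting-token; --starting-token is rejected together with --no-paginate.
# CLOUD_PAGE_ITEMS (optional, new here) sets the items per call; default 500.
aws_paginate() {
  local cmd_func="$1"; shift
  local token="" out
  while :; do
    local args=(--max-items "${CLOUD_PAGE_ITEMS:-500}" --output json)
    [[ -n "$token" ]] && args+=(--starting-token "$token")
    out=$(cloud_retry "$@" "${args[@]}") || return 1
    "$cmd_func" "$out" || return 1
    token=$(jq -r '.NextToken // empty' <<<"$out")
    [[ -z "$token" ]] && break
  done
}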

# ─── Bounded parallel apply ────────────────────────────────────────────────

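# Read items from stdin, apply the given command to each, max $CLOUD_PARALLEL concurrent.
# Each item is passed as the last argument instead of being spliced into shell code.
cloud_parallel() {
  CLOUD_RETRY_MAX="$CLOUD_RETRY_MAX" CLOUD_RETRY_BASE_MS="$CLOUD_RETRY_BASE_MS" \
    xargs -P "$CLOUD_PARALLEL" -I {} \
    bash -c "$(declare -f cloud_retry); cloud_retry \"\$@\"" _ "$@" {}
}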

# ─── Multi-region foreach (AWS) ────────────────────────────────────────────
aws_foreach_region() {
  local cmd_func="$1"
  local regions
  regions=$(aws ec2 describe-regions \
    --query 'Regions[].RegionName' --output text)
  local region
  for region in $regions; do
    AWS_REGION="$region" "$cmd_func"
  done
}

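# Parallel multi-region. The region is passed as $1 and exported before the callback runs.
aws_foreach_region_parallel() {
  local cmd_func="$1"
  aws ec2 describe-regions --query 'Regions[].RegionName' --output text \
    | tr '\t' '\n' \
    | CLOUD_RETRY_MAX="$CLOUD_RETRY_MAX" CLOUD_RETRY_BASE_MS="$CLOUD_RETRY_BASE_MS" \
      xargs -P "$CLOUD_PARALLEL" -I {} \
      bash -c "$(declare -f cloud_retry "$cmd_func"); export AWS_REGION=\"\$1\"; $cmd_func" _ {}
}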

# ─── Output validation ─────────────────────────────────────────────────────
require_jq() { command -v jq >/dev/null || { echo "jq required" >&2; exit 1; }; }

# Validate that JSON output has expected shape.
require_json_field() {
  local input="$1" field="$2"
  if ! jq -e "$field" <<<"$input" >/dev/null 2>&1; then
    echo "missing required field: $field" >&2
    return 1
  fi
}

Real-World Recipes

Recipe 1: Inventory all S3 buckets across all regions

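. lib/cloud.sh
aws_assert_account 123456789012   # preflight

# ListBuckets is account-wide, so list once and look up each bucket's region.
bucket_region() {
  local name="$1" region
  region=$(cloud_retry aws s3api get-bucket-location --bucket "$name" \
    --query 'LocationConstraint' --output text) || return 1
  # The API reports us-east-1 as null, which --output text prints as "None".
  [[ "$region" == "None" ]] && region="us-east-1"
  printf '%s\t%s\n' "$region" "$name"
}
export -f cloud_retry bucket_region
export CLOUD_RETRY_MAX CLOUD_RETRY_BASE_MS

aws s3api list-buckets --query 'Buckets[].Name' --output text \
  | tr '\t' '\n' \
  | xargs -P "$CLOUD_PARALLEL" -I {} bash -c 'bucket_region "$1"' _ {} \
  | sort > all_buckets.tsv
echo "found $(wc -l < all_buckets.tsv) buckets"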

Recipe 2: Tag all running EC2 instances missing an `Owner` tag

list_untagged_running() {
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[?!not_null(Tags[?Key==`Owner`].Value | [0])].InstanceId' \
    --output text \
    | tr '\t' '\n'
}

tag_instance() {
  local id="$1"
  # Only report success if the tagging call (after retries) actually succeeded.
  cloud_retry aws ec2 create-tags --resources "$id" \
    --tags Key=Owner,Value=unknown \
    && echo "tagged $id"
}
export -f cloud_retry tag_instance

list_untagged_running \
  | xargs -P 10 -I {} bash -c 'tag_instance "$1"' _ {}

The export -f pattern is needed so child shells (spawned by xargs) can see the function. Alternative: use parallel.
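A tiny demonstration of why the export matters, using a throwaway function (greet is just a placeholder name). The first child shell fails to find the function; after export -f, the workers spawned by xargs can call it.

```shell
# A child bash only sees functions that were exported with `export -f`.
greet() { echo "hello $1"; }

# Not exported yet: the child shell reports "command not found" and exits non-zero.
bash -c 'greet world' 2>/dev/null || echo "child shell cannot see greet yet"

export -f greet
printf '%s\n' alpha beta \
  | xargs -P 2 -I {} bash -c 'greet "$1"' _ {}
```

Passing the item as "$1" (with _ filling $0) keeps the input out of the command string, so an odd resource name cannot be interpreted as shell code. With -P 2 the two output lines can arrive in either order.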

Recipe 3: Multi-cloud secret rotation (AWS Secrets Manager + Azure Key Vault)

rotate_aws_secret() {
  local name="$1"
  local new_value
  new_value=$(openssl rand -base64 32)
  cloud_retry aws secretsmanager update-secret \
    --secret-id "$name" \
    --secret-string "$new_value"
  echo "rotated AWS secret: $name"
}

rotate_az_secret() {
  local vault="$1" name="$2"
  local new_value
  new_value=$(openssl rand -base64 32)
  cloud_retry az keyvault secret set \
    --vault-name "$vault" --name "$name" --value "$new_value" \
    -o none
  echo "rotated Azure secret: $vault/$name"
}

rotate_aws_secret "myapp/db-password"
rotate_az_secret  "myapp-vault" "db-password"

Recipe 4: Cost-attribution: spend per tag for last month

# AWS Cost Explorer.
aws ce get-cost-and-usage \
  --time-period "Start=$(date -d 'first day of last month' +%F),End=$(date -d 'first day of this month' +%F)" \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=TAG,Key=CostCenter \
  --output json \
  | jq -r '
    .ResultsByTime[].Groups[]
    | "\(.Keys[0])\t\(.Metrics.BlendedCost.Amount)\t\(.Metrics.BlendedCost.Unit)"
  ' \
  | sort -t$'\t' -k2 -n -r

Recipe 5: Drift check: declared vs actual VM count

# Fail CI if a Terraform-managed deployment has unexpected drift.
expected=$(terraform output -json instance_ids | jq -r '.[]' | sort)
actual=$(aws ec2 describe-instances \
  --filters "Name=tag:ManagedBy,Values=terraform" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' --output text \
  | tr '\t' '\n' | sort)

if ! diff <(echo "$expected") <(echo "$actual") >/dev/null; then
  echo "DRIFT detected:"
  diff <(echo "$expected") <(echo "$actual")
  exit 1
fi
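When the check fails, it helps to know which direction the drift goes. A minimal sketch using comm on the two sorted ID lists; drift_report is a hypothetical helper name, and it takes the same newline-separated strings as the expected and actual variables above.

```shell
# Sketch: classify drift with comm. Inputs are newline-separated ID lists.
drift_report() {
  local expected="$1" actual="$2" rc=0 id
  # comm -23: lines only in the first input (declared but not running).
  while IFS= read -r id; do
    [[ -n "$id" ]] && { echo "MISSING (declared, not running): $id"; rc=1; }
  done < <(comm -23 <(sort -u <<<"$expected") <(sort -u <<<"$actual"))
  # comm -13: lines only in the second input (running but not declared).
  while IFS= read -r id; do
    [[ -n "$id" ]] && { echo "UNEXPECTED (running, not declared): $id"; rc=1; }
  done < <(comm -13 <(sort -u <<<"$expected") <(sort -u <<<"$actual"))
  return "$rc"
}
```

In CI this could replace the diff block with drift_report "$expected" "$actual" || exit 1, which prints one labeled line per drifted instance.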

Footgun List

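Default region is implicit. AWS resolves it from --region, then AWS_REGION / AWS_DEFAULT_REGION, then the profile’s region; if none is set, regional commands fail with “You must specify a region” rather than falling back. Always set it explicitly: aws --region us-west-2 ... or export AWS_REGION=us-west-2.
AWS_PROFILE does not override AWS_ACCESS_KEY_ID. If the key env vars are set, the profile named by AWS_PROFILE is ignored. Common when CI sets keys for one account and you export AWS_PROFILE=other thinking it switches; unset the key variables before switching profiles.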
Output format is per-call, not session. --output json on one call doesn’t apply to the next. Set AWS_DEFAULT_OUTPUT=json for the session.
jq -r prints null as the literal string “null”. Filter: jq -r '. // empty' produces empty output for null.
aws s3 ls is not the same as aws s3api list-objects-v2. The first is the high-level CLI command with human-oriented text output; the second is the raw API. Different output formats, different pagination behavior.
gcloud compute is regional/zonal. Without --zone or --region, gcloud often errors or asks interactively. In scripts, always specify.
az login writes credentials to a global file. Two scripts running concurrently can race on token refresh. For parallel automation, give each job its own AZURE_CONFIG_DIR and its own service-principal login.
AWS_PAGER controls CLI v2’s auto-pager, which can hang or garble scripted runs. Disable it in CI: export AWS_PAGER="" (AWS_PAGER=cat has the same effect).
gcloud auth print-access-token returns a token but gcloud may not refresh it automatically; long-running scripts can have the token expire.
Rate limits are per region for AWS and per subscription for Azure. Hitting throttling in one region doesn’t necessarily fail you in another, but if you parallelize across regions, you can hit per-account limits too.
aws s3 sync and gsutil rsync do their own retry logic that you can’t easily inspect. For reliable transfers at scale, prefer dedicated tools (or wrap with cloud_retry around aws s3 cp for fine-grained control).
Tagging APIs are eventually consistent. Tag a resource, immediately list-by-tag — the list may not include the freshly tagged resource for several seconds. Don’t rely on read-your-writes.
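
For the token-expiry footgun, one option is to check the remaining lifetime before each batch and re-authenticate early. A minimal sketch, assuming you have the expiry as an ISO 8601 timestamp (for example the Credentials.Expiration field that aws sts assume-role returns) and GNU date; expires_within is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if an ISO 8601 expiry is less than N seconds away.
# Relies on GNU date (-d), like the Cost Explorer recipe.
expires_within() {
  local expiry="$1" margin_s="$2" now exp
  now=$(date +%s)
  exp=$(date -d "$expiry" +%s) || return 2   # 2 = could not parse the timestamp
  (( exp - now < margin_s ))
}
```

A long pagination loop could call expires_within "$EXPIRY" 300 before each page and re-run its login or assume-role step when it succeeds, instead of discovering “Unauthorized” half an hour in.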

Quick-Reference Card

┌─ CREDENTIAL CHAIN PRIORITY ───────────────────────────────────────────┐
│  AWS:    env > profile > SSO > IRSA > ECS task > IMDSv2              │
│  Azure:  service-principal env > managed identity > az login cache   │
│  GCP:    GOOGLE_APPLICATION_CREDENTIALS > gcloud account > metadata  │
└────────────────────────────────────────────────────────────────────────┘

┌─ PREFLIGHT (RUN AT TOP OF EVERY CLOUD SCRIPT) ────────────────────────┐
│  aws sts get-caller-identity                                         │
│  az account show                                                     │
│  gcloud config list                                                  │
│  Assert account/subscription/project matches expected                │
└────────────────────────────────────────────────────────────────────────┘

┌─ OUTPUT FORMAT (ALWAYS EXPLICIT) ─────────────────────────────────────┐
│  aws ... --output json                                               │
│  az ... -o json                                                      │
│  gcloud ... --format=json   (or --format='value(field)' for tab)    │
│  Pipe to jq for transformations                                      │
└────────────────────────────────────────────────────────────────────────┘

┌─ FILTER AND PROJECTION ───────────────────────────────────────────────┐
│  aws --filters "Name=tag:Env,Values=prod"  (server-side)             │
│  aws --query 'Reservations[].Instances[].InstanceId'  (client-side)  │
│  az --query "[?location=='eastus'].name"  (client-side)              │
│  gcloud --filter='status:RUNNING' --format='value(name)'             │
└────────────────────────────────────────────────────────────────────────┘

┌─ PAGINATION ──────────────────────────────────────────────────────────┐
│  AWS:    --page-size N (auto-paginates) or --max-items + token loop  │
│  Azure:  --max-items N (auto-paginates by default)                   │
│  GCP:    --limit N --page-size N                                     │
│  ALWAYS process incrementally; large unpaginated responses OOM       │
└────────────────────────────────────────────────────────────────────────┘

┌─ RETRY & BACKOFF ─────────────────────────────────────────────────────┐
│  AWS:    AWS_RETRY_MODE=adaptive AWS_MAX_ATTEMPTS=10                 │
│  Generic: cloud_retry wrapper with exponential + jitter              │
│  Always handle: ThrottlingException, 429, RequestLimitExceeded       │
└────────────────────────────────────────────────────────────────────────┘

┌─ PARALLELISM ─────────────────────────────────────────────────────────┐
│  Read-only listing:  -P 10 (xargs)                                   │
│  Mutating ops:       -P 4 with retry                                 │
│  Multi-region:       parallel -j 5 over regions                      │
│  Per-region: rate limits separate; parallelize by region for scale   │
└────────────────────────────────────────────────────────────────────────┘

┌─ CI-SAFE ENV ─────────────────────────────────────────────────────────┐
│  export AWS_PAGER=""               disable v2 pager (breaks no-TTY)  │
│  export AWS_DEFAULT_OUTPUT=json     consistent format                │
│  export AWS_REGION=us-west-2        explicit region                  │
│  export AWS_RETRY_MODE=adaptive     better throttle handling         │
│  unset AWS_PROFILE if using env keys (prevent profile interference)  │
└────────────────────────────────────────────────────────────────────────┘

What’s Next

Cloud CLIs are the operator’s keyboard for the platform. The next layer wraps your shell scripts as proper Linux services, integrated with the system: timer-based scheduling, restart-on-failure, watchdogs, logging integration. The next lesson, Writing systemd Units That Wrap Shell Scripts Properly: Type, Restart, Hardening, Watchdogs, covers the Unit/Service/Timer file structure, choosing Type=simple vs oneshot vs notify, sandboxing with ProtectSystem and PrivateTmp, watchdog integration, and the difference between “the script ran” and “the service is healthy.”

Written by Vinod
