You have 400 GB of nightly database exports on a build server that must reach an Azure storage account before the 6 a.m. reporting job runs. You try the portal’s drag-and-drop upload, the browser tab freezes around 2 GB, and at file 1,180 of 4,000 your Wi-Fi blips and the whole thing restarts from zero. This is the exact problem AzCopy solves: a single, self-contained command-line tool from Microsoft that moves large amounts of data into, out of and between Azure storage endpoints — Blob storage and Azure Files — fast, in parallel, and with the ability to resume a half-finished job instead of restarting it. You point it at a source and a destination, pick one of two verbs — copy (put these bytes there) or sync (make the destination match the source) — and it streams the data over many concurrent connections, tracking every file so a network drop or a Ctrl-C is recoverable with azcopy jobs resume. That resume, plus the parallelism, is why teams reach for it the moment a transfer is measured in gigabytes rather than megabytes.
By the end of this guide you will install AzCopy, authenticate it two ways (a short-lived SAS token and your Entra ID login), upload a folder to Blob storage, keep a folder in sync, download it back, kill a transfer mid-flight and resume it, then tear everything down — doing the upload three ways: the portal’s Storage browser, the az CLI that wraps AzCopy, and Bicep. You will also learn the flags that matter, the errors you hit on day one, and the mistakes that make a “simple copy” silently skip files or run at a tenth of the speed it should.
What problem this solves
Moving bytes into Azure storage sounds trivial until the numbers get real. The portal’s browser upload runs in a single tab, has no resume, and chokes on large counts or sizes. Generic tools (a mounted drive, scp, a script over the REST API) mean writing and debugging your own retry logic, parallelism and progress tracking, and getting it subtly wrong: a transfer that “finished” but quietly dropped 12 files on a transient 503 is far worse than one that loudly failed. Throughput matters too: a single-threaded copy can crawl at 20 MB/s on a link that parallel connections would saturate at 400 MB/s.
Who hits this: anyone doing an initial data migration (on-prem shares, another cloud, a laptop of media); teams with recurring batch loads (nightly exports, log shipping, dataset refreshes); people moving data between accounts or regions; and developers pushing build artifacts or a static site from a pipeline. AzCopy is the default answer because it is fast, scriptable, resumable, and ships as one binary. Where it fits among the alternatives:
| Tool / method | Good for | Resume? | Parallel? | Scriptable? | When AzCopy wins |
|---|---|---|---|---|---|
| Portal upload (Storage browser) | A few small files, ad-hoc | No | Limited | No | Anything large, scripted, or resumable |
| AzCopy | Bulk copy/sync/move, migrations, pipelines | Yes | Yes (high) | Yes | This is the default for bulk transfer |
| az storage blob upload | One blob, simple scripts | No (single file) | No | Yes | Many files, folders, or large sizes |
| Storage Explorer (GUI app) | Visual browse + transfer | Partial | Yes | No | Headless servers, automation, pipelines |
| Azure Data Factory | Scheduled, managed pipelines, transforms | Yes (managed) | Yes | Yes (UI/JSON) | Simple one-off or terminal-driven copies |
(az storage copy wraps AzCopy and uses the same engine; Azure Storage Mover targets large, agent-driven on-prem migrations.)
Learning objectives
By the end of this article you can:
- Install AzCopy on Windows, macOS and Linux and verify the version.
- Explain the difference between azcopy copy and azcopy sync and choose the right one for a task.
- Authenticate AzCopy two ways (a SAS token appended to the URL, and azcopy login with your Entra ID identity) and say when to use each.
- Upload a folder to Blob storage, download it back, and synchronise a local folder with a container, including the --recursive and --delete-destination behaviours.
- Interrupt a transfer and resume it with azcopy jobs resume, and inspect job history with azcopy jobs list and azcopy jobs show.
- Tune the flags that matter (concurrency, --overwrite, --include-pattern, --cap-mbps, --check-md5) and know their defaults.
- Run the same upload three ways: the portal Storage browser, the az CLI (which wraps AzCopy), and an IaC setup in Bicep that provisions the account it targets.
- Diagnose the common day-one failures: 403 authentication errors, “no files matched”, silent skips from overwrite rules, and throttling.
Prerequisites & where this fits
You need an Azure subscription, the ability to create (or access) a storage account, a terminal — Bash, PowerShell, or Cloud Shell — and basic comfort with command-line arguments and environment variables. No programming language is required; AzCopy is just commands. Familiarity with a container and a blob helps; if those are new, read Azure Storage Account Fundamentals first, because AzCopy assumes you already have somewhere to put the data.
This sits at the very start of the Storage track. AzCopy is the transfer layer: it does not decide where your data lives (that is the access-tier and redundancy choice in Azure Blob Access Tiers: Hot, Cool, Cold & Archive and Azure Storage Redundancy: LRS, ZRS, GRS & RA-GRS), and it does not secure the account. It moves bytes between endpoints you have already provisioned and secured — so when a transfer is denied, the cause is almost always one of those upstream layers, which is why this guide leans on the Troubleshooting Azure Storage 403s playbook for authentication failures.
Core concepts
A few ideas make every later command obvious.
AzCopy is a single self-contained binary, and everything is a source and a destination. There is no service or agent — you download one executable, put it on your PATH, run it, and it exits. The command shape is always azcopy <verb> <source> <destination> [flags], where each endpoint is a local path (/data/exports) or an Azure URL (https://<account>.blob.core.windows.net/<container>/<path>). It works local↔Azure both ways and Azure↔Azure between two endpoints — the latter runs server-side, so copying across accounts or regions barely touches your machine.
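The endpoint shape above is easy to get slightly wrong in scripts (a doubled slash, a doubled `?` before the SAS). As a minimal sketch, the helper below assembles a Blob URL from its parts. `blob_url` is a hypothetical name introduced here for illustration; it is not part of AzCopy, and the account and container names in the usage note are placeholders.

```shell
# Build an Azure Blob URL in the shape described above:
#   https://<account>.blob.core.windows.net/<container>/<path>[?<sas>]
# blob_url is a hypothetical helper, not an AzCopy command.
blob_url() (
  account=$1; container=$2; path=${3:-}; sas=${4:-}
  url="https://${account}.blob.core.windows.net/${container}"
  # Append the optional virtual path, trimming one leading slash.
  [ -n "$path" ] && url="${url}/${path#/}"
  # Append the optional SAS, trimming a leading '?' so it is not doubled.
  [ -n "$sas" ] && url="${url}?${sas#\?}"
  printf '%s\n' "$url"
)
```

A call such as `blob_url mystore uploads exports/2024` prints a URL that can be passed to AzCopy as the source or destination argument.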
A transfer is a resumable “job”, run in parallel. Every invocation creates a job with a unique ID and a plan file in your .azcopy profile directory recording every file’s status; if the job is interrupted (network drop, Ctrl-C, a kill), azcopy jobs resume <id> picks up only the files that hadn’t completed. Meanwhile AzCopy splits the work across many concurrent connections (and large blobs into parallel chunks), scaling with your CPU count — why it is often 10–20× faster than a single-threaded copy.
The two verbs and two auth methods are the core choices. copy puts bytes there (overwriting by default); sync makes the destination match the source, moving only diffs and optionally deleting extras. For permission, AzCopy uses either a SAS (a scoped, time-limited string on the URL — good for sharing/pipelines) or azcopy login via Entra ID (RBAC roles, no secret on the URL — good for interactive and unattended use).
The vocabulary in one table
Pin down every term before the step-by-step sections; the glossary repeats these for lookup.
| Term | One-line definition | Where it shows up |
|---|---|---|
| AzCopy | Single-binary CLI tool for bulk Azure storage transfer | The command you run |
| copy | Verb: transfer source → destination (overwrites by default) | One-way pushes/pulls |
| sync | Verb: make destination match source (transfers only diffs) | Recurring refreshes |
| Job | One transfer invocation with a unique ID, recorded in a plan file | azcopy jobs ... |
| Resume | Continue an interrupted job from where it stopped | azcopy jobs resume <id> |
| SAS token | Scoped, time-limited access string on the URL | Auth without a login |
| Entra ID login | RBAC-based auth via azcopy login | Interactive / managed identity |
| Concurrency | Number of parallel connections AzCopy uses | Throughput tuning |
Installing AzCopy
AzCopy ships as a small archive for each OS — no package to register, no admin install. Extract the binary and run it. The official download is a versioned ZIP/tar from Microsoft’s aka.ms redirector; pick your platform:
# Linux (x86-64) — download, extract, move the single binary onto PATH
curl -L https://aka.ms/downloadazcopy-v10-linux -o azcopy.tar.gz
tar -xf azcopy.tar.gz --strip-components=1
sudo mv azcopy /usr/local/bin/
azcopy --version
# macOS (Apple Silicon shown; use -mac for Intel)
curl -L https://aka.ms/downloadazcopy-v10-mac-arm64 -o azcopy.zip
unzip -o azcopy.zip
sudo mv azcopy_darwin_*/azcopy /usr/local/bin/
azcopy --version
# Windows (PowerShell) — download the ZIP, expand, and run from the folder
Invoke-WebRequest -Uri https://aka.ms/downloadazcopy-v10-windows -OutFile azcopy.zip
Expand-Archive azcopy.zip -DestinationPath .\azcopy -Force
.\azcopy\azcopy_windows_*\azcopy.exe --version
Expected output is a single line like azcopy version 10.x.y — that confirms the binary runs.
Already in Cloud Shell or on an Azure VM? AzCopy is pre-installed in Azure Cloud Shell — just type azcopy --version. The az CLI also bundles a copy and calls it under the hood for az storage copy. (Package managers like winget and Homebrew carry it for workstations you maintain.)
Keep it current. AzCopy v10 is the supported major version (v8/v7 are deprecated); new releases fix throughput regressions and add flags, so a stale binary is a common cause of “it’s slow” or “that flag doesn’t exist.” Re-download to upgrade (the direct-download method has no in-place updater).
Authenticating AzCopy
AzCopy must prove it is allowed to read the source and write the destination. The two everyday methods are SAS tokens and Entra ID login.
Option A — SAS token (scoped, self-contained)
A SAS (Shared Access Signature) is a signed query string granting specific permissions (read, write, list, delete) to specific resources for a limited time. You append it to the resource URL and AzCopy needs nothing else — ideal for pipelines and for handing someone temporary access without an Azure identity. Generate a container-scoped SAS with the az CLI:
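# Account + container you already created (a fresh $RANDOM suffix here would
# point at an account that does not exist, so set the real name)
ACCOUNT=mystorageacct   # placeholder: replace with your existing account name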
CONTAINER=uploads
# A SAS valid for 4 hours with read/write/list/create/delete on the container
EXPIRY=$(date -u -d "+4 hours" '+%Y-%m-%dT%H:%MZ' 2>/dev/null || date -u -v+4H '+%Y-%m-%dT%H:%MZ')
SAS=$(az storage container generate-sas \
--account-name "$ACCOUNT" --name "$CONTAINER" \
--permissions racwdl --expiry "$EXPIRY" \
--auth-mode login --as-user -o tsv)
echo "SAS generated (expires $EXPIRY)"
You then pass it by appending ?$SAS to the resource URL in the AzCopy command (shown in the lab). The permission letters map directly to what AzCopy can do:
| SAS permission letter | Grants | AzCopy needs it for |
|---|---|---|
| r | Read | Download (source), MD5 checks |
| w | Write | Upload (destination) |
| c | Create | Creating new blobs |
| d | Delete | sync --delete-destination, overwrite-as-replace |
| l | List | sync, recursive copy (enumerate the container) |
The most common SAS mistake is granting rw but not l (list) and then watching sync fail because it cannot enumerate the destination. For sync, racwdl is the safe set.
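Because the permission letters travel in the token's `sp=` field, a script can check them before a transfer starts instead of discovering the gap from a failed sync. The following is a sketch only: `sas_has_perms` is a hypothetical helper, it inspects the token string locally, and it does not verify that the signature is valid or unexpired.

```shell
# sas_has_perms SAS REQUIRED: succeed only if the SAS's sp= field contains
# every letter in REQUIRED. Hypothetical helper; it reads the string only
# and never contacts Azure.
sas_has_perms() (
  sas=$1; required=$2
  # Split the query string into one parameter per line and keep sp='s value.
  granted=$(printf '%s\n' "$sas" | tr '?&' '\n\n' | sed -n 's/^sp=//p' | head -n 1)
  missing=""
  # Walk the required letters one at a time.
  for letter in $(printf '%s' "$required" | fold -w 1); do
    case $granted in
      *"$letter"*) ;;
      *) missing="${missing}${letter}" ;;
    esac
  done
  if [ -n "$missing" ]; then
    echo "SAS is missing permission(s): $missing" >&2
    return 1
  fi
)
```

For example, `sas_has_perms "$SAS" rwdl || exit 1` at the top of a sync script stops early when list or delete was left out.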
Option B — Entra ID login (RBAC, no secret on the URL)
azcopy login authenticates you via Entra ID, and AzCopy then authorises each transfer through your RBAC roles — nothing sensitive on the URL, access governed centrally. It is the better choice for interactive work and for automation running as a managed identity.
# Interactive device-code login (great for Cloud Shell / remote terminals)
azcopy login
# Then run transfers against the plain https URL — no SAS appended
azcopy copy '/data/exports' \
"https://$ACCOUNT.blob.core.windows.net/$CONTAINER" --recursive
The identity must hold a data-plane role on the account or container. The management-plane “Owner” and “Contributor” roles, counter-intuitively, do not grant data access. Assign a storage data role such as Storage Blob Data Contributor with az role assignment create (the exact command is in lab Step 5).
azcopy login has a few modes: plain azcopy login (interactive device code), azcopy login --identity (a managed identity on a VM/App Service/pipeline — no secret to leak), and azcopy login --service-principal (headless automation with an app ID + secret/cert). Managed identity is the one to reach for in automation.
SAS vs Entra ID — which to use
| Dimension | SAS token | Entra ID login |
|---|---|---|
| Secret location | In the URL (scoped, time-limited) | None on the URL |
| Granularity | Per-resource, exact permissions | Per-RBAC-role on the scope |
| Expiry / revocation | Built-in expiry; revoke by rotating key | Revoke by removing the role |
| Best for | Pipelines, sharing, cross-account | Interactive, managed identity, audited access |
| Required role | None (the SAS is the grant) | A data-plane storage role |
| Day-one gotcha | Forgetting l (list) for sync | Using Contributor (mgmt plane ≠ data plane) |
Rule of thumb: Entra ID for people and managed identities; SAS for a self-contained, shareable, expiring grant — e.g. a partner uploading into one container for two hours.
copy vs sync — choosing the verb
These two verbs cover almost everything you will do, and the choice (with its flags) is the difference between a correct, fast transfer and a surprising one.
azcopy copy
copy transfers the source to the destination; with --recursive it walks subdirectories and by default overwrites existing files. It is one-directional and unconditional — exactly what you want for an initial load or a one-off push.
# Upload a whole folder (and subfolders) to a container
azcopy copy '/data/exports' "https://$ACCOUNT.blob.core.windows.net/$CONTAINER" --recursive
azcopy sync
sync makes the destination match the source: it compares file names and last-modified times, transfers only new or changed files, and (with --delete-destination=true) removes destination files absent from the source. Because it moves only diffs, it is the verb for anything recurring: nightly refreshes, a current backup folder, a mirrored static site.
# Make the container exactly mirror the local folder (transfers only changes)
azcopy sync '/data/exports' "https://$ACCOUNT.blob.core.windows.net/$CONTAINER" --delete-destination=true
The decision, side by side
| Question | copy | sync |
|---|---|---|
| Direction | One-way push/pull | One-way, but reconciles |
| Looks at destination contents? | Only for overwrite decision | Yes, compares both sides |
| Transfers unchanged files? | Yes (re-sends) | No (skips identical) |
| Can delete extra dest files? | No | Yes (--delete-destination=true) |
| Default overwrite | true | n/a (diff-based) |
| Best for | Initial load, one-off, append | Nightly refresh, mirror, backup |
| Risk to watch | Re-sending everything (slow/costly) | --delete-destination wiping files |
In short: first-time bulk uploads, build-artifact pushes and account-to-account moves are copy; “keep a folder current” and “exact mirror” are sync. Either verb takes --include-pattern to filter by name.
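Since the risk with sync is an unintended delete, one way to make the mirror case safer is to always preview first. The sketch below wraps the two sync invocations used in this article; `safe_mirror` and the `CONFIRM` variable are hypothetical names chosen for illustration, and the only AzCopy features relied on are the sync verb with `--delete-destination=true` and `--dry-run`.

```shell
# safe_mirror SRC DST: preview a mirror, then apply it only when CONFIRM=yes.
# Hypothetical wrapper; CONFIRM is an illustrative opt-in, not an AzCopy setting.
safe_mirror() (
  src=$1; dst=$2
  # Pass 1: list what would be copied and deleted, changing nothing.
  azcopy sync "$src" "$dst" --delete-destination=true --dry-run || return 1
  # Pass 2: run for real only when the caller opts in explicitly.
  if [ "${CONFIRM:-no}" = "yes" ]; then
    azcopy sync "$src" "$dst" --delete-destination=true
  else
    echo "Preview only. Re-run with CONFIRM=yes to apply." >&2
  fi
)
```

Running `safe_mirror /data/exports "$URL"` prints the preview; `CONFIRM=yes safe_mirror /data/exports "$URL"` previews and then applies. The design choice is that deletion never happens on the default path.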
The flags that actually matter
You will use a small set of flags constantly — what each does, its default, and when to touch it:
| Flag | What it does | Default | When to change |
|---|---|---|---|
| --recursive | Include subdirectories | false | Always, for folder transfers |
| --overwrite | Replace existing dest files (true/false/prompt/ifSourceNewer) | true | false to skip existing; ifSourceNewer for incremental |
| --include-pattern / --exclude-pattern | Only/skip files matching a glob ("*.csv;*.json") | (all) | Filter by name/extension |
| --cap-mbps | Throttle throughput to N megabits/s | (uncapped) | Protect a shared/metered link |
| --block-blob-tier | Set tier on upload (Hot/Cool/Cold/Archive) | account default | Land data straight in Cool/Archive |
| --put-md5 / --check-md5 | Store an MD5 on upload / verify it on download | off / FailIfDifferent | End-to-end integrity checks |
| --dry-run | Show what would transfer, move nothing | false | Validate a sync/filter before running |
| --log-level | Verbosity (INFO/WARNING/ERROR/DEBUG) | INFO | DEBUG when diagnosing |
(--include-path/--exclude-path also filter by subfolder, and --from-to forces direction when endpoints are ambiguous.)
Two notes that save real time. First, run --dry-run before any sync --delete-destination: it prints exactly which files would be sent and deleted, so you never discover a mistake after the fact. Second, concurrency is set by the AZCOPY_CONCURRENCY_VALUE environment variable. By default AzCopy derives the value from your CPU count, and setting it to AUTO lets AzCopy tune it for you. You rarely touch it, but export AZCOPY_CONCURRENCY_VALUE=16 eases a constrained link, and raising it helps a fat pipe with many small files. Pair either with --cap-mbps to bound bandwidth on a shared connection.
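A frequent slip with `--cap-mbps` is the unit: the flag takes megabits per second, while transfer rates are usually quoted in megabytes per second. The arithmetic can be sketched with two small helpers; both function names are hypothetical and exist only for this illustration.

```shell
# mbytes_to_mbits N: convert megabytes/s to the megabits/s --cap-mbps expects.
mbytes_to_mbits() { echo $(( $1 * 8 )); }

# cap_for_share LINK_MBPS PERCENT: the cap, in megabits/s, that limits a
# transfer to PERCENT of a link and leaves the rest free for other traffic.
cap_for_share() { echo $(( $1 * $2 / 100 )); }
```

For instance, holding a transfer to 50 MB/s means a cap of `$(mbytes_to_mbits 50)` (400), and using at most 60% of a 1000 Mbps link means `--cap-mbps "$(cap_for_share 1000 60)"` (600).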
Jobs and resume — the reliability story
This is the feature that makes AzCopy trustworthy for large transfers. When you start a transfer, AzCopy prints a Job <guid> has started line and writes a plan file (and a log) under your profile directory — ~/.azcopy on Linux/macOS, %USERPROFILE%\.azcopy on Windows — listing every file and its status. If the job stops for any reason (Ctrl-C, a network drop, a reboot, transient failures), the completed files stay done and the plan records what’s left. You resume with the job ID:
# See recent jobs and their status
azcopy jobs list
# Resume an interrupted/failed job from where it stopped
azcopy jobs resume 11111111-2222-3333-4444-555555555555
# Inspect a single job: counts of completed / failed / skipped
azcopy jobs show 11111111-2222-3333-4444-555555555555
Resume reuses the plan (what to move), not the credentials (permission to move it): if the original SAS has expired, pass a fresh one with --source-sas/--destination-sas; an expired Entra ID session needs a fresh azcopy login. The job commands you will use:
| Command | What it does | Use when |
|---|---|---|
| azcopy jobs list | Lists recent jobs with status | “Which job do I resume?” |
| azcopy jobs show <id> | Counts: completed / failed / skipped + the log path | Auditing what actually moved |
| azcopy jobs resume <id> | Continues an interrupted job | After any failure/interruption |
| azcopy jobs clean | Deletes old plan/log files | Housekeeping the profile dir |
| azcopy jobs show <id> --with-status=Failed | Lists just the failed transfers | Pinpointing which files failed |
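Resuming with a replacement SAS can be sketched as a small wrapper. `resume_with_sas` is a hypothetical name; it relies only on `azcopy jobs resume` and its `--destination-sas` flag mentioned above. Stripping a leading `?` from the token is an assumption about the accepted form, so check `azcopy jobs resume --help` on your version.

```shell
# resume_with_sas JOB_ID SAS: resume a job whose destination SAS was replaced.
# Hypothetical wrapper around `azcopy jobs resume --destination-sas`.
resume_with_sas() (
  job_id=$1; sas=$2
  # Reject anything that is not GUID-shaped (8-4-4-4-12 hex) before calling AzCopy.
  printf '%s\n' "$job_id" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$' || {
      echo "Not a job ID: $job_id" >&2; return 2; }
  # The plan file supplies the file list; the fresh SAS supplies permission.
  # ASSUMPTION: the token is passed without its leading '?'.
  azcopy jobs resume "$job_id" --destination-sas="${sas#\?}"
)
```

The ID check is there because a mistyped ID otherwise surfaces only as AzCopy reporting that the job cannot be found.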
The end-of-run summary (Number of Transfers Completed / Failed / Skipped) is the single most important thing to read: Completed means it transferred (and was verified if MD5 is on); Skipped means an overwrite rule left it in place (expected with --overwrite=false); Failed means an error on that file — resume the job, and if it persists, read the log. A “finished” run with non-zero Failed is not a successful transfer.
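In a pipeline nobody reads the summary by eye, so the check has to be scripted. The sketch below reads AzCopy's captured output on stdin and fails unless it finds a failed-transfer count of zero. `summary_ok` is a hypothetical helper, and the label it matches ("Transfers Failed:") is an assumption based on the wording quoted in this article; confirm it against the output of your AzCopy version before relying on it.

```shell
# summary_ok: read AzCopy's end-of-run text on stdin; succeed only when a
# "Transfers Failed:" line is present and reports 0.
# ASSUMPTION: the label wording follows this article's description of the
# summary; adjust the pattern if your version prints it differently.
summary_ok() (
  failed=$(sed -n 's/.*Transfers Failed:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  if [ -z "$failed" ]; then
    echo "No summary found; treat the run as unverified." >&2
    return 2
  fi
  if [ "$failed" -ne 0 ]; then
    echo "$failed transfer(s) failed." >&2
    return 1
  fi
)
```

A typical use is to capture the run with `tee` and pipe the saved text into `summary_ok`, so a missing summary counts as a failure rather than a pass.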
Architecture at a glance
Picture AzCopy as a conductor between a source and a destination, not a pipe the data flows through. One command makes it enumerate the work (walking the folder or listing the container), plan the job by writing that file set to a local plan file, then open many parallel connections and stream files (large blobs split into concurrent chunks), updating each file’s status as it completes — and for Azure→Azure the bytes move server-side, so your machine only orchestrates. Because that “what’s done” record lives locally and updates per file, any interruption leaves a precise checkpoint and azcopy jobs resume re-issues only the unfinished transfers. The whole mental model is three separable things — what to move (the plan), permission to move it (SAS or Entra ID), and the engine that moves it (concurrency + chunking) — and almost every problem maps to one: a 403 is permission, a slow run is the engine, a half-finished job is the plan.
Real-world scenario
Meridian Analytics, a 40-person data consultancy in Pune, had to load a new client’s dataset into Azure before building a reporting pipeline: roughly 2.1 TB across 380,000 files — large Parquet files (200 MB–2 GB) plus a long tail of tiny JSON sidecars — on a Windows file server in the client’s office, due Monday and arriving Friday afternoon. The junior engineer’s first attempt, the portal’s Storage browser, froze the tab around 9 GB. The second, a PowerShell loop calling az storage blob upload per file, ran single-threaded at ~25 MB/s and died at file 41,000 when the office Wi-Fi dropped — with no way to tell which files had made it.
They switched to AzCopy. Because it ran from the client’s (non-Azure-joined) server, they used a container-scoped SAS with racwdl, valid 24 hours. It was an initial load, so they ran copy --recursive, with --put-md5 for a verifiable load and --block-blob-tier=Cool, since the data would be read in batches, not constantly. The first run sustained roughly 195 MB/s, which is what moving ~1.4 TB in two hours works out to, with AzCopy’s automatic concurrency keeping the office link busy. Two hours in, the Wi-Fi blipped again and the job stopped at ~1.4 TB. It was a non-event: azcopy jobs resume <id> (reusing the still-valid SAS via --destination-sas) picked up the remaining ~700 GB overnight. The summary read 380,000 Completed, 0 Failed, 0 Skipped, confirmed by azcopy jobs show, which was the verifiable “all of it arrived” the client wanted.
The follow-on was a recurring daily delta as the client appended files nightly. For that they switched to azcopy sync (no --delete-destination, since old files must stay), wrapped it in a scheduled task, and added --include-pattern "*.parquet;*.json" to skip temp files — each nightly run now moves only the day’s new data in a couple of minutes. The runbook lesson: “For any load over a few GB, reach for AzCopy first — copy for the initial dump, sync for the deltas, and never trust a transfer whose summary you didn’t read.”
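The scenario's deadline pressure comes down to simple arithmetic: size divided by sustained rate. A rough sketch follows, using decimal units (1 GB = 1000 MB); `transfer_hours` is a hypothetical helper, and it ignores retries, per-file overhead and throttling.

```shell
# transfer_hours SIZE_GB RATE_MB_PER_S: hours needed to move SIZE_GB at a
# sustained rate, in decimal units (1 GB = 1000 MB).
transfer_hours() {
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate / 3600 }'
}
```

At the ~25 MB/s of the single-threaded loop, `transfer_hours 2100 25` gives 23.3 hours for the 2.1 TB dataset even with no interruption, while a tenfold rate (`transfer_hours 2100 250`) brings it down to 2.3.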
Advantages and disadvantages
AzCopy is the right default for bulk transfer, but it is a focused tool with real boundaries. Weigh it honestly:
| Advantages | Disadvantages |
|---|---|
| Very fast — automatic parallelism saturates links a single-threaded copy can’t | CLI only; no GUI (use Storage Explorer if you need one) |
| Resumable — interrupted jobs continue, not restart | You must read the summary; a “finished” run can have failures |
| Single self-contained binary — drops into pipelines and cron | Manual upgrades (direct download has no auto-updater) |
| sync moves only diffs: cheap, fast recurring refreshes | sync --delete-destination can wipe files if misused |
| Works local↔Azure and Azure↔Azure (server-side) | Blob/Files only — not a general file-sync or backup product |
| Flexible auth (SAS or Entra ID/managed identity) | Auth is the #1 day-one stumble (403, list permission, mgmt-vs-data role) |
| Free, supported, and the documented Microsoft tool | No scheduling/transform logic — that’s Data Factory’s job |
The advantages matter most for migrations, recurring batch loads, anything in GB/TB, and pipeline automation. The disadvantages bite when you want a visual browse-and-drag tool (use Storage Explorer) or scheduling and transforms (that is Azure Data Factory) — and AzCopy is not a backup product: it moves bytes, it does not retain point-in-time versions.
Hands-on lab
This is the centrepiece. You will create a storage account, upload a folder, sync it, download it back, kill a transfer and resume it, then tear it all down — doing the upload three ways: the portal Storage browser, the az CLI (which wraps AzCopy), and provisioning the target with Bicep. It is cheap: a small LRS Standard account and a few MB of test data cost a few paise; you delete everything at the end. Run the CLI parts in Cloud Shell (Bash) (AzCopy is pre-installed) or any terminal with az and azcopy.
Part 1 — Provision the target (CLI)
Step 1 — Variables and resource group.
RG=rg-azcopy-lab
LOC=centralindia
ACCOUNT=stazcopylab$RANDOM # must be globally unique, 3–24 lowercase alphanumerics
CONTAINER=uploads
az group create -n $RG -l $LOC -o table
Expected: a table row showing the group Succeeded.
Step 2 — Create a Standard LRS storage account.
az storage account create -n $ACCOUNT -g $RG -l $LOC \
--sku Standard_LRS --kind StorageV2 -o table
Expected: a row with provisioningState = Succeeded and kind = StorageV2. (LRS is the cheapest redundancy — fine for a lab.)
Step 3 — Create the destination container.
az storage container create -n $CONTAINER \
--account-name $ACCOUNT --auth-mode login -o table
Expected: "created": true. Using --auth-mode login means your Entra ID identity authorises the call (no account key needed).
Step 4 — Make some test data locally.
mkdir -p /tmp/azcopy-src/sub
head -c 5M </dev/urandom > /tmp/azcopy-src/big.bin
echo "hello azcopy" > /tmp/azcopy-src/notes.txt
head -c 1M </dev/urandom > /tmp/azcopy-src/sub/nested.bin
ls -R /tmp/azcopy-src
Expected: a 5 MB big.bin, a notes.txt, and a nested 1 MB file in sub/ — enough to exercise --recursive.
Part 2 — Authenticate and upload with AzCopy (CLI)
Step 5 — Grant your identity data-plane access, then log in. Management-plane Owner does not grant blob data access; you need a data role.
ME=$(az ad signed-in-user show --query id -o tsv)
SCOPE=$(az storage account show -n $ACCOUNT -g $RG --query id -o tsv)
az role assignment create --assignee "$ME" \
--role "Storage Blob Data Contributor" --scope "$SCOPE" -o table
azcopy login # follow the device-code prompt (skip if already logged in / in Cloud Shell)
Expected: a role-assignment row, then INFO: Login succeeded. (Role propagation can take a minute — if the next step 403s, wait and retry.)
Step 6 — Upload the folder (Entra ID auth, recursive).
azcopy copy '/tmp/azcopy-src' \
"https://$ACCOUNT.blob.core.windows.net/$CONTAINER" --recursive
Expected: a Job <guid> has started line, a progress readout, and a summary ending with Number of Transfers Completed: 3 and Failed: 0. Read that summary — it is your proof of success.
Step 7 — Verify the blobs landed.
az storage blob list --account-name $ACCOUNT -c $CONTAINER \
--auth-mode login --query "[].name" -o tsv
Expected: azcopy-src/big.bin, azcopy-src/notes.txt, azcopy-src/sub/nested.bin — the folder name became a virtual path prefix.
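To turn that eyeball check into something repeatable, the expected blob names can be derived from the local folder: each file's path relative to the folder's parent, which is how the folder name ends up as the prefix. This is a sketch with a hypothetical helper name, `expected_blob_names`; it assumes file names without newlines.

```shell
# expected_blob_names DIR: print the blob names a recursive copy of DIR should
# produce, i.e. each file's path relative to DIR's parent, sorted bytewise.
expected_blob_names() (
  dir=${1%/}
  cd "$(dirname "$dir")" || return 1
  find "$(basename "$dir")" -type f | LC_ALL=C sort
)
```

Comparing this output with the sorted result of the Step 7 listing (for example with diff) shows at a glance whether any file is missing from the container.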
Step 8 — Sync (change one file, transfer only the diff).
echo "updated note" > /tmp/azcopy-src/notes.txt # change one file
azcopy sync '/tmp/azcopy-src' \
"https://$ACCOUNT.blob.core.windows.net/$CONTAINER/azcopy-src"
Expected: the summary shows 1 file transferred (only notes.txt changed) and the rest skipped — proof that sync moves only diffs, unlike copy which would re-send all three.
Step 9 — Dry-run a mirror before deleting anything.
rm /tmp/azcopy-src/sub/nested.bin # remove a file locally
azcopy sync '/tmp/azcopy-src' \
"https://$ACCOUNT.blob.core.windows.net/$CONTAINER/azcopy-src" \
--delete-destination=true --dry-run
Expected: the dry-run prints DRYRUN: Delete ...nested.bin — the deletion it would perform, without doing it. Always dry-run before a real --delete-destination.
Part 3 — Resume an interrupted job
Step 10 — Make a bigger file and start a transfer you can interrupt.
head -c 200M </dev/urandom > /tmp/azcopy-src/large.bin
azcopy copy '/tmp/azcopy-src/large.bin' \
"https://$ACCOUNT.blob.core.windows.net/$CONTAINER/azcopy-src/large.bin"
# While it's running, press Ctrl-C to interrupt it partway.
Expected: progress starts; on Ctrl-C AzCopy stops and the job is left incomplete. (On a fast link 200 MB may finish quickly — use 500M if you need more time to interrupt.)
Step 11 — Find the job and resume it.
azcopy jobs list # note the most recent job's ID
azcopy jobs resume <paste-the-job-id> # continues from where it stopped
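Expected: resume drives the unfinished transfer to Completed. AzCopy tracks completion per file, so in this single-file job the interrupted large.bin is sent again, while in a multi-file job anything already marked Completed is left alone. This is the whole reliability story in one command.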
Step 12 — Audit the job.
azcopy jobs show <paste-the-job-id>
Expected: counts of Completed / Failed / Skipped and the path to the job log — your audit trail for “what actually moved.”
Part 4 — Download back
Step 13 — Download the container to a new local folder.
mkdir -p /tmp/azcopy-dl
azcopy copy \
"https://$ACCOUNT.blob.core.windows.net/$CONTAINER/azcopy-src" \
/tmp/azcopy-dl --recursive
ls -R /tmp/azcopy-dl
Expected: the blobs reappear as local files under /tmp/azcopy-dl/azcopy-src/... — download is just copy with source and destination swapped. (To exercise the SAS auth path instead, generate a SAS as in the Authentication section and append ?$SAS to the URL — the SAS is the authorisation, no azcopy login needed.)
Part 5 — The same upload via the az CLI wrapper
Step 14 — az storage copy (this literally invokes AzCopy under the hood). If you prefer staying inside the az CLI, it exposes AzCopy directly:
az storage copy -s '/tmp/azcopy-src' \
-d "https://$ACCOUNT.blob.core.windows.net/$CONTAINER/via-az" \
--recursive
Expected: the same AzCopy progress and summary — it is a thin wrapper, so every AzCopy concept (recursive, jobs, resume) applies. Verify it landed:
az storage blob list --account-name $ACCOUNT -c $CONTAINER \
--auth-mode login --prefix "via-az" --query "[].name" -o tsv
Part 6 — The same upload via the portal Storage browser
You do not always have a terminal. The portal’s Storage browser uploads directly from the browser tab:
- In the Azure portal, open your storage account stazcopylab....
- In the left menu choose Storage browser → Blob containers → uploads.
- Click Upload, then Browse for files (or drag a folder), pick a local file, expand Advanced, and note you can set the Blob type and Access tier here.
- Click Upload and watch the per-file progress in the notification panel.
- Refresh the container list: the file appears alongside the blobs AzCopy uploaded.
The portal path is fine for a handful of files but has no resume and no scripting — exactly why the AzCopy CLI (Steps 6–13) or its az storage copy wrapper (Step 14) is the answer for the 2 TB case above.
Part 7 — Provision the target with Bicep
AzCopy itself isn’t deployed by Bicep (it’s a client tool), but the target it writes to — the account and container — should be reproducible IaC:
// main.bicep — storage account + a blob container as the AzCopy target
@description('Globally-unique storage account name (3-24 lowercase alphanumerics)')
param accountName string
param location string = resourceGroup().location
resource sa 'Microsoft.Storage/storageAccounts@2023-05-01' = {
name: accountName
location: location
sku: { name: 'Standard_LRS' }
kind: 'StorageV2'
properties: {
minimumTlsVersion: 'TLS1_2'
allowBlobPublicAccess: false // private by default; AzCopy uses SAS/Entra ID
supportsHttpsTrafficOnly: true
}
}
resource blobSvc 'Microsoft.Storage/storageAccounts/blobServices@2023-05-01' = {
parent: sa
name: 'default'
}
resource container 'Microsoft.Storage/storageAccounts/blobServices/containers@2023-05-01' = {
parent: blobSvc
name: 'uploads'
properties: { publicAccess: 'None' }
}
output blobEndpoint string = sa.properties.primaryEndpoints.blob
Deploy it, then point AzCopy at the blobEndpoint output:
az deployment group create -g $RG \
--template-file main.bicep \
--parameters accountName=stazcopybicep$RANDOM -o table
Expected: provisioningState = Succeeded and a blobEndpoint output URL you can hand straight to azcopy copy.
Part 8 — Teardown
Step 15 — Delete everything (stops all charges).
az group delete -n $RG --yes --no-wait
azcopy jobs clean # optional: clear local plan/log files for the lab jobs
Expected: the resource group deletes asynchronously, removing the account, containers and all blobs. azcopy jobs clean tidies the local .azcopy profile dir.
Cost note. A few MB in Standard LRS plus a handful of transactions is well under ₹5 for the whole lab; inbound transfer to Azure is free. Deleting the resource group is what actually stops the (tiny) storage charge.
Common mistakes & troubleshooting
The day-one failures are predictable. Scan the table, then read the detail for the ones that bite hardest. Authentication is the overwhelming first stumble.
| # | Symptom | Root cause | Confirm | Fix |
|---|---|---|---|---|
| 1 | 403 AuthenticationFailed / AuthorizationPermissionMismatch | Wrong/missing auth, or a management-plane role instead of data-plane | Re-run with --log-level=DEBUG; check az role assignment list | Use a data role (Storage Blob Data Contributor) or a valid SAS |
| 2 | 403 ... Signature did not match / expired | SAS expired, clock skew, or wrong permissions on the SAS | Check the SAS se= (expiry) and sp= (permissions) | Regenerate the SAS with a future expiry and racwdl |
| 3 | sync fails to enumerate the destination | SAS lacks l (list) permission | Inspect SAS sp= letters | Regenerate with racwdl (include list) |
| 4 | “0 files transferred / no files matched” | Missing --recursive, or an --include-pattern that matched nothing | Add --dry-run; check the pattern | Add --recursive; fix the glob quoting |
| 5 | Some files silently Skipped | --overwrite=false (or prompt) left existing files in place | Read the summary Skipped count | Set --overwrite=true if you meant to replace |
| 6 | A sync --delete-destination deleted wanted files | Mirror semantics removed dest files not in source | Check the job log’s Delete lines | Use --dry-run first; or --delete-destination=false |
| 7 | Transfer crawls (e.g. 20 MB/s on a fast link) | Stale AzCopy version, tiny-file bottleneck, or capped concurrency | azcopy --version; check AZCOPY_CONCURRENCY_VALUE / --cap-mbps | Upgrade AzCopy; raise concurrency; don’t over-cap |
| 8 | 503 ServerBusy / throttling mid-transfer | Hitting the account’s request/throughput limits | Job log shows retried 503s | AzCopy auto-retries; if persistent, lower concurrency or split the load |
| 9 | Resume says the job can’t be found | Wrong job ID, or plan files were cleaned/on another machine | azcopy jobs list on the same machine | Use the correct ID; resume on the machine that holds the plan |
| 10 | Cannot find the path / source error | Bad local path, or unquoted path with spaces | Echo the path; check quoting | Quote paths; verify the source exists |
| 11 | Auth fails on resume after it worked at first | The original SAS expired between run and resume | Check the SAS expiry | Pass a fresh --source-sas/--destination-sas on resume |
| 12 | Blob landed in the wrong tier | Account default tier, not what you wanted | List blob accessTier | Use --block-blob-tier=Cool (etc.) on upload |
The two that bite hardest. The 403 (#1–2) is almost always authentication. With Entra ID, the trap is assigning Contributor/Owner (management plane) and assuming it grants data access — it does not; you need a data-plane role such as Storage Blob Data Contributor, which can take a minute to propagate. With SAS, it is an expired token or one missing a permission letter (sync specifically needs l to list). Confirm with --log-level=DEBUG and az role assignment list; the full decision tree is in Troubleshooting Azure Storage 403s. The --delete-destination wipe (#6) comes from sync being a mirror: anything at the destination not present in the source is deleted, so pointing it at a container holding more than your source folder removes the extras. The fix is muscle memory — --dry-run first, read the Delete lines, then run for real.
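Checking the sp= and se= values by eye is error-prone, so a small helper can pull them out of a SAS URL before you run sync. This is a minimal sketch: sas_param and sas_has_perm are hypothetical helper names, the URL in the usage note is a placeholder, and values are returned still URL-encoded.

```shell
#!/usr/bin/env bash
# Sketch: read one query parameter (e.g. sp or se) from a SAS URL.
# sas_param and sas_has_perm are hypothetical helpers, not azcopy commands.
sas_param() {
  local query rest
  query="&${1#*\?}&"                      # query string, padded with '&' on both sides
  case "$query" in
    *"&$2="*)
      rest="${query#*"&$2="}"             # text after "&key="
      printf '%s\n' "${rest%%&*}"         # up to the next '&'
      ;;
    *) return 1 ;;                        # parameter not present
  esac
}

# Succeeds only if the SAS permission string (sp=) contains the given letter.
sas_has_perm() {
  local perms
  perms="$(sas_param "$1" sp)" || return 1
  case "$perms" in
    *"$2"*) return 0 ;;
    *)      return 1 ;;
  esac
}
```

For example, sas_has_perm "$DEST_URL" l || echo "SAS has no list permission; sync will fail" catches mistake #3 before any transfer starts, and sas_param "$DEST_URL" se prints the expiry to compare against the clock.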
Best practices
- Always read the end-of-run summary. Completed / Failed / Skipped is the verdict: non-zero Failed means it’s not done (resume it); non-zero Skipped is fine only if you intended --overwrite=false.
- copy for initial loads, sync for recurring refreshes: re-sending everything nightly with copy is wasteful; sync moves only diffs.
- --dry-run before any sync --delete-destination so a mirror never deletes files unexpectedly.
- Prefer Entra ID (managed identity) for automation; SAS for shareable, expiring grants. Least privilege either way (wcl for upload-only, Reader for download-only).
- Keep AzCopy current (old versions are slower and miss flags) and run transfers close to the data: a VM/Cloud Shell in the account’s region beats a distant laptop, and inbound is free.
- Land data in the right tier at upload with --block-blob-tier, use --put-md5/--check-md5 when integrity must be verifiable, and filter with --include-pattern so temp files don’t ride along.
- Rely on azcopy jobs list/resume (never re-run a half-finished transfer from scratch) and don’t over-cap --cap-mbps (too low turns a 2-hour job into a 2-day one).
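In a script, "read the summary" can become a gate that fails the pipeline when the run did not finish cleanly. The sketch below assumes the captured AzCopy output contains a line of the form "Final Job Status: <status>"; the exact wording can vary between versions, so check it against your own output. transfer_is_clean and run.log are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: treat anything other than "Final Job Status: Completed" as not done.
# transfer_is_clean is a hypothetical helper; it reads captured AzCopy output on stdin.
# Assumption: the summary includes a line "Final Job Status: <status>".
transfer_is_clean() {
  local status
  # Keep only the text after the label; use the last match if there are several.
  status="$(sed -n 's/^Final Job Status:[[:space:]]*//p' | tail -n 1)"
  [ "$status" = "Completed" ]
}

# Typical use (commented out: it needs azcopy and a real destination in $DEST):
# azcopy copy ./exports "$DEST" --recursive | tee run.log
# transfer_is_clean < run.log || echo "not done: run azcopy jobs resume <id>" >&2
```

A missing status line also counts as a failure here, which is the safer default for unattended jobs.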
Security notes
- Treat SAS tokens like passwords. A SAS is a bearer credential: anyone who sees it has its access until expiry. Keep SAS strings out of logs and shell history, pass them via environment variables, and use short expiries.
- Scope SAS tightly to the minimum permissions (w+c for upload-only) and narrowest scope (one container, not the account); a download SAS should not carry w/d.
- Prefer managed identity for unattended jobs (azcopy login --identity) so there is no secret to leak, and use data-plane roles at least privilege (Storage Blob Data Reader for downloads, Storage Blob Data Contributor only where writes are needed; never management-plane Owner just to “make it work”).
- Enforce HTTPS and TLS 1.2 (supportsHttpsTrafficOnly: true, minimumTlsVersion: TLS1_2, both in the Bicep above); AzCopy uses HTTPS, so data is encrypted in transit.
- Mind firewalls and private endpoints. If the account is locked to a VNet or private endpoint, AzCopy must run from inside that network or be allow-listed; otherwise you get 403/connection failures that look like auth bugs but are network policy.
- Prefer user-delegation SAS (--as-user, Entra-backed) over account-key SAS; rotating the account key is also an emergency revoke for all key-signed SAS at once.
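Keeping a SAS out of logs mostly means never echoing the full URL. A minimal sketch, assuming the token travels in the URL's query string (as a SAS does): redact_sas is a hypothetical helper, and AZ_SAS is a hypothetical environment variable holding the token.

```shell
#!/usr/bin/env bash
# Sketch: strip the query string (where the SAS lives) before a URL is logged.
# redact_sas is a hypothetical helper name.
redact_sas() {
  local url="$1"
  case "$url" in
    *\?*) printf '%s?<redacted>\n' "${url%%\?*}" ;;   # cut from the first '?'
    *)    printf '%s\n' "$url" ;;                     # no query string: nothing to hide
  esac
}

# Typical use (AZ_SAS is an assumed variable name, set outside the script):
# DEST_URL="https://<account>.blob.core.windows.net/uploads?${AZ_SAS}"
# echo "uploading to $(redact_sas "$DEST_URL")"
```

Logging the redacted form still shows which account and container a job targeted, which is usually all a log reader needs.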
Cost & sizing
The tool itself is free; what you pay for is storage and transactions around the transfer.
- AzCopy is free, and inbound transfer to Azure is free: uploading 2 TB into a region costs nothing in bandwidth. Egress (download out, or cross-region copy) is billed per GB after a small free allowance; that is where data-transfer cost appears.
- Transactions cost a little. Each blob written/read is a billed operation; millions of tiny files generate millions of transactions. The cost is small but non-zero, and a reason to avoid pointless re-copies (use sync).
- Storage tier dominates the ongoing bill, not the transfer. Landing data in Cool/Cold/Archive when it won’t be read often (via --block-blob-tier) is the real cost lever; see Azure Blob Access Tiers.
For the Meridian case (2 TB, Cool tier, LRS, in-region): the tool and inbound transfer are ₹0; the 2 TB at Cool runs roughly ₹1,500–2,500/month (cheaper than Hot for infrequent reads); the 380k writes are a few rupees one-off; egress is billed only if you download data back out — avoid it by processing in-region.
Sizing the transfer itself is about the link and file mix, not a SKU: throughput is bounded by your bandwidth, source disk speed, and — for huge counts of tiny files — request latency. There is no AzCopy “tier”; for a TB-scale migration, run it from a VM in the target region on a fast link, let automatic concurrency work, and cap bandwidth only if you must share the pipe.
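The bandwidth bound is easy to estimate up front. The sketch below is back-of-envelope arithmetic only: it assumes a sustained throughput figure and 1 GB = 1024 MB, and ignores disk speed and per-file latency. transfer_hours is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: estimate transfer time in hours from data size (GB) and sustained throughput (MB/s).
# Assumes 1 GB = 1024 MB; real runs also depend on disk speed and per-file latency.
transfer_hours() {
  awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / mbps / 3600 }'
}

transfer_hours 400 20    # 400 GB at 20 MB/s  -> prints 5.7
transfer_hours 400 400   # 400 GB at 400 MB/s -> prints 0.3
```

Run against the link speed you actually measure, this shows quickly whether a nightly window is realistic or whether the job needs to move to a VM in the target region.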
Interview & exam questions
1. What is AzCopy and when would you use it over the portal? A single-binary command-line tool for bulk transfer to/from/between Azure Blob and Files endpoints. Choose it over the portal whenever a transfer is large, scripted, recurring, or needs resume — the portal upload is single-tab and has none of that.
2. Difference between azcopy copy and azcopy sync? copy transfers source → destination and overwrites by default (one-directional) — right for initial loads. sync compares both sides, transfers only differences, and can delete destination files missing from the source — right for recurring refreshes; the wrong verb either re-sends everything or risks deleting files.
3. How does AzCopy resume an interrupted transfer? Every transfer is a job with a local plan file recording each file’s status; on interruption the completed files stay done and azcopy jobs resume <id> re-issues only the unfinished ones. Resume reuses the plan but needs valid credentials again (a fresh SAS if the old one expired).
4. Two ways to authenticate AzCopy, and when to use each? A SAS token on the URL (scoped, time-limited — for pipelines/sharing) and azcopy login via Entra ID (RBAC, no secret on the URL — for interactive use and managed identities). SAS is the grant; Entra ID requires a data-plane role.
5. A sync upload fails to enumerate the destination with a 403. Most likely cause? The SAS lacks the l (list) permission — sync must list the destination to diff it, so a SAS with rwc but no l works for copy but fails sync. Regenerate with racwdl.
6. You assigned the identity Contributor but AzCopy still gets 403. Why? Contributor/Owner are management-plane roles and do not grant data-plane blob access. You need a data role such as Storage Blob Data Contributor (or Reader for download-only), and it may take a minute to propagate.
7. How do you avoid accidentally deleting files with sync, and how do you confirm a transfer truly finished? Run --dry-run before any --delete-destination — it prints the files it would send and delete without acting. To confirm completion, read the end-of-run summary (Completed / Failed / Skipped): non-zero Failed means it is not done — resume to clear it.
8. How do you land data in the Cool tier on upload, and does AzCopy work account-to-account? Pass --block-blob-tier=Cool (or Cold/Archive) so blobs skip the default tier. Azure-to-Azure copies run server-side (same-region avoids egress); az storage copy invokes AzCopy under the hood with the same recursive/jobs/resume semantics.
These map mainly to AZ-104 (Administrator) and touch AZ-204 (Developer) and AZ-500 (Security):
| Question theme | Primary cert | Objective area |
|---|---|---|
| AzCopy copy/sync/resume, transfer tooling | AZ-104 | Configure and manage Azure storage |
| SAS generation and scoping | AZ-104 / AZ-204 | Secure storage; access data |
| Entra ID data-plane roles vs mgmt plane | AZ-500 | Manage access to storage |
| Tiers on upload, cost of transfer | AZ-104 | Optimize storage cost |
| Pipelines pushing artifacts (managed identity) | AZ-204 | Develop solutions that use storage |
Quick check
- You need to load 500 GB into a container once, from a laptop, over a flaky connection. Which verb do you use, and which feature makes the flakiness a non-issue?
- True or false: assigning the Contributor role on a storage account lets AzCopy upload blobs with Entra ID auth.
- Your nightly azcopy sync to a container fails with a 403 while listing the destination, even though copy worked yesterday. What’s the single most likely cause?
- What does --dry-run do, and before which dangerous flag should you always use it?
- A transfer’s summary shows Completed: 9,980, Failed: 20. Is the transfer done, and what’s your next command?
Answers
- azcopy copy --recursive for the one-off load; resume handles the flakiness: after a drop, azcopy jobs resume <id> continues from where it stopped instead of restarting.
- False. Contributor is a management-plane role; you need a data-plane role such as Storage Blob Data Contributor.
- The SAS is missing the l (list) permission; sync must enumerate the destination to diff it. Regenerate with racwdl.
- It prints exactly what would transfer (and delete) without acting. Always use it before --delete-destination=true.
- No, twenty files failed. Run azcopy jobs resume <id>; if they keep failing, read the log via azcopy jobs show <id>.
Glossary
- AzCopy: a single self-contained command-line binary from Microsoft for bulk data transfer to, from, and between Azure Blob and Files endpoints.
- copy (verb): transfers a source to a destination, overwriting by default; one-directional. For initial loads and one-off pushes.
- sync (verb): makes the destination match the source, transferring only differences and (optionally) deleting destination files absent from the source. For recurring refreshes and mirrors.
- Job: one AzCopy transfer invocation, identified by a unique GUID, recorded in a local plan file.
- Plan file: AzCopy’s local record (in ~/.azcopy) of every file in a job and its status; the basis for resume.
- Resume: continuing an interrupted job from its last checkpoint via azcopy jobs resume <id>, transferring only the files that hadn’t completed.
- SAS (Shared Access Signature): a signed, scoped, time-limited query string appended to a resource URL that grants specific permissions without an Azure identity.
- Entra ID login: azcopy login; authenticates a user or managed identity via Microsoft Entra ID, authorising transfers through RBAC roles instead of a URL secret.
- Data-plane role: a storage RBAC role (e.g. Storage Blob Data Contributor/Reader) that grants access to the data in blobs; distinct from management-plane Owner/Contributor.
- Concurrency: the number of parallel connections AzCopy uses (default AUTO; set via AZCOPY_CONCURRENCY_VALUE); the source of AzCopy’s speed.
- --dry-run: shows what a command would transfer or delete without performing it; the safety check before a sync --delete-destination mirror.
- Storage browser: the Azure portal’s in-browser blob explorer with an upload UI; convenient for a few files but with no resume or scripting.
Next steps
You can now move data into and out of Azure storage reliably. Build outward:
- Next: Azure Storage Account Fundamentals (the account, containers, blob types and endpoints AzCopy writes to).
- Related: Azure Blob Access Tiers: Hot, Cool, Cold & Archive (choose the tier you land data in with --block-blob-tier).
- Related: Azure Storage Redundancy: LRS, ZRS, GRS & RA-GRS (the durability choice for the destination account).
- Related: Troubleshooting Azure Storage 403s: Firewall, Private Endpoint, RBAC & SAS (the playbook for the auth failures AzCopy surfaces).
- Related: Azure Data Factory: Your First Copy Pipeline (Blob to SQL), for when you need scheduling, transforms and managed pipelines instead of a terminal command.