
Azure Files and Azure NetApp Files: Identity-Based SMB, AD/Kerberos Auth, Snapshots, and Hybrid Sync

Every “lift the file server to Azure” project stalls at the same place: somebody opens the share, gets prompted for credentials that do not match their domain account, and the migration is suddenly “blocked on storage.” The fix is almost never more storage – it is wiring identity correctly so that the NTFS ACLs the business has maintained for fifteen years keep working and the wire protocol stays Kerberos rather than falling back to a storage account key that anyone holding the connection string can extract. Azure gives you two managed SMB platforms for this: Azure Files (a feature of the storage account) and Azure NetApp Files (a bare-metal NetApp service fronted by an Azure resource provider). They overlap on the marketing slide and diverge sharply in operation.

This article shows how to choose between them, stand up identity-based SMB without leaking a key, lock the data plane to private endpoints, and protect the data with snapshots and replication that survive a region loss. We treat the whole thing as a single two-gate access path (share-level RBAC for “can you mount”, NTFS ACLs for “what can you do”) running over a Kerberos handshake that only behaves when DNS resolves the account to a private IP. Get those three – identity source, two gates, private DNS – right, and the rest is sizing.

By the end you will stop guessing which platform to pick, you will know exactly which klist ticket proves Kerberos won, you will be able to read the directoryServiceOptions field that tells you whether you achieved identity-based access at all, and you will have a snapshot-plus-replication posture that distinguishes “oops, I deleted a file” from “ransomware encrypted the share.” Because this is a reference you will return to mid-migration, the platform comparison, the identity matrix, the RBAC roles, the error strings and the sizing levers are all laid out as scannable tables – read the prose once, then keep the tables open during the cutover.

What problem this solves

A domain file server is fifteen years of accreted NTFS ACLs, mapped drives in logon scripts, DFS namespaces, and muscle memory. “Move it to the cloud” sounds like a storage task and is actually an identity task: unless the storage account can mint a Kerberos ticket your domain users trust, every mount prompts for credentials, falls back to NTLM (which Azure Files rejects for AD identities), or – worst – gets wired up with the storage account key, which is a 64-byte root password to the entire account that anyone with the connection string can extract and which bypasses every ACL you ever set.

What breaks without getting this right: FSLogix profile containers fail to attach during the morning login storm; a “temporary” key-based mount in a logon script becomes permanent and un-auditable; on-prem clients resolve the account to a public IP and either get blocked at port 445 or quietly egress file traffic over the internet; and the team discovers during an incident that their only “backup” was a snapshot living in the same account the attacker just encrypted.

Who hits this: anyone migrating a Windows file server, anyone running Azure Virtual Desktop with FSLogix profiles, SAP/HPC/EDA teams who need sub-millisecond NFS or SMB, and every hybrid shop that has both on-prem AD DS and Entra-joined endpoints and has to decide which identity source the storage account joins. The fix is rarely “buy a bigger tier” – it is “wire identity, DNS, and the two access gates so the protocol does what you think it does.”

To frame the field before the deep dive, here is every decision this article forces, the question behind it, and the section that settles it:

| Decision | The question it forces | Where it is settled |
| --- | --- | --- |
| Azure Files vs ANF | Does the SLO mention milliseconds or microseconds? | Platform selection |
| Identity source | On-prem AD DS, Entra Kerberos, or Entra Domain Services? | Identity-based access |
| Access model | Who can mount (RBAC) vs what they can do (NTFS)? | The two-gate model |
| Network exposure | Public IP, service endpoint, or private endpoint? | Private endpoints & DNS |
| Data protection depth | Oops-recovery, ransomware-recovery, or region-loss? | Snapshots, backup, replication |
| Branch caching | Sync the whole dataset or tier the cold tail? | Azure File Sync |
| Throughput model | Provision capacity for IOPS, or decouple them? | Performance tuning |

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should be comfortable with the storage-account fundamentals – redundancy (LRS/ZRS/GRS), the resource model, and SAS/keys – from the Azure Storage Accounts Deep Dive and Azure Storage Account Fundamentals. You should understand basic Active Directory (computer objects, SPNs, SIDs, OUs) and that Entra Connect syncs on-prem AD to Entra ID – the Entra Connect Sync deep dive is the upstream of every “synced identity” claim here. Private DNS and private endpoints from Private Endpoints & Private DNS at scale are assumed; the Private DNS Resolver hybrid forwarding article is how on-prem clients resolve the private zone.

This sits in the storage + identity seam. It assumes the identity fundamentals from Entra ID Fundamentals and pairs tightly with Azure Virtual Desktop at 5,000 users with FSLogix, because FSLogix profile storage is the single most common reason teams care about identity-based SMB. When mounts fail with 403/access-denied, Troubleshooting Azure Storage: 403s, firewall, private endpoint, RBAC & SAS is the sibling playbook.

A quick map of who owns what during a file-server migration, so you escalate to the right person fast:

| Layer | What lives here | Who usually owns it | Failure classes it can cause |
| --- | --- | --- | --- |
| Identity source | AD object, SPN, Kerberos key | Identity / AD team | Mount prompts, NTLM fallback, key-based mounts |
| DNS | privatelink.file zone, forwarders | Network team | FQDN → public IP, 445 blocked, internet egress |
| Network | Private endpoint, NSG, delegated subnet | Network team | 445 unreachable, ANF subnet collisions |
| Storage platform | Files account / ANF volume, tiers | Storage / platform team | Throttling (429), throughput limits, pool floor cost |
| Access control | Share RBAC + NTFS ACLs | App + identity team | “Can mount but can’t write”, over-broad access |
| Data protection | Snapshots, soft delete, backup, CRR | Backup / DR team | No ransomware recovery, failover not rehearsed |

Core concepts

Five mental models make every later decision obvious.

There are two managed SMB platforms, and they are different resources. Azure Files is a feature of a storage account (Microsoft.Storage/storageAccounts); you get a share inside the same account that holds blobs and queues. Azure NetApp Files is a separate, bare-metal NetApp service (Microsoft.NetApp/netAppAccounts) with its own hierarchy – account → capacity pool → volume – injected into a delegated subnet. They both speak SMB 3.x and both do snapshots and AD integration, but they bill, scale, and operate differently.

Identity-based access means the storage account has its own AD object. A computer (or service-logon) object representing the storage account is created in your chosen identity source. That object holds a Kerberos key. When a client mounts \\<account>.file.core.windows.net\<share>, it asks a domain controller for a service ticket for the SPN cifs/<account>.file.core.windows.net, the account decrypts that ticket with its Kerberos key, Azure maps the user’s SID, and only then are NTFS ACLs evaluated. No key, no password, no prompt – Kerberos single sign-on against the signed-in domain user.

Access is a two-gate model and confusing the gates is the #1 ticket. Gate one is share-level RBAC: Azure role assignments decide who can mount the share at all. Gate two is directory/file-level NTFS: standard Windows ACLs decide what you can do once mounted. A user can have the RBAC role to mount and still be denied a write by the NTFS ACL – or have generous NTFS but no RBAC and never get in the door. Both gates must say yes.

DNS decides whether Kerberos and the private data plane work at all. By default *.file.core.windows.net resolves to a public IP. For Kerberos to behave and for SMB (TCP 445) to stay off the internet, you front the account with a Private Endpoint and wire the privatelink.file.core.windows.net private DNS zone so the FQDN resolves to the endpoint’s private IP. On-prem clients must be able to resolve that private zone (via forwarders to a Private Resolver) or they will resolve the public IP and mounts fail or egress over the internet.

Snapshots, backup, and replication protect against different things. Share snapshots and soft delete live in the same account as the data – they cover accidental deletion (“oops”) but an attacker with account rights can purge them. Vaulted backup stores an immutable copy off the account – that is your ransomware defense. Cross-region replication (ANF) or GRS mirrors the data to another region – that is your region-loss defense. They are defense in depth, not substitutes.

The vocabulary in one table

Pin down every moving part before the deep sections; the glossary repeats these for lookup.

| Concept | One-line definition | Where it lives | Why it matters |
| --- | --- | --- | --- |
| Azure Files | SMB/NFS share inside a storage account | Microsoft.Storage | General-purpose, one resource to manage |
| Azure NetApp Files (ANF) | Bare-metal NetApp service | Microsoft.NetApp | Sub-ms latency; SAP/HPC/EDA |
| Capacity pool | ANF container that sets service level | Under a NetApp account | Sets throughput; 1 TiB minimum |
| Volume | The actual ANF share | Carved from a pool | Lives in a delegated subnet |
| Identity source | Where SMB identities resolve | AD DS / Entra / AAD DS | Mints the Kerberos ticket |
| Kerberos key | Secret for the account’s AD object | The AD object | Decrypts the cifs service ticket |
| Share-level RBAC | Who can mount the share | Azure role assignment | Gate one |
| NTFS ACL | What you can do in the share | The file/folder | Gate two |
| Private endpoint | Private IP for the account | A subnet NIC | Forces SMB off the internet |
| privatelink.file | The private DNS zone | Private DNS | Resolves FQDN → private IP |
| Share snapshot | Read-only point-in-time copy | In the account | Previous Versions; oops recovery |
| Cross-region replication | ANF volume mirror | Paired region | Region-loss recovery |
| Cloud tiering | Cold files become Azure pointers | Azure File Sync agent | Branch cache of a big dataset |

Azure Files vs Azure NetApp Files: tiers, performance, and cost

Both speak SMB 3.x and NFS, both do snapshots, both integrate with AD. The decision usually comes down to latency floor, throughput ceiling, and how much operational surface you want to own.

| Dimension | Azure Files | Azure NetApp Files (ANF) |
| --- | --- | --- |
| Resource model | Feature of Microsoft.Storage/storageAccounts | Microsoft.NetApp/netAppAccounts → capacity pool → volume |
| SMB tiers | Standard (HDD, GPv2) and Premium (SSD, FileStorage account) | Standard / Premium / Ultra service levels (set on the pool) |
| Latency | Premium low-single-digit ms; Standard higher and burstier | Sub-millisecond typical on Premium/Ultra |
| Throughput scaling | Premium scales with provisioned size (or v2 provisioned IOPS/throughput) | Throughput follows pool service level and volume quota |
| Min footprint | A 100 GiB Premium share | 1 TiB capacity pool; 2 TiB volume floor on manual pools |
| Protocols | SMB, NFSv4.1, REST | SMB, NFSv3, NFSv4.1, dual-protocol |
| Data protection | Share snapshots, soft delete, Backup vault (vaulted) | Volume snapshots, snapshot policies, backup, cross-region replication |
| Network injection | Standard endpoint or private endpoint | Delegated subnet (Microsoft.NetApp/volumes) |
| AD integration | Per storage account (one directory service) | Per NetApp account (AD connection) |
| Encryption keys | Platform-managed or CMK | Platform-managed (CMK in preview/regions) |
| Largest single share/volume | 100 TiB (large file shares) | 100 TiB (large volumes) |
| Backup model | Vaulted Backup vault | ANF backup + snapshot policy |
| Best fit | General-purpose shares, app config, FSLogix, File Sync hub | SAP, HPC scratch, EDA/render, NFS databases |

Rule of thumb I use on reviews: if the workload tolerates a few milliseconds and you want one resource to manage, Azure Files Premium. If the workload’s SLO mentions microseconds, or it is SAP/HPC/EDA, ANF – and budget for the 1 TiB+ pool floor whether you use it or not.

The same choice as a decision table – match the workload signal to the platform and the reason:

| If you see… | It’s probably… | Do this |
| --- | --- | --- |
| SLO in microseconds / SAP HANA / HPC scratch | A latency-floor workload | ANF (Premium/Ultra), accept the pool floor |
| General file shares, app config, departmental data | A few-ms-tolerant workload | Azure Files Premium (one resource) |
| FSLogix profiles with morning login storms | IOPS-spiky, capacity-modest | Azure Files Premium SSD v2, provision IOPS |
| 40 TB on a branch server, 2 TB hot | A caching problem, not a storage one | Azure File Sync with cloud tiering |
| Dual-protocol (SMB + NFS) on the same data | A mixed-client workload | ANF dual-protocol (needs LDAP) |
| Cheapest bulk archive, rare access | A cost-first, infrequent workload | Azure Files Standard (Cool) |
| Need cross-region DR with tight RPO | A region-loss requirement | ANF cross-region replication (10-min) |
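
The decision table can also be read as a short rule chain, which is handy when a review checklist has to be scripted. The sketch below is a hypothetical helper, not an Azure API: the `Workload` fields and the first-match ordering are my assumptions, and the returned strings simply echo the "Do this" column.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Illustrative signals only; none of these are Azure resource properties.
    needs_sub_ms_latency: bool = False  # SLO in microseconds, SAP HANA, HPC scratch
    dual_protocol: bool = False         # SMB + NFS on the same data
    branch_cache: bool = False          # big dataset, small hot set on a branch server
    iops_spiky: bool = False            # e.g. FSLogix login storms
    rare_access: bool = False           # cheapest bulk archive


def pick_platform(w: Workload) -> str:
    """Return the suggestion from the decision table; the first matching rule wins."""
    if w.needs_sub_ms_latency:
        return "ANF (Premium/Ultra)"
    if w.dual_protocol:
        return "ANF dual-protocol"
    if w.branch_cache:
        return "Azure File Sync with cloud tiering"
    if w.iops_spiky:
        return "Azure Files Premium SSD v2"
    if w.rare_access:
        return "Azure Files Standard (Cool)"
    return "Azure Files Premium"


print(pick_platform(Workload(iops_spiky=True)))
```

Putting the latency check first mirrors the rule of thumb: a microsecond SLO decides the platform before any cost consideration does.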

Azure Files tiers, option by option

The Azure Files side alone has three billing/performance models, and picking the wrong one is the single biggest line item I see on file-storage bills.

| Tier / model | Media | Billing basis | Latency | When to pick | Gotcha |
| --- | --- | --- | --- | --- | --- |
| Standard (Transaction Optimized) | HDD (GPv2) | Used GiB + per-transaction | Higher, burstier | Cheap bulk, infrequent access | Transaction costs add up on chatty apps |
| Standard (Hot) | HDD (GPv2) | Used GiB + lower transactions | Higher | General file shares | Still HDD latency floor |
| Standard (Cool) | HDD (GPv2) | Lowest GiB, highest transactions | Higher | Archive-ish shares | Transaction-heavy access is expensive |
| Premium v1 | SSD (FileStorage) | Provisioned GiB (IOPS scale with size) | Low single-digit ms | Latency-sensitive, predictable | Over-provision capacity to buy IOPS = waste |
| Premium SSD v2 | SSD (FileStorage) | Provisioned GiB + IOPS + throughput, independently | Low single-digit ms | Bursty IOPS on a small footprint | Newer; verify region availability |

A subtle cost trap: Premium v1 bills on provisioned size, not used size, and throughput is a function of that provisioned size. The newer Premium SSD v2 model decouples capacity from IOPS and throughput so you provision each independently, which usually lowers spend on bursty shares like FSLogix.

The same models read as a capability grid against the features you actually pick on:

| Capability | Standard | Premium v1 | Premium SSD v2 | ANF |
| --- | --- | --- | --- | --- |
| Media | HDD | SSD | SSD | SSD (NetApp) |
| Latency floor | High/bursty | Low single-digit ms | Low single-digit ms | Sub-millisecond |
| IOPS decoupled from size | n/a | No | Yes | Per quota × level |
| Per-transaction charge | Yes | No | No | No |
| Identity-based SMB | Yes | Yes | Yes | Yes (AD) |
| Snapshots | Yes | Yes | Yes | Yes (255/vol) |
| Cross-region replication | GRS (account) | GRS (account) | LRS/ZRS | CRR (volume) |
| Min footprint | 1 GiB | 100 GiB | 100 GiB | 1 TiB pool |

ANF service levels and the pool floor

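ANF throughput is set by the service level on the capacity pool, not by the volume. On an auto-QoS pool the throughput limit of each volume is quota_TiB × service_level_MiBps; on a manual-QoS pool the pool total (pool_TiB × service_level_MiBps) is a budget you assign to volumes yourself.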

| Service level | Throughput per TiB (manual QoS) | Typical workloads | Cost posture |
| --- | --- | --- | --- |
| Standard | ~16 MiB/s per TiB | General SMB/NFS, dev | Lowest ANF tier |
| Premium | ~64 MiB/s per TiB | SAP data, busy NFS | Mid |
| Ultra | ~128 MiB/s per TiB | HPC scratch, EDA, hot DB | Highest |

| Pool/volume constraint | Value | Why it bites |
| --- | --- | --- |
| Minimum capacity pool | 1 TiB | You pay for 1 TiB even at 200 GiB used |
| Minimum volume (manual pool) | Effectively 2 TiB to get useful throughput | Tiny volumes are throughput-starved |
| Subnet delegation | Microsoft.NetApp/volumes | The subnet cannot host other resources |
| QoS type | Auto (per-volume) or Manual (carve throughput) | Manual lets you over/under-provision per volume |
| Service-level change | Online (no outage) | You can move Premium → Ultra live |
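
The per-TiB rates in the service-level table turn sizing into a single multiplication. A minimal sketch, assuming the approximate rates quoted above; the function names are mine, and it does not model whether the figure is applied per volume or per pool.

```python
# Approximate MiB/s per TiB, copied from the service-level table in this article.
MIBPS_PER_TIB = {"Standard": 16, "Premium": 64, "Ultra": 128}


def throughput_mibps(tib: float, service_level: str) -> float:
    """Throughput budget for `tib` TiB at the given service level."""
    if tib <= 0:
        raise ValueError("tib must be positive")
    return tib * MIBPS_PER_TIB[service_level]


def tib_needed(target_mibps: float, service_level: str) -> float:
    """TiB required to reach `target_mibps` at the given service level."""
    return target_mibps / MIBPS_PER_TIB[service_level]


for level in MIBPS_PER_TIB:
    print(f"{level}: 2 TiB -> {throughput_mibps(2, level):.0f} MiB/s")
```

Running the inverse (`tib_needed`) before a design review shows quickly whether a throughput target is cheaper to hit by growing the allocation or by moving up a service level.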

Limits and quotas you will actually hit

Real numbers, because “it’s slow” is usually “you hit a documented ceiling”:

| Limit | Azure Files | Azure NetApp Files | Why it bites |
| --- | --- | --- | --- |
| Max share / volume size | 100 TiB (large file shares) | 100 TiB (large volumes) | Plan large-share enablement up front |
| Max IOPS (Premium) | Scales with size (v1) / provisioned (v2) | Per service level × quota | The throttling ceiling |
| Max throughput | Up to ~10+ GiB/s (v2, large) | Per service level × quota | Bandwidth-bound jobs |
| Snapshots per share / volume | 200 | 255 per volume | Oldest must be pruned beyond limit |
| Min provisioned (Premium) | 100 GiB | 1 TiB pool / 2 TiB useful volume | The cost floor |
| Open handles per share | ~10,000 (varies) | High | Handle leaks exhaust it |
| SMB Multichannel | Supported (Premium) | Supported | Off by default in some cases |
| Soft-delete retention | 1–365 days | Snapshot-based | Default off on old accounts |

Identity-based access: on-prem AD DS vs Entra Kerberos vs Entra Domain Services

Azure Files supports three SMB identity sources. Pick exactly one per storage account.

| Source | Where identities live | Best for | NTFS ACLs resolve against | On-prem AD required |
| --- | --- | --- | --- | --- |
| On-prem AD DS | Your existing AD, synced to Entra via Connect | Domain-joined servers/clients, lift-and-shift | Your AD SIDs | Yes |
| Microsoft Entra Kerberos | Entra ID (cloud-only) | Entra/hybrid-joined endpoints, FSLogix in AVD | Still AD DS SIDs (synced) | No DC, but synced AD for ACLs |
| Entra Domain Services (AAD DS) | A managed domain | Managed DC, no on-prem AD to run | The managed domain SIDs | No (managed) |

The mechanics are the same in spirit: a computer object representing the storage account is created in the identity source, the account holds a Kerberos key for that object, and clients get a Kerberos ticket for cifs/<account>.file.core.windows.net. The user’s NTFS-level access is then evaluated against the file/folder ACLs.

Joining the account to on-prem AD DS

For on-prem AD DS, the AzFilesHybrid PowerShell module does the domain join. Run it from a domain-joined machine that can reach a DC, signed in as someone who can create the AD object:

# Import the AzFilesHybrid module (from the AzureFilesHybrid GitHub release)
Import-Module .\AzFilesHybrid.psd1
Connect-AzAccount -Subscription "<sub-id>"

# Creates an AD object (computer or service-logon account) for the storage account
# and configures it to use AD DS Kerberos for SMB.
Join-AzStorageAccount `
  -ResourceGroupName "rg-files-prod" `
  -StorageAccountName "stfilesprod01" `
  -SamAccountName "stfilesprod01" `
  -DomainAccountType "ComputerAccount" `
  -OrganizationalUnitDistinguishedName "OU=AzureFiles,OU=Servers,DC=corp,DC=contoso,DC=com"

# Verify the account now advertises AD DS as its directory service
$acct = Get-AzStorageAccount -ResourceGroupName "rg-files-prod" -StorageAccountName "stfilesprod01"
$acct.AzureFilesIdentityBasedAuth.DirectoryServiceOptions   # expect: AD

The Join-AzStorageAccount parameters carry consequences worth enumerating:

| Parameter | Values | Default | Effect | Gotcha |
| --- | --- | --- | --- | --- |
| DomainAccountType | ComputerAccount, ServiceLogonAccount | ComputerAccount | Object class for the account | ServiceLogonAccount needs a password policy that won’t expire the key |
| OrganizationalUnitDistinguishedName | Any writable OU | Default Computers container | Where the object lands | Wrong OU → GPO/cleanup scripts may delete it |
| SamAccountName | ≤ account name | Storage account name | The object’s sAMAccountName | Long names get truncated; SPN must still match |
| EncryptionType | RC4, AES256, both | Both | Kerberos enc on the object | Disable RC4 for security; ensure clients support AES |

Hard constraint worth internalizing for Entra Kerberos: it authenticates the user, but NTFS-level permissions are still enforced against AD DS SIDs. For pure cloud-only file servers without any on-prem AD, you configure share-level RBAC for access and rely on default file ACLs – you cannot set fine-grained per-user NTFS ACLs by cloud identity unless those identities are synced from AD DS. Plan FSLogix/AVD deployments accordingly.

Enabling Entra Kerberos

For Entra Kerberos (cloud identities, no on-prem AD object), enable it on the account and grant admin consent to the auto-created app registration:

az storage account update \
  --resource-group rg-files-prod \
  --name stfilesprod01 \
  --enable-files-aadkerb true
# Then in Entra ID → App registrations, grant admin consent to the
# "[Storage Account] <name>.file.core.windows.net" app (openid/profile/User.Read).

The three identity sources, compared on the operational surface you actually own:

| Concern | On-prem AD DS | Entra Kerberos | Entra Domain Services |
| --- | --- | --- | --- |
| DCs to run/patch | Yours | None | Managed by Azure |
| Fine-grained NTFS by identity | Yes | Only via synced AD SIDs | Yes (managed domain) |
| Works for Entra-joined endpoints | Needs line-of-sight to DC | Native | Needs domain join to AAD DS |
| Setup module/tool | AzFilesHybrid | az ... --enable-files-aadkerb | AAD DS deployment + join |
| Best fit | Hybrid file servers | AVD/FSLogix on cloud endpoints | “Managed AD, no on-prem” |

The two-gate access model: share-level RBAC, NTFS ACLs, and the Kerberos flow

Access in Azure Files is a two-gate model, and confusing the two is the most common support ticket I triage.

  1. Share-level (RBAC) decides who can mount the share at all. You assign Azure roles scoped to the file share.
  2. Directory/file-level (NTFS) decides what you can do once mounted. Standard Windows ACLs, set with icacls, enforced against AD SIDs.

Gate one: the share-level RBAC roles

There are three built-in SMB share roles, plus the account-key path you are explicitly avoiding:

| Role | Mount | Read | Write/Modify | Modify NTFS ACLs | When to use |
| --- | --- | --- | --- | --- | --- |
| Storage File Data SMB Share Reader | Yes | Yes | No | No | Read-only consumers |
| Storage File Data SMB Share Contributor | Yes | Yes | Yes | No | Standard users |
| Storage File Data SMB Share Elevated Contributor | Yes | Yes | Yes | Yes | Admins setting ACLs |
| (Storage account key) | Yes | Yes | Yes | Yes (as superuser) | Avoid – bypasses identity entirely |
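
To make “both gates must say yes” concrete, here is a small model of the evaluation order. It is a sketch under my own simplifications: the capability names (`mount`, `read`, `write`, `change_acl`) are invented labels for the table columns, and real NTFS evaluation (deny ACEs, inheritance, group nesting) is reduced to a plain set of granted rights.

```python
# Capabilities per built-in role, transcribed from the role table above.
ROLE_CAPS = {
    "Storage File Data SMB Share Reader": {"mount", "read"},
    "Storage File Data SMB Share Contributor": {"mount", "read", "write"},
    "Storage File Data SMB Share Elevated Contributor": {
        "mount", "read", "write", "change_acl",
    },
}


def effective_access(role, ntfs_rights, action):
    """Gate one (share-level RBAC) is checked first, then gate two (NTFS ACL)."""
    rbac = ROLE_CAPS.get(role, set())
    if "mount" not in rbac:
        return False, "gate 1: no share-level role, cannot mount"
    if action not in rbac:
        return False, f"gate 1: role does not allow {action}"
    if action not in ntfs_rights:
        return False, f"gate 2: NTFS ACL does not grant {action}"
    return True, "allowed"


# Contributor role, but the folder ACL only grants read: the write is denied at gate two.
print(effective_access("Storage File Data SMB Share Contributor", {"read"}, "write"))
```

The reason string is the useful part during triage: it tells you whether to open a ticket with whoever owns role assignments or with whoever owns the folder ACLs.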

Assign share-level access to an AD group that is synced to Entra ID, scoped to the share, not the whole account:

# Scope the role to the specific file share, not the whole account
scope=$(az storage account show -g rg-files-prod -n stfilesprod01 --query id -o tsv)/fileServices/default/fileshares/projects

az role assignment create \
  --assignee "<entra-group-object-id>" \
  --role "Storage File Data SMB Share Contributor" \
  --scope "$scope"
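
The scope string is where most mis-scoped assignments come from, so it helps to see its full shape. The helper below only concatenates the resource ID the shell snippet composes; the function name and the all-zero subscription ID are placeholders of mine.

```python
def share_scope(subscription_id: str, resource_group: str, account: str, share: str) -> str:
    """Build the ARM scope for one file share (not the whole storage account)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
        f"/fileServices/default/fileshares/{share}"
    )


# Placeholder subscription ID, for illustration only.
print(share_scope("00000000-0000-0000-0000-000000000000",
                  "rg-files-prod", "stfilesprod01", "projects"))
```

If the string stops at the storage account segment, the role applies to every share in the account, which is the over-broad access the table of failure classes warns about.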

Gate two: NTFS ACLs and mounting with Kerberos

Mount the share on a domain-joined client. Do not pass a storage account key – that defeats identity-based auth and is exactly what we are avoiding. With AD DS configured, the client transparently gets a Kerberos ticket:

# No key, no /user, no password -- Kerberos SSO against the signed-in domain user
net use Z: \\stfilesprod01.file.core.windows.net\projects

# Confirm you actually got Kerberos (not NTLM) and a ticket for the storage account
klist | Select-String "cifs/stfilesprod01.file.core.windows.net"
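
The same check can be scripted for a fleet-wide audit. This is a sketch, assuming `klist` is on the PATH of a Windows client; the `SAMPLE` text is hand-written to resemble klist output and is not a captured log.

```python
import subprocess


def cifs_spn(account: str) -> str:
    """SPN that clients request for an Azure Files storage account."""
    return f"cifs/{account}.file.core.windows.net"


def has_cifs_ticket(klist_text: str, account: str) -> bool:
    """True if any line of the klist output mentions the account's cifs SPN."""
    spn = cifs_spn(account).lower()
    return any(spn in line.lower() for line in klist_text.splitlines())


def run_klist() -> str:
    """Capture klist output; only meaningful on a Windows client after a mount."""
    return subprocess.run(["klist"], capture_output=True, text=True, check=False).stdout


# Illustrative, hand-written sample (not real output).
SAMPLE = """\
#0>     Client: alice @ CORP.CONTOSO.COM
        Server: krbtgt/CORP.CONTOSO.COM @ CORP.CONTOSO.COM
#1>     Client: alice @ CORP.CONTOSO.COM
        Server: cifs/stfilesprod01.file.core.windows.net @ CORP.CONTOSO.COM
"""

print(has_cifs_ticket(SAMPLE, "stfilesprod01"))
```

On a real client you would call `has_cifs_ticket(run_klist(), "stfilesprod01")` right after the mount; a missing ticket means the session did not authenticate with Kerberos for that account.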

Set the actual NTFS ACLs once, from an Elevated Contributor session, then let the directory tree inherit. The icacls inheritance flags trip people up, so enumerate them:

icacls Z:\engineering /grant "CORP\eng-team:(OI)(CI)M"   # Modify, inherited to children
icacls Z:\engineering /remove "Everyone"                 # well-known group, so no domain prefix

| icacls token | Meaning | Use it for |
| --- | --- | --- |
| (OI) | Object Inherit – applies to files in the folder | Most data folders |
| (CI) | Container Inherit – applies to subfolders | Most data folders |
| (IO) | Inherit Only – not on this object, only children | Templates that shouldn’t grant on the root |
| M | Modify (read/write/delete) | Standard user grant |
| RX | Read & execute | Read-only shares |
| F | Full control | Admins only – avoid broad use |
| /remove | Strip an ACE | Remove Everyone/Authenticated Users |

The Kerberos flow, and the three usual failure culprits

The Kerberos flow underneath: the client requests a service ticket (TGS) for the SPN cifs/stfilesprod01.file.core.windows.net from a DC, the storage account decrypts it with the Kerberos key minted during Join-AzStorageAccount, Azure maps the user SID, and then the NTFS ACL is evaluated. If clients fall back to NTLM, the SMB mount fails by design – Azure Files does not accept NTLM for AD identities.

| Symptom | Root cause | Confirm | Fix |
| --- | --- | --- | --- |
| Mount prompts for credentials | SPN missing on the AD object | setspn -L stfilesprod01 | Re-run Join-AzStorageAccount; add the cifs SPN |
| Mount fails, no ticket issued | Client can’t reach a DC for the TGS | nltest /dsgetdc:corp.contoso.com | Fix routing/firewall to a DC; check site/subnet |
| Mount fell back to NTLM | DNS resolves account to a public IP | Resolve-DnsName ... shows public IP | Wire private DNS to the PE private IP |
| Access denied after mount | NTFS ACL doesn’t grant the user | icacls Z:\path | Grant the synced AD group at NTFS level |
| Can mount but write denied | RBAC ok, NTFS denies | Check both gates | Grant Modify on the directory |
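
When I triage these, I walk the layers from the network up and stop at the first one that fails. The function below is a sketch of that habit, not a diagnostic tool: the five boolean inputs are results you gather yourself with the “Confirm” commands above, and the check order is my assumption.

```python
def triage(dns_private: bool, dc_reachable: bool, spn_present: bool,
           rbac_assigned: bool, ntfs_grants: bool) -> str:
    """Return the first failing layer, checked from the network up."""
    if not dns_private:
        return "DNS: account resolves to a public IP; wire the private zone to the private endpoint"
    if not dc_reachable:
        return "Network: client cannot reach a DC for the service ticket"
    if not spn_present:
        return "Identity: cifs SPN missing on the AD object; re-run Join-AzStorageAccount"
    if not rbac_assigned:
        return "Gate 1: no share-level RBAC role, the user cannot mount"
    if not ntfs_grants:
        return "Gate 2: the NTFS ACL does not grant the user"
    return "No failure found in these five checks"


# Everything fine except the folder ACL: the answer points at gate two.
print(triage(True, True, True, True, False))
```

Checking DNS first is deliberate: a public-IP resolution makes every later symptom misleading.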

Private endpoints, DNS, and eliminating public access

By default *.file.core.windows.net resolves to a public IP. For Kerberos to behave and for the data plane to stay off the internet, front the account with a Private Endpoint and shut public access.

# 1) Create the private endpoint targeting the 'file' sub-resource
az network private-endpoint create \
  --resource-group rg-files-prod \
  --name pe-stfilesprod01-file \
  --vnet-name vnet-hub --subnet snet-privatelink \
  --private-connection-resource-id "$(az storage account show -g rg-files-prod -n stfilesprod01 --query id -o tsv)" \
  --group-id file \
  --connection-name plsc-stfilesprod01-file

# 2) Wire it to the Private DNS zone so the FQDN resolves to the private IP
az network private-endpoint dns-zone-group create \
  --resource-group rg-files-prod \
  --endpoint-name pe-stfilesprod01-file \
  --name pdnszg-file \
  --private-dns-zone "privatelink.file.core.windows.net" \
  --zone-name file

# 3) Disable public network access entirely
az storage account update -g rg-files-prod -n stfilesprod01 --public-network-access Disabled

The same in Bicep, which is how I keep the endpoint and zone-group from drifting:

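// Inputs the snippet relies on; supply them from the calling template.
param location string = resourceGroup().location
param privateLinkSubnetId string
param storageAccountId string

resource pe 'Microsoft.Network/privateEndpoints@2023-09-01' = {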
  name: 'pe-stfilesprod01-file'
  location: location
  properties: {
    subnet: { id: privateLinkSubnetId }
    privateLinkServiceConnections: [ {
      name: 'plsc-stfilesprod01-file'
      properties: {
        privateLinkServiceId: storageAccountId
        groupIds: [ 'file' ]   // the 'file' sub-resource, NOT 'blob'
      }
    } ]
  }
}

The network-exposure options, end to end

You have three ways to expose the file data plane, and only one is appropriate for identity-based SMB at rest:

| Exposure model | How traffic reaches it | Kerberos behaves? | When to use | Limit / gotcha |
| --- | --- | --- | --- | --- |
| Public endpoint (default) | Public IP, port 445 | Often blocked by ISPs on 445 | Never for prod identity-based | 445 frequently firewalled outbound |
| Service endpoint | VNet-optimized route to public IP | Yes, from the VNet | VNet-only, simple | Still a public IP; on-prem can’t use it |
| Private endpoint | Private IP on a subnet NIC | Yes, account-wide | Production hybrid | Needs private DNS wired everywhere |

The difference between service and private endpoints matters enough that it has its own write-up: Private Endpoint vs Service Endpoint. For files specifically, the private endpoint is the only model that on-prem clients can use over ExpressRoute/VPN without going to the public internet.

The ports and protocols you must allow end to end – get one NSG/firewall rule wrong and the mount or the Kerberos handshake fails:

| Port / protocol | Direction | Used for | If blocked |
| --- | --- | --- | --- |
| TCP 445 (SMB) | Client → file endpoint | The SMB data plane itself | No mount at all |
| TCP/UDP 88 (Kerberos) | Client → DC | TGS for the cifs SPN | No ticket → NTLM fallback → fail |
| TCP/UDP 53 (DNS) | Client → resolver | Resolve privatelink.file | FQDN → public IP |
| TCP 389 / 636 (LDAP/S) | Client/ANF → DC | SID/group lookups, ANF dual-protocol | Group resolution fails |
| TCP 2049 (NFS) | Client → volume | NFSv3/4.1 data plane | No NFS mount |
| TCP 443 (HTTPS) | Mgmt/REST | Control plane, REST data ops | Portal/CLI ops fail (not SMB) |
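
A quick reachability probe for the TCP rows saves a round of guessing before you touch Kerberos. This is a minimal sketch using only the standard library; it tests a TCP handshake and nothing more, so it says nothing about the UDP variants of ports 88 and 53 or about SMB itself.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `tcp_reachable("stfilesprod01.file.core.windows.net", 445)` from an on-prem client answers the “no mount at all” row directly; run the same probe against a domain controller on port 88 for the Kerberos row.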

The DNS detail that bites everyone

The storage account FQDN is stfilesprod01.file.core.windows.net, but the private DNS zone is privatelink.file.core.windows.net. Azure’s public DNS returns a CNAME from stfilesprod01.file.core.windows.net to stfilesprod01.privatelink.file.core.windows.net; the A record in your private zone then resolves that privatelink name to the PE’s private IP.

| Question | Where it resolves | What it returns | Failure if wrong |
| --- | --- | --- | --- |
| stfilesprod01.file.core.windows.net | Public DNS | CNAME → privatelink.file... | |
| privatelink.file.core.windows.net | Your private DNS zone | The PE private IP (e.g. 10.20.1.7) | Resolves public → 445 blocked / NTLM |
| Same, from on-prem | On-prem DNS forwarder | Must chain to the private zone | On-prem mounts fail or egress public |

On-prem clients must be able to resolve that private zone – via DNS forwarders pointing at an Azure DNS Private Resolver inbound endpoint, or conditional forwarders to a DNS VM in the hub. The full pattern is in Private DNS Resolver hybrid conditional forwarding. If on-prem still resolves the account to a public IP, mounts fail or quietly egress over the internet.
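
Checking what a given client actually resolves is easy to automate. The sketch below uses the system resolver and the standard `ipaddress` module; the function names are mine, it only looks at IPv4 answers, and “private” here means whatever Python's `is_private` classifies as private, which is broader than RFC 1918.

```python
import ipaddress
import socket


def is_private_ip(ip: str) -> bool:
    """True for addresses Python classifies as private (includes 10.0.0.0/8 and friends)."""
    return ipaddress.ip_address(ip).is_private


def resolves_privately(fqdn: str) -> bool:
    """Resolve `fqdn` with the system resolver; require every IPv4 answer to be private."""
    infos = socket.getaddrinfo(fqdn, 445, family=socket.AF_INET, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_private_ip(a) for a in addresses)


print(is_private_ip("10.20.1.7"))  # the example private endpoint IP from the table
```

Run `resolves_privately("stfilesprod01.file.core.windows.net")` from an on-prem client and from an Azure VM; if the two disagree, the on-prem forwarder chain is the gap.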

Snapshots, soft delete, and layered data protection

Azure Files gives you three independent layers. Use all three for anything that matters.

| Layer | What it protects against | Lives where | Attacker with account rights can purge? | Recovery surface |
| --- | --- | --- | --- | --- |
| Share snapshot | Accidental change/delete (“oops”) | In the account | Yes | Previous Versions tab |
| Soft delete | Accidental share deletion | In the account | Yes (after retention) | Undelete within window |
| Vaulted backup | Ransomware, malicious purge | Off the account (Backup vault) | No (immutable) | Restore from vault |

Share snapshots are read-only, incremental, point-in-time copies surfaced to users through the Previous Versions tab on Windows:

az storage share snapshot \
  --account-name stfilesprod01 \
  --name projects \
  --auth-mode login

Soft delete keeps deleted shares (and snapshots) recoverable for a retention window – your safety net against a fat-fingered az storage share delete:

az storage account file-service-properties update \
  --resource-group rg-files-prod \
  --account-name stfilesprod01 \
  --enable-delete-retention true \
  --delete-retention-days 14

Vaulted backup via a Backup vault adds scheduled snapshot management with offsite retention and, critically, an immutable copy that an attacker with storage-account rights cannot purge. Snapshots and soft delete live in the same account as the data; backup does not. Treat them as defense in depth, not substitutes – snapshots cover oops, backup covers ransomware. The deeper immutability story is in Backup vault immutability & cross-region restore.

The protection knobs and their sane defaults:

| Setting | Values | Default | When to change | Gotcha |
| --- | --- | --- | --- | --- |
| Share soft-delete retention | 1–365 days | Often off | Always enable; 14–30d typical | Off by default on older accounts |
| Snapshot frequency | Manual or via Backup policy | Manual | Automate via Backup vault | Manual snapshots get forgotten |
| Max snapshots per share | 200 | | | Beyond 200, oldest must be pruned |
| Backup policy schedule | Hourly–daily | | Match RPO | Vaulted tier needed for immutability |
| Backup vault immutability | Disabled/locked | Disabled | Lock for ransomware posture | Locked is irreversible – test first |
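
If you manage snapshots yourself rather than through a backup policy, the 200-per-share ceiling needs a pruning rule. A minimal sketch of oldest-first pruning, assuming you already have the snapshot timestamps in hand; the function name is mine and the actual delete call is left out.

```python
from datetime import datetime, timedelta

MAX_SNAPSHOTS = 200  # per-share limit quoted in the table above


def snapshots_to_prune(taken_at, incoming=1, limit=MAX_SNAPSHOTS):
    """Return the oldest timestamps to delete so `incoming` new snapshots fit under `limit`."""
    excess = len(taken_at) + incoming - limit
    if excess <= 0:
        return []
    return sorted(taken_at)[:excess]


# 200 daily snapshots already exist; taking one more means the oldest has to go.
start = datetime(2024, 1, 1)
existing = [start + timedelta(days=i) for i in range(200)]
print(snapshots_to_prune(existing))
```

Oldest-first is the simplest policy; a real retention scheme usually keeps some weekly or monthly points, which this sketch does not attempt.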

Azure File Sync: cloud tiering and multi-site replication

Azure File Sync turns one or more Windows file servers into cached endpoints of an Azure file share. The hot working set stays on local NTFS; cold files become tiered pointers (reparse points) whose data lives only in Azure. This is the answer to “we have 40 TB on a branch file server but only touch 2 TB a month.”

# On each Windows Server file server, after installing the Azure File Sync agent:
$server = Register-AzStorageSyncServer -ResourceGroupName "rg-filesync" -StorageSyncServiceName "sss-corp"

# Create a server endpoint with cloud tiering: keep ~20% free space locally,
# and tier anything not touched in 30 days.
# The endpoint name is a placeholder. -CloudTiering is a switch, so it takes no value.
New-AzStorageSyncServerEndpoint `
  -Name "se-projects" `
  -ResourceGroupName "rg-filesync" `
  -StorageSyncServiceName "sss-corp" `
  -SyncGroupName "sg-projects" `
  -ServerResourceId $server.ResourceId `
  -ServerLocalPath "F:\Projects" `
  -CloudTiering `
  -VolumeFreeSpacePercent 20 `
  -TierFilesOlderThanDays 30

Add multiple server endpoints to the same sync group and you get multi-site replication: a server endpoint in each office, all converging on the same cloud share.

The File Sync settings that matter

| Setting | What it does | Default / range | When to change | Gotcha |
|---|---|---|---|---|
| CloudTiering | Enables tiering of cold files | Off | On for branch caches | Off = full local copy of everything |
| VolumeFreeSpacePercent | Keep this % of volume free | 20% typical | Raise if volume is small | Tiers aggressively when low |
| TierFilesOlderThanDays | Date policy for tiering | 0 (disabled) | 30–90d common | Combine with free-space policy |
| Initial sync direction | Authoritative source | Cloud or server | First onboarding | Wrong direction can hide files |
| Recall on read | Rehydrates a tiered file on access | Implicit | | AV/backup reads recall everything |
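
How the date policy and the free-space policy combine is easier to see with numbers. The sketch below is a deliberately simplified model, not the sync agent's actual algorithm: the function `files_to_tier` and the sample files are hypothetical, and it assumes the volume holds nothing but the listed files.

```python
def files_to_tier(files, volume_gib, free_pct_target=20, older_than_days=30):
    """files: list of (name, size_gib, days_since_last_access). Returns names to tier."""
    # Date policy: anything not touched within the window is tiered.
    tiered = [f for f in files if f[2] > older_than_days]
    # What is left stays local, ordered coldest first.
    local = sorted((f for f in files if f[2] <= older_than_days),
                   key=lambda f: f[2], reverse=True)
    used = sum(f[1] for f in local)
    needed_free = volume_gib * free_pct_target / 100
    # Free-space policy: keep tiering the coldest file until the target is met.
    while local and volume_gib - used < needed_free:
        coldest = local.pop(0)
        tiered.append(coldest)
        used -= coldest[1]
    return [f[0] for f in tiered]


sample = [
    ("old.iso", 40, 90),     # older than 30 days: date policy tiers it
    ("q3.xlsx", 30, 20),
    ("draft.docx", 35, 10),
    ("today.pptx", 25, 0),
]
result = files_to_tier(sample, volume_gib=100)
print(result)  # ['old.iso', 'q3.xlsx']
```

In the example the date policy alone leaves only 10 GiB free on a 100 GiB volume, so the free-space policy tiers the next-coldest file even though it is younger than 30 days. That is the "tiers aggressively when low" gotcha from the table.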

The operational rules I enforce, as a checklist:

| Rule | Why | Failure if ignored |
|---|---|---|
| Exclude tiered volumes from AV full scans (or use recall-on-read exclusions) | A scan reads every file → recalls the whole dataset | Egress bill spike + volume fills |
| Back up the cloud share, not the cached servers | The cloud share is the source of truth | Backing up tiered pointers backs up nothing |
| Don’t run server-side backup that recalls | Same recall trap as AV | Surprise rehydration |
| Keep agent versions current | Old agents have tiering bugs | Sync stalls, churn |

ANF volumes, capacity pools, snapshots, and cross-region replication

Azure NetApp Files has its own hierarchy: a NetApp account, then one or more capacity pools (which set the service level and therefore throughput), then volumes carved from the pool.

resource "azurerm_netapp_account" "this" {
  name                = "anf-prod"
  resource_group_name = azurerm_resource_group.anf.name
  location            = "westeurope"

  # Bind ANF to AD so SMB volumes can do Kerberos against your domain
  active_directory {
    username            = var.ad_join_username
    password            = var.ad_join_password
    smb_server_name     = "ANFSMB"
    dns_servers         = ["10.10.0.4", "10.10.0.5"]
    domain              = "corp.contoso.com"
    organizational_unit = "OU=AzureNetApp,DC=corp,DC=contoso,DC=com"
  }
}

resource "azurerm_netapp_pool" "premium" {
  name                = "pool-premium-01"
  account_name        = azurerm_netapp_account.this.name
  resource_group_name = azurerm_resource_group.anf.name
  location            = azurerm_netapp_account.this.location
  service_level       = "Premium"
  size_in_tb          = 4
}

resource "azurerm_netapp_volume" "sap" {
  name                = "vol-sap-data"
  account_name        = azurerm_netapp_account.this.name
  pool_name           = azurerm_netapp_pool.premium.name
  resource_group_name = azurerm_resource_group.anf.name
  location            = azurerm_netapp_account.this.location

  volume_path        = "sap-data"
  service_level      = "Premium"
  subnet_id          = azurerm_subnet.anf_delegated.id   # MUST be delegated to Microsoft.NetApp/volumes
  storage_quota_in_gb = 2048
  protocols          = ["CIFS"]                          # SMB; use ["NFSv4.1"] or both for dual-protocol

  snapshot_directory_visible = true
}

The reusable module form is terraform-module-azure-netapp-files if you want this as a versioned building block.

The ANF gotchas, enumerated

| Concern | Requirement | Why it bites |
|---|---|---|
| Delegated subnet | Subnet delegated to Microsoft.NetApp/volumes | Cannot host VMs/PEs; size it ahead |
| Protocol choice | CIFS (SMB), NFSv3, NFSv4.1, or dual | Dual-protocol needs LDAP + AD mapping |
| Snapshot cost | Near-instant, storage-efficient | But still consumes pool capacity over time |
| AD connection | Per NetApp account | One AD config shared by volumes in the account |
| Throughput | quota_TiB × level_MiBps (auto QoS pools; manual QoS assigns throughput per volume) | Small volume = throttled even on Ultra |
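
The throughput row is worth working through once with real numbers. This is a minimal sketch for pools where volume throughput follows quota (auto QoS). The per-TiB rates of 16, 64 and 128 MiB/s for Standard, Premium and Ultra are the commonly published figures and are treated here as assumptions to verify against current documentation; the function names are hypothetical.

```python
import math

# Assumed per-TiB throughput for each service level, in MiB/s.
MIBPS_PER_TIB = {"Standard": 16, "Premium": 64, "Ultra": 128}


def anf_volume_throughput_mibps(quota_gib, service_level):
    """Throughput ceiling for a volume whose limit follows quota x service level."""
    return quota_gib / 1024 * MIBPS_PER_TIB[service_level]


def quota_gib_for_throughput(target_mibps, service_level):
    """Smallest quota (GiB) that yields at least target_mibps at this service level."""
    return math.ceil(target_mibps / MIBPS_PER_TIB[service_level] * 1024)


print(anf_volume_throughput_mibps(500, "Ultra"))     # 62.5
print(anf_volume_throughput_mibps(2048, "Premium"))  # 128.0
print(quota_gib_for_throughput(256, "Ultra"))        # 2048
```

A 500 GiB volume on Ultra ends up with less throughput than the 2 TiB Premium volume from the Terraform example, which is why growing the quota is often the first lever.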

ANF snapshots are near-instant and storage-efficient (no copy on create). For DR, cross-region replication mirrors a volume to a paired region on a schedule:

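# Create the destination as a data-protection volume that replicates from the source,
# then authorize the replication. The approve call runs against the SOURCE volume and
# names the destination as the remote volume. <source-rg> and $DST_VOL_ID (the resource
# ID of vol-sap-data-dr) are placeholders.
az netapp volume replication approve \
  --resource-group <source-rg> \
  --account-name anf-prod \
  --pool-name pool-premium-01 \
  --name vol-sap-data \
  --remote-volume-resource-id "$DST_VOL_ID"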

az netapp volume replication status \
  --resource-group rg-anf-dr --account-name anf-dr \
  --pool-name pool-premium-dr --name vol-sap-data-dr \
  --query "mirrorState"   # expect: mirrored

Replication is one-directional until you break the peering to fail over, at which point the destination becomes writable.

| Replication knob | Values | RPO impact | Note |
|---|---|---|---|
| Replication schedule | 10 min / hourly / daily | Sets RPO | 10-min is the tightest ANF offers |
| mirrorState | mirrored, uninitialized, broken | | mirrored = healthy |
| Break peering | Manual | Destination becomes RW | This is the failover action |
| Resync after failback | Reverse + resync | Re-establishes mirror | Rehearse the full sequence |

Rehearse the break-and-resync; do not discover the sequence during an incident.
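
During a rehearsal it helps to turn the replication status into a plain verdict rather than eyeballing JSON. The sketch below assumes the status has been captured as a JSON string with a `mirrorState` field, as queried above; the function name `replication_verdict` and the sample payloads are hypothetical, and any other fields in the real response are ignored.

```python
import json


def replication_verdict(status_json):
    """Map a replication status payload to what it means for failover."""
    status = json.loads(status_json)
    state = str(status.get("mirrorState", "")).lower()  # tolerate casing differences
    if state == "mirrored":
        return "healthy: destination is read-only until peering is broken"
    if state == "broken":
        return "broken: destination is writable; resync to re-establish the mirror"
    if state == "uninitialized":
        return "uninitialized: baseline transfer has not completed"
    return "unknown: inspect the raw status"


print(replication_verdict('{"mirrorState": "Mirrored"}'))
print(replication_verdict('{"mirrorState": "Broken"}'))
```

Note that a healthy mirror still reports a read-only destination, so "mirrored" on its own says nothing about whether the break-and-resync sequence has been practiced.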

Performance tuning, throughput provisioning, and monitoring

On Azure Files Premium v1, baseline IOPS and throughput scale linearly with provisioned size, plus a burst-credit pool. If you are throttled, the lever is provisioned GiB – or move to Premium SSD v2 and provision IOPS/throughput independently. On ANF, the levers are service level (Standard/Premium/Ultra) and volume quota.

| Platform | Throughput lever | Online change? | Extra knob |
|---|---|---|---|
| Files Premium v1 | Provisioned GiB | Yes | Burst credits |
| Files Premium SSD v2 | Provisioned IOPS + throughput (independent) | Yes | Decoupled from capacity |
| Files Standard | Tier + transaction model | Yes | Burstable, IOPS not guaranteed |
| ANF | Service level + volume quota | Yes (level change online) | SMB Multichannel / NFS nconnect |
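
To see why provisioned GiB is an expensive lever on Premium v1, here is a rough sketch of the linear scaling. It assumes a baseline of 3,000 IOPS plus 1 IOPS per provisioned GiB, capped at 102,400, and a 100 GiB minimum share size. Those constants are assumptions to check against the current Azure Files scalability documentation, which is why they are exposed as parameters; the function names are hypothetical.

```python
import math


def v1_baseline_iops(provisioned_gib, base=3000, per_gib=1, cap=102_400):
    """Baseline IOPS for a Premium v1 share under the assumed linear formula."""
    return min(base + per_gib * provisioned_gib, cap)


def v1_gib_for_iops(target_iops, base=3000, per_gib=1, min_share_gib=100):
    """Provisioned GiB needed to reach target_iops as a baseline."""
    return max(min_share_gib, math.ceil((target_iops - base) / per_gib))


data_gib = 6 * 1024                       # a ~6 TiB dataset
print(v1_baseline_iops(data_gib))         # 9144
needed = v1_gib_for_iops(30_000)
print(needed)                             # 27000
print(f"{needed / data_gib:.1f}x")        # 4.4x
```

Under these assumptions, a share sized to its data tops out near 9,000 baseline IOPS, and reaching 30,000 means provisioning more than four times the capacity actually used.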

Two extra knobs for throughput-bound SMB and NFS workloads are SMB Multichannel and NFS nconnect, which fan a single mount across multiple TCP connections:

| Knob | Protocol | What it does | When to use |
|---|---|---|---|
| SMB Multichannel | SMB 3.x | Multiple TCP channels per mount | Single-client high-throughput |
| NFS nconnect | NFSv3/4.1 | N TCP connections per mount | Parallel NFS read/write |
| Larger client RSS queues | Both | More receive-side scaling | Many-core clients |
| SMB 3.1.1 dialect | SMB | Best perf + AES-256 encryption | Force latest; disable SMB1/2 |
| Burst credits | Files Premium v1 | Short spikes above baseline | Bursty workloads between peaks |
| Mount option cache= | SMB (Linux) | Client-side caching mode | Tune for read-heavy mounts |

Watch the right metrics in Azure Monitor. For Azure Files, throttling shows up as 429 responses, not slowness:

// Azure Files: count throttled requests on the file share in 5-minute bins.
// tostring() keeps the comparison valid whether StatusCode is stored as text or a number.
StorageFileLogs
| where TimeGenerated > ago(1h)
| where tostring(StatusCode) == "429" or StatusText has "ThrottlingError"
| summarize Throttled = count() by bin(TimeGenerated, 5m), OperationName
| order by TimeGenerated desc

The metrics that actually tell you whether to provision more:

| Platform | Metric | What it signals | Action when high |
|---|---|---|---|
| Files | Transactions (429s) | Throttling | Provision IOPS (v2) or GiB (v1) |
| Files | SuccessE2ELatency | End-to-end latency (server plus network) | Move to Premium / v2 |
| Files | FileCapacity | Used vs provisioned | Resize before full |
| ANF | VolumeConsumedSizePercentage | Quota pressure | Grow the volume |
| ANF | ReadLatency / WriteLatency | Latency floor | Raise service level |
| ANF | ThroughputLimitReached | Quota/level bound | Grow quota or bump level |

Architecture at a glance

The diagram traces the real data-and-identity path of an identity-based SMB mount, left to right, and marks the five steps that are either the key mechanism or the failure point. Read it as a pipeline. On the far left a domain-joined endpoint issues net use Z: with no account key; alongside it the identity source (on-prem AD DS or Entra Kerberos) mints a Kerberos service ticket for the cifs/<account> SPN – badge 1 is where a client falls back to NTLM or a key and the mount fails by design. That request then has to resolve a name: the Private DNS zone privatelink.file.core.windows.net must point at the private endpoint NIC so SMB rides TCP 445 over the private network – badge 2 is the classic “FQDN resolved to a public IP” failure. The ticket and the private route land on the SMB platforms zone: an Azure Files Premium v2 share or an ANF volume in its delegated subnet, both gated twice (share-RBAC to mount, then NTFS ACLs via icacls) – badge 3 is the two-gate denial where one gate says yes and the other says no.

From there the path turns into protection. The data-protection zone layers snapshots + soft delete (in-account, Previous Versions, oops-recovery) and a vaulted, immutable backup that lives off the account – badge 4 is the “snapshot is not a backup” trap, because in-account copies are purgeable by an attacker who owns the account. Finally the path extends to a paired region via ANF cross-region replication, mirrored one-way until you break the peering to make the destination writable – badge 5 is the replication that was never rehearsed as a failover. The whole picture is the method: authenticate with Kerberos, resolve private, pass both gates, protect in three layers, and replicate for the region loss.

Architecture of identity-based SMB on Azure Files and Azure NetApp Files. Left to right: a domain-joined endpoint plus an AD DS or Entra Kerberos identity source minting a cifs service ticket (badge 1, Kerberos vs NTLM fallback); a Private DNS privatelink.file zone resolving to a private endpoint on TCP 445 (badge 2, FQDN resolving to a public IP); the SMB platforms zone with an Azure Files Premium v2 share and an ANF volume in a delegated subnet, both behind the two-gate model of share-level RBAC then NTFS ACLs (badge 3, two-gate denial); a data-protection zone with in-account snapshots plus soft delete and an off-account immutable vaulted backup (badge 4, snapshot is not a backup); and a paired-region ANF cross-region replication target that is mirrored one-way until peering is broken (badge 5, failover not rehearsed). Flows are labeled resolve FQDN, SMB 445, snapshot, and replicate.
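
The two-gate check at badge 3 can be written down as a tiny decision function. This is an illustrative model only: the right names (`read`, `write`, `change_acl`) are simplifications of what the built-in SMB share roles and NTFS ACLs actually grant, and the function `evaluate` is hypothetical.

```python
# Simplified view of what each share-level RBAC role permits (gate one).
SHARE_ROLE_RIGHTS = {
    "Storage File Data SMB Share Reader": {"read"},
    "Storage File Data SMB Share Contributor": {"read", "write"},
    "Storage File Data SMB Share Elevated Contributor": {"read", "write", "change_acl"},
}


def evaluate(share_role, ntfs_rights, wanted):
    """Return 'allowed' or the gate that denied the request."""
    role_rights = SHARE_ROLE_RIGHTS.get(share_role)
    if role_rights is None:
        return "denied at gate 1: no share-level role, cannot mount"
    if wanted not in role_rights:
        return "denied at gate 1: role does not allow " + wanted
    if wanted not in ntfs_rights:          # gate two: NTFS ACL on the file or folder
        return "denied at gate 2: NTFS ACL does not grant " + wanted
    return "allowed"


contributor = "Storage File Data SMB Share Contributor"
print(evaluate(contributor, {"read", "write"}, "write"))  # allowed
print(evaluate(contributor, {"read"}, "write"))           # denied at gate 2: ...
print(evaluate(None, {"read", "write"}, "read"))          # denied at gate 1: ...
```

The second call is the classic "can mount but cannot write" case: RBAC says yes, NTFS says no, and the fix belongs at the ACL rather than at the role assignment.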

Real-world scenario

A pharma client ran Azure Virtual Desktop for 9,000 users with FSLogix profile containers on a single Azure Files Premium v1 share in West Europe. Every weekday at 08:30 the login storm hammered the share; users saw 90-second profile loads and intermittent “profile failed to attach.” Azure Monitor showed a wall of 429s during the storm – the share was IOPS-throttled, not latency-bound. The platform team was five engineers; the share was ~6 TiB of actual profile data.

The naive fix was to provision the v1 Premium share up to absorb peak IOPS, but that meant paying for ~30 TiB of provisioned capacity to buy IOPS they did not need on a dataset under 6 TiB. The cost delta was roughly 5x, and the manager balked.

They solved it two ways. First, they migrated profiles to Premium SSD v2, decoupling IOPS from capacity so they could provision peak IOPS against the actual ~6 TiB footprint. Second, they sharded profiles across multiple shares with FSLogix per-share assignment so the login storm fanned across independent IOPS budgets.

Identity was the part that almost derailed it. The endpoints were hybrid-joined, so they used Entra Kerberos – and the hard-won lesson was that NTFS ACLs still resolve against AD DS SIDs, so the AD accounts had to be synced via Entra Connect and the FSLogix container ACLs set to the synced identities, not cloud-only ones. For two days, half the pilot users got “profile failed to attach” purely because the container directory ACLs referenced cloud-only objects that had no matching AD SID. The fix was to re-ACL the FSLogix root to the synced security group.

# Premium SSD v2: provision IOPS and throughput independently of capacity
az storage account create \
  --resource-group rg-avd-profiles \
  --name stavdprofv2 \
  --sku PremiumV2_LRS \
  --kind FileStorage \
  --location westeurope

az storage share-rm create \
  --resource-group rg-avd-profiles \
  --storage-account stavdprofv2 \
  --name fslogix \
  --quota 6144 \
  --provisioned-iops 30000 \
  --provisioned-bandwidth-mibps 2048

Result: 08:30 profile loads dropped from ~90 seconds to under 6, the 429s disappeared, and the monthly storage line fell because they stopped buying capacity to rent IOPS. The deeper AVD/FSLogix architecture is in Azure Virtual Desktop at 5,000 users with FSLogix.

The incident as a timeline, because the order of moves is the lesson:

| Time | Symptom | Action taken | Effect | What it should have been |
|---|---|---|---|---|
| Day 1 08:30 | 90s profile loads, “failed to attach” | (tickets fire) | | Ask: throttled or latency-bound? |
| Day 1 | Reflex | Plan to provision v1 up to ~30 TiB | 5x cost, manager balks | Don’t buy capacity for IOPS |
| Day 1 | Diagnosis | Read 429 rate in StorageFileLogs | Confirmed IOPS throttle | The breakthrough |
| Day 2 | Pilot rollout | Move to Premium SSD v2, 30k IOPS | Loads → <6s | Correct fix |
| Day 2 | New failure | Half of pilot “failed to attach” | FSLogix ACLs on cloud-only objects | Entra Kerberos NTFS gotcha |
| Day 3 | Resolved | Re-ACL FSLogix root to synced group | All attach | The identity lesson |
| +1 wk | Steady state | Shard profiles across shares | Login storm fanned out | Spread the IOPS budget |

Advantages and disadvantages

Managed SMB on Azure both removes the file-server toil and introduces new failure modes around identity and DNS. Weigh it honestly:

| Advantages | Disadvantages |
|---|---|
| No file server OS to patch; the platform owns availability | The identity wiring (SPN, Kerberos key, synced SIDs) is fiddly and unfamiliar |
| Kerberos SSO means no credential prompts and no stored passwords | A single DNS mistake silently falls back to NTLM or egresses public |
| Snapshots are near-instant and storage-efficient | In-account snapshots are purgeable by an attacker who owns the account |
| Premium SSD v2 decouples IOPS from capacity, cutting waste | Premium v1 (still common) forces you to buy capacity to rent IOPS |
| ANF gives sub-millisecond latency for SAP/HPC/EDA | ANF’s 1 TiB pool floor + delegated subnet are real cost/network overhead |
| Two-gate model maps cleanly to “who can mount” vs “what they can do” | Confusing the two gates is the #1 support ticket |
| Cross-region replication / GRS survive a region loss | CRR is one-way and useless until you rehearse the break-and-resync |
| Azure File Sync caches a huge dataset on a small branch volume | An AV/backup full scan recalls the whole tiered dataset and blows egress |

Azure Files Premium is right for general-purpose shares, FSLogix, and File Sync hubs where a few milliseconds is fine and you want one resource to manage. ANF is right when the SLO mentions microseconds or it is SAP/HPC/EDA. The disadvantages are all manageable – but only if you know they exist, which is the point of this article.

Hands-on lab

Stand up identity-based SMB on Azure Files against on-prem AD DS, force it private, prove Kerberos won, and add data protection – then tear it down. You need a domain-joined VM in a VNet with line-of-sight to a DC. Run the az parts in Cloud Shell (Bash) and the PowerShell parts on the domain-joined VM.

Step 1 – Variables and resource group.

RG=rg-files-lab
LOC=westeurope
ST=stfileslab$RANDOM   # globally-unique, lowercase
az group create -n $RG -l $LOC -o table

Step 2 – Create a Premium FileStorage account and a share.

az storage account create -g $RG -n $ST \
  --sku Premium_LRS --kind FileStorage --location $LOC -o table
az storage share-rm create -g $RG --storage-account $ST --name projects --quota 100 -o table

Expected: an account with kind = FileStorage, and a 100 GiB share projects.

Step 3 – Join the account to AD DS (on the domain-joined VM, PowerShell).

Import-Module .\AzFilesHybrid.psd1
Connect-AzAccount
Join-AzStorageAccount -ResourceGroupName "rg-files-lab" -StorageAccountName "<ST>" `
  -SamAccountName "<ST>" -DomainAccountType "ComputerAccount"

Step 4 – Confirm the directory service is AD (not None).

az storage account show -g $RG -n $ST \
  --query "azureFilesIdentityBasedAuthentication.directoryServiceOptions" -o tsv
# expect: AD   (None means you are still on key-based auth)

Step 5 – Assign share-level RBAC to your AD group.

scope=$(az storage account show -g $RG -n $ST --query id -o tsv)/fileServices/default/fileshares/projects
az role assignment create --assignee "<entra-group-object-id>" \
  --role "Storage File Data SMB Share Contributor" --scope "$scope"

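Step 6 – Force it private and disable public access. This assumes the privatelink.file.core.windows.net Private DNS zone is already linked to the VNet and picks up the endpoint's A record (for example through a DNS zone group); without that, step 7 resolves the public IP.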

az network private-endpoint create -g $RG -n pe-$ST-file \
  --vnet-name <vnet> --subnet <snet-privatelink> \
  --private-connection-resource-id "$(az storage account show -g $RG -n $ST --query id -o tsv)" \
  --group-id file --connection-name plsc-$ST-file
az storage account update -g $RG -n $ST --public-network-access Disabled

Step 7 – Mount with Kerberos and prove it (on the VM, PowerShell).

net use Z: \\<ST>.file.core.windows.net\projects
klist | Select-String "cifs/<ST>"            # a cifs ticket must appear
Resolve-DnsName <ST>.file.core.windows.net   # must return the PRIVATE IP
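
If you want the "must return the PRIVATE IP" check as something scriptable, the standard library is enough. This is a minimal sketch: `is_private_ip` and `resolves_privately` are hypothetical helper names, the sample addresses are illustrative, and `resolves_privately` uses whatever resolver the machine running it is configured with, so run it from the client whose DNS path you are testing.

```python
import ipaddress
import socket


def is_private_ip(address):
    """True if the address falls in a private (non-internet-routable) range."""
    return ipaddress.ip_address(address).is_private


def resolves_privately(fqdn):
    """Resolve the FQDN with the local resolver and check the first IPv4 answer."""
    return is_private_ip(socket.gethostbyname(fqdn))


print(is_private_ip("10.10.1.5"))    # True
print(is_private_ip("20.60.10.4"))   # False

# On the domain-joined VM (placeholder account name):
# resolves_privately("<ST>.file.core.windows.net")
```

A `False` here points at the DNS wiring, not at Kerberos or RBAC, which saves time when a mount fails during cutover.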

Step 8 – Add data protection.

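az storage account file-service-properties update -g $RG --account-name $ST \
  --enable-delete-retention true --delete-retention-days 14
# Take the snapshot through the management plane: step 6 disabled public access,
# so data-plane calls from Cloud Shell would be blocked.
az storage share-rm snapshot -g $RG --storage-account $ST --name projects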

Validation checklist. You created identity-based SMB, joined it to AD, scoped RBAC to the share, forced the data plane private, and confirmed Kerberos with an actual cifs/... ticket. Each step mapped to a real-world move:

| Step | What you did | What it proves | Real-world analogue |
|---|---|---|---|
| 3 | Join-AzStorageAccount | The account has an AD object + Kerberos key | Onboarding any new file share |
| 4 | directoryServiceOptions = AD | You left key-based auth behind | The “did identity actually take?” check |
| 5 | RBAC scoped to the share | Gate one is set, narrowly | Least-privilege mount rights |
| 6 | Private endpoint + disable public | SMB is off the internet | Hardening every prod account |
| 7 | klist shows cifs/... | Kerberos won, not NTLM/key | The 90-second proof during cutover |
| 8 | Soft delete + snapshot | Oops-recovery exists | Day-one data protection |

Cleanup.

az group delete -n $RG --yes --no-wait
# Also remove the AD computer object created in step 3 if your OU cleanup doesn't.

Cost note. A 100 GiB Premium share is a few hundred rupees per month prorated; an hour of this lab is well under ₹100, and deleting the resource group stops the storage charges. Remember to delete the stray AD object.

Common mistakes & troubleshooting

This is the playbook – the part you bookmark. First as a scannable table you read mid-cutover, then the expanded reasoning for the entries that bite hardest.

| # | Symptom | Root cause | Confirm (exact cmd / portal path) | Fix |
|---|---|---|---|---|
| 1 | Every mount prompts for credentials | Identity source is None – still key-based | az storage account show --query "...directoryServiceOptions" = None | Join-AzStorageAccount (AD) or --enable-files-aadkerb |
| 2 | Mount falls back to NTLM / fails | DNS resolves account to a public IP | Resolve-DnsName <acct>.file.core.windows.net shows public IP | Wire privatelink.file to the PE private IP |
| 3 | Mount prompts even with AD joined | SPN missing on the AD object | setspn -L <samaccountname> – no cifs SPN | Re-run join; ensure cifs/<acct>.file.core.windows.net exists |
| 4 | No ticket issued at all | Client can’t reach a DC for the TGS | nltest /dsgetdc:<domain> fails | Fix routing/NSG/firewall to a DC; check AD site/subnet |
| 5 | Can mount but write is denied | RBAC ok, NTFS ACL denies | icacls Z:\path shows no grant | Grant Modify to the synced AD group at NTFS level |
| 6 | Mount works only with the account key | Someone hard-coded the key in the script | grep scripts for the key/connection string | Remove the key; rely on Kerberos SSO |
| 7 | FSLogix “profile failed to attach” (Entra Kerberos) | Container ACLs on cloud-only objects with no AD SID | Inspect ACL on FSLogix root | Re-ACL to the synced security group |
| 8 | Throttling 429 during login storm | IOPS-bound on Premium v1 (capacity ≠ IOPS) | StorageFileLogs StatusCode == 429 | Move to Premium SSD v2; provision IOPS |
| 9 | On-prem mounts fail but VNet works | On-prem DNS resolves account to public | Resolve-DnsName on an on-prem host | Forward privatelink.file to Private Resolver |
| 10 | ANF volume create fails | Subnet not delegated to Microsoft.NetApp/volumes | Subnet delegation blade empty | Delegate a dedicated subnet (no other resources) |
| 11 | ANF volume throttled despite Ultra | Volume quota too small for the throughput math | ThroughputLimitReached high | Grow quota (quota_TiB × level_MiBps) |
| 12 | “Recovered” file is gone after restore | Only had in-account snapshots; attacker purged | No vaulted backup policy | Add Backup vault (immutable, off-account) |
| 13 | DR test: destination volume read-only | CRR is one-way; peering not broken | mirrorState = mirrored, never broke | Break peering to make destination writable |
| 14 | File Sync server filled up / egress spike | AV/backup full scan recalled tiered files | Recall metrics spike | Exclude tiered volume; recall-on-read exclusions |

The exact error strings you see on a Windows client, decoded – because net use and Event Viewer speak in numbers:

| Error string / code | Meaning | Likely cause | Fix |
|---|---|---|---|
| System error 1326 (logon failure) | Bad credentials / no Kerberos | NTLM fallback, key mismatch | Ensure Kerberos path + SPN |
| System error 53 (network path not found) | Name didn’t resolve / 445 blocked | DNS or firewall on 445 | Fix private DNS; allow 445 |
| System error 67 (network name not found) | Share or FQDN wrong | Typo or public-only access | Verify share name + private DNS |
| System error 1219 (multiple connections) | Conflicting creds to same server | An old key-based mount lingers | net use /delete the stale mount |
| System error 5 (access denied) | NTFS or RBAC denies | One of the two gates says no | Grant RBAC + NTFS to the group |
| STATUS_ACCESS_DENIED (SMB) | NTFS evaluation failed | ACL doesn’t include the SID | Re-ACL to the synced AD group |
| directoryServiceOptions: None | Not identity-based | Account never joined | Join-AzStorageAccount / --enable-files-aadkerb |
| mirrorState: broken (ANF) | Replication not healthy | Peering broken / lagging | Resync; check schedule |
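
The numeric rows of the table above translate directly into a lookup you can keep next to a cutover script. The sketch below is a convenience helper, not a diagnostic tool: the dictionary simply restates the table, `triage` is a hypothetical name, and the regex assumes the usual "System error N" wording from net use.

```python
import re

# (likely cause, fix) per net use error code, restating the table above.
NET_USE_ERRORS = {
    1326: ("NTLM fallback or key mismatch", "Ensure Kerberos path + SPN"),
    53: ("DNS or firewall on 445", "Fix private DNS; allow 445"),
    67: ("Typo or public-only access", "Verify share name + private DNS"),
    1219: ("An old key-based mount lingers", "net use /delete the stale mount"),
    5: ("One of the two gates says no", "Grant RBAC + NTFS to the group"),
}


def triage(message):
    """Extract the error number from a net use message and look up cause and fix."""
    match = re.search(r"System error (\d+)", message)
    if not match:
        return None
    return NET_USE_ERRORS.get(int(match.group(1)))


cause, fix = triage("System error 53 has occurred.")
print(cause, "->", fix)  # DNS or firewall on 445 -> Fix private DNS; allow 445
```

Unknown codes return `None` rather than a guess, which keeps the helper honest about what the table actually covers.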

The expanded form for the entries that bite hardest:

1. Every mount prompts for credentials. Root cause: The account’s identity source is None – it is still on storage-key auth, so there is no Kerberos object to authenticate the domain user. Confirm: az storage account show -g <rg> -n <acct> --query "azureFilesIdentityBasedAuthentication.directoryServiceOptions" -o tsv returns None. Fix: Join-AzStorageAccount for AD DS, or az storage account update --enable-files-aadkerb true for Entra Kerberos. Re-check the field returns AD or AADKERB.

2. Mount falls back to NTLM or fails outright. Root cause: DNS resolves the account to a public IP, so Kerberos is rejected and SMB can’t reach the private path. Confirm: Resolve-DnsName <acct>.file.core.windows.net returns a public address instead of the PE private IP. Fix: Create the private endpoint on the file sub-resource and wire privatelink.file.core.windows.net to its private IP; on-prem, forward that zone to a Private Resolver inbound endpoint.

7. FSLogix “profile failed to attach” under Entra Kerberos. Root cause: The FSLogix container directory ACLs reference cloud-only objects with no matching AD DS SID, so NTFS evaluation fails even though Kerberos auth succeeded. Confirm: Inspect the ACL on the FSLogix root; the principals are cloud-only, not the synced AD group. Fix: Re-ACL the FSLogix root to the synced security group (an on-prem AD group synced via Entra Connect), and ensure the user accounts are synced.

8. Throttling (429) during the morning login storm. Root cause: The share is IOPS-bound on Premium v1, where IOPS scale with provisioned capacity – so you’re throttled despite plenty of free space. Confirm: StorageFileLogs | where StatusCode == 429 lights up during the storm; latency is fine between storms. Fix: Move to Premium SSD v2 and provision IOPS independently of capacity, or shard across shares; do not over-provision v1 capacity to rent IOPS.

12. The “recovered” file is gone after a restore. Root cause: You only had in-account snapshots/soft delete, which an attacker (or a malicious admin) with account rights purged along with the data. Confirm: There is no Backup vault policy; the only protection was shareDeleteRetentionPolicy and manual snapshots. Fix: Add a Backup vault with an immutable, off-account copy so ransomware can’t reach your last good copy.

Best practices

Security notes

The security controls and what each buys you:

| Control | Setting / mechanism | Secures against | Also prevents |
|---|---|---|---|
| Identity-based SMB | directoryServiceOptions = AD/AADKERB | Key-based, un-auditable access | Credential prompts / NTLM fallback |
| Least-privilege RBAC | SMB Share roles scoped to share | Over-broad mount rights | Accidental account-wide access |
| Private endpoint + no public | --public-network-access Disabled | Internet-exposed SMB | 445 blocked / public egress |
| AES-256 Kerberos | EncryptionType on the AD object | RC4 downgrade attacks | Weak-cipher handshakes |
| CMK at rest | Customer-managed key | Platform-key compliance gaps | Regulatory findings |
| Immutable vaulted backup | Backup vault (locked) | Ransomware / malicious purge | Loss of last good copy |

Cost & sizing

The bill drivers and how they interact with the fixes:

A rough monthly picture, INR-leaning:

| Cost driver | What you pay for | Rough INR / month | What it fixes | Watch-out |
|---|---|---|---|---|
| Files Premium v1 (provisioned) | Provisioned GiB (IOPS scale with size) | Capacity-driven; can balloon | Predictable low latency | Buying TiB to rent IOPS = waste |
| Files Premium SSD v2 | GiB + IOPS + throughput independently | Lower for bursty/small footprints | IOPS without capacity waste | Verify region availability |
| Files Standard | Used GiB + transactions | Cheapest GiB | Bulk/infrequent shares | Transaction costs on chatty apps |
| ANF Premium pool (1–4 TiB) | Pool capacity × service level | Premium floor is significant | Sub-ms SAP/HPC/EDA | 1 TiB floor billed even if unused |
| Vaulted backup | Backup storage + ops | Modest, justified | Ransomware recovery | Locked immutability is irreversible |
| Cross-region replication (ANF) | Destination volume + transfer | ~2x the source footprint | Region-loss DR | Pay for the DR copy continuously |

Sizing rule of thumb: start Files Premium SSD v2 at the used footprint plus headroom, provision IOPS to your measured peak (FSLogix login storms are the spiky case), and only reach for ANF when a real microsecond-class SLO justifies the pool floor.

Interview & exam questions

1. When do you choose Azure NetApp Files over Azure Files? When the workload SLO is latency-critical (sub-millisecond), or it is SAP, HPC scratch, or large EDA/render – ANF gives consistent microsecond-class latency via bare-metal NetApp. Otherwise Azure Files Premium is simpler (one resource, no delegated subnet, no 1 TiB pool floor). Don’t pick ANF for general file shares just because it’s “faster on paper.”

2. What are the three SMB identity sources for Azure Files, and the key constraint of Entra Kerberos? On-prem AD DS, Microsoft Entra Kerberos, and Entra Domain Services. The constraint: Entra Kerberos authenticates the user, but NTFS permissions still resolve against AD DS SIDs, so you need those identities synced from AD DS to set fine-grained ACLs – critical for FSLogix/AVD.

3. Explain the two-gate access model. Gate one is share-level RBAC (Azure roles like Storage File Data SMB Share Contributor) deciding who can mount the share. Gate two is NTFS ACLs (icacls) deciding what you can do once mounted. Both must grant access; a user can pass RBAC and still be denied a write by NTFS, or vice-versa.

4. Why do mounts fall back to NTLM and fail, and how do you confirm Kerberos won? Usually DNS resolves the account to a public IP, so Kerberos is rejected. Confirm Kerberos with klist | Select-String "cifs/<account>.file.core.windows.net" – the cifs service ticket must be present – and Resolve-DnsName must return the private endpoint IP.

5. What does directoryServiceOptions tell you? It’s the field on the storage account (azureFilesIdentityBasedAuthentication.directoryServiceOptions) that reports the identity source: AD, AADKERB, AADDS, or None. None means you never achieved identity-based access and clients are using the storage key.

6. Why does the private DNS zone name differ from the account FQDN, and why does it matter? The account FQDN <acct>.file.core.windows.net CNAMEs to <acct>.privatelink.file.core.windows.net, which your private zone privatelink.file.core.windows.net resolves to the PE’s private IP. If on-prem can’t resolve privatelink.file, it gets the public IP – mounts fail or egress over the internet. You forward that zone via a Private Resolver.

7. What’s the difference between a snapshot, soft delete, and a vaulted backup? Snapshots are read-only point-in-time copies in the account (Previous Versions; oops-recovery). Soft delete keeps deleted shares/snapshots recoverable for a window (in-account). Vaulted backup stores an immutable copy off the account – the only one an attacker who owns the account can’t purge, hence your ransomware defense.

8. What is cloud tiering in Azure File Sync and its biggest operational trap? Cloud tiering keeps the hot working set on the local server and turns cold files into reparse-point pointers whose data lives only in Azure. The trap: an antivirus or backup full scan reads every file and recalls the entire tiered dataset, filling the volume and spiking egress – use recall-on-read exclusions and back up the cloud share, not the servers.

9. Why is an ANF volume’s throughput sometimes low even on the Ultra service level? On an auto-QoS pool, throughput is quota_TiB × service_level_MiBps, so a small volume (e.g. 500 GiB) on Ultra gets only about half of the per-TiB rate. The fix is to grow the volume quota, not just raise the service level; a manual-QoS pool instead lets you assign throughput to each volume independently of its quota.

10. Premium v1 vs Premium SSD v2 for Azure Files – what changed and why care? Premium v1 bills on provisioned capacity, with IOPS/throughput scaling with size – so you over-provision TiB to buy IOPS. Premium SSD v2 decouples capacity, IOPS, and throughput, letting you provision each independently. For spiky workloads like FSLogix it’s usually cheaper and faster.

11. ANF cross-region replication is configured and mirrorState is mirrored. Are you failover-ready? Not until you rehearse the break. Replication is one-directional; the destination is read-only until you break the peering, at which point it becomes writable. Rehearse the break-and-resync so you don’t discover the sequence during an incident.

12. A migrated file server mounts fine in Azure but on-prem clients prompt for credentials. Cause? On-prem DNS resolves the account to the public IP (not the private endpoint), so Kerberos is rejected. Fix by forwarding privatelink.file.core.windows.net from on-prem DNS to an Azure DNS Private Resolver inbound endpoint so on-prem resolves the private IP.

These map to AZ-104 (Administrator) – configure Azure Files and Azure File Sync, identity-based access, and storage networking – AZ-700 (Network Engineer) – private endpoints and DNS – and AZ-500 (Security) – identity, encryption, and least privilege. A compact cert-mapping for revision:

| Question theme | Primary cert | Exam objective area |
|---|---|---|
| Files vs ANF, tiers, sizing | AZ-104 | Configure storage; performance |
| Identity sources, AD/Kerberos | AZ-104 / AZ-500 | Identity-based access; secure storage |
| Two-gate RBAC + NTFS | AZ-500 | Authorize access to data |
| Private endpoint + DNS | AZ-700 | Private connectivity & name resolution |
| Snapshots / backup / CRR | AZ-104 / AZ-305 | Data protection & BCDR |
| File Sync cloud tiering | AZ-104 | Manage Azure File Sync |

Quick check

  1. Your storage account’s directoryServiceOptions returns None. What does that mean for how clients are authenticating, and what’s the one-line fix path?
  2. A domain-joined client mounts the share but klist shows no cifs/... ticket, and Resolve-DnsName returns a public IP. What’s the root cause and the fix?
  3. Under Entra Kerberos, a user authenticates fine but FSLogix profiles “fail to attach.” What’s the most likely cause given that NTFS resolves against AD SIDs?
  4. True or false: an ANF volume on the Ultra service level always delivers maximum throughput regardless of its quota.
  5. You restored a deleted file from a snapshot, but after a ransomware event the snapshots were gone too. Which data-protection layer was missing?

Answers

  1. None means clients are using the storage account key, not identity-based auth – you never achieved Kerberos. Fix: Join-AzStorageAccount (AD DS) or az storage account update --enable-files-aadkerb true (Entra Kerberos), then confirm the field returns AD/AADKERB.
  2. DNS resolves the account to a public IP, so Kerberos is rejected and the mount falls back. Fix: create a private endpoint on the file sub-resource and wire privatelink.file.core.windows.net to its private IP (and forward that zone from on-prem via a Private Resolver).
  3. The FSLogix container directory ACLs reference cloud-only identities with no matching AD DS SID. Re-ACL the FSLogix root to the synced AD security group, and ensure the user accounts are synced via Entra Connect.
  4. False. On an auto-QoS pool, throughput is quota_TiB × service_level_MiBps; a small volume on Ultra is throughput-starved. Grow the volume quota, not just the service level.
  5. Vaulted (immutable) backup. Snapshots and soft delete live in the same account the attacker reached and were purged; an off-account immutable Backup vault is the copy ransomware can’t touch.

Glossary

Next steps

You can now choose a managed SMB platform, wire identity-based access without leaking a key, force the data plane private, and protect the data three ways. Build outward:


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
