Azure Cost Management

Azure Advisor for Cost: Acting on Rightsizing and Idle-Resource Recommendations

Your Azure bill is creeping up and you suspect waste, but you do not know where it is hiding. Somewhere in that subscription is a Standard_D8s_v5 VM sized for a launch that never came, a managed disk left behind when its VM was deleted, and an App Service plan with no apps still billing by the hour. Finding them by hand means clicking through hundreds of resources comparing CPU graphs — so the waste persists. Azure Advisor is the free, built-in service that does this hunt for you: it watches your resources, learns their real utilization, and produces a ranked list of concrete cost cuts on its Cost tab — resize this VM, shut down that one, delete this orphaned disk — each with an estimated monthly saving in your own currency.

The catch is that a list of recommendations is not money saved. Advisor tells you a VM is underutilized; it does not press the button. And some advice is wrong for your situation — a VM idle for seven days might be a disaster-recovery standby you must keep, or a batch box that only wakes at month-end. Acting well means understanding what Advisor actually measured, deciding resize versus shut down versus dismiss, and doing it safely. This article teaches all of that, end to end, in the portal and with az CLI and Bicep, so you turn the list into a lower invoice without breaking anything. It is the cheapest cost tool in Azure, and most teams never open it.

What problem this solves

Cloud waste is rarely one dramatic mistake. It is the steady accumulation of resources once right-sized and no longer: the VM provisioned three sizes too big “to be safe”, the test database left running over a weekend, the disk that outlived its VM, the plan emptied during a migration but never deleted. Each is individually small and invisible; together they can be 20-35% of an untended bill, and finding them by hand does not scale.

What breaks without Advisor is not an outage — it is money. Finance asks why the bill grew and the team has no answer, because the spend is spread across dozens of slightly-oversized resources that each look reasonable alone. The instinct is to guess — downsize something and hope — which either does nothing or causes an incident (downsized the busy one). Advisor replaces the guess with evidence: it has watched each resource for days and can tell you its 95th-percentile CPU never crossed a few percent.

It hits everyone running Azure beyond a trivial footprint, hardest on teams without a FinOps practice — dev/test sprawl, lift-and-shift migrations where on-prem VM sizes were copied verbatim, and “we’ll right-size it later” that became “we never did”. The fix is not a third-party tool — it is to open the Cost tab you already have and act on it on a schedule.

Here is what the Cost tab surfaces, what each recommendation means, and the first thing to check before acting:

| Recommendation class | What Advisor is saying | Typical resource | Verify before acting |
| --- | --- | --- | --- |
| Rightsize underutilized VM/VMSS | “Bigger than its real load” | Standard_D*/E* VM or scale set | Low usage by design (DR, batch, seasonal)? |
| Shut down underutilized VM/VMSS | “Barely used at all for 7 days” | A near-idle VM | Anything depend on it being up but quiet? |
| Delete unattached disk | “Attached to no VM” | microsoft.compute/disks | Data still needed? Snapshot first |
| Delete empty App Service plan | “Runs zero apps” | microsoft.web/serverfarms | A slot/app about to deploy onto it? |
| Buy a reservation / savings plan | “Steady usage qualifies for a discount” | Subscription-scoped | Workload stable for 1-3 years? |
| Idle Cosmos DB container | “No activity for 30 days” | microsoft.documentdb/databaseaccounts | Rarely-used but required? |

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You need an Azure subscription with a resource to experiment on (the lab spins up a tiny VM and a disk you will delete), and the Azure CLI or Cloud Shell (the >_ icon in the portal — zero local setup). Be comfortable with a resource group (a folder holding related resources) and a VM size/SKU (like Standard_B2s — a name encoding vCPUs and memory). No prior cost-tooling knowledge is assumed.

On permissions: reading recommendations needs only Reader; acting needs write on the resource (Contributor on the VM to resize, on the disk to delete). Changing Advisor’s configuration needs subscription rights — if you lack them, the portal greys the option out.

Where this sits: Advisor is the detection layer. Azure Cost Management for Beginners: Budgets, Alerts and Cost Analysis tells you how much you spend; Advisor tells you what to cut. An Azure Tagging Strategy for cost allocation attributes each recommendation to a team, and Azure FinOps: Cost Management at Scale wraps Advisor into a practice across subscriptions. Advisor is broader than cost (it has four other categories), but this article covers strictly the Cost tab.

A quick map of where Advisor fits among the tools you already have:

| Tool | Question it answers | Relationship to Advisor |
| --- | --- | --- |
| Cost Analysis | “How much did I spend, sliced how?” | Shows the spend Advisor helps cut |
| Budgets & alerts | “Tell me when I cross a line” | Independent; pairs well |
| Azure Advisor (Cost) | “What specifically should I cut?” | This article |
| Azure Policy | “Stop bad resources being created” | Enforces; Advisor recommends |
| Reservations / Savings Plans | “Commit for a discount” | Advisor recommends these |

Core concepts

A few mental models make every recommendation obvious to read.

Advisor is a recommendation engine, not an enforcer. It reads metrics and configuration and emits advice with an estimated impact, but never changes a resource — you act, or wire automation to. Recommendations span five categories — Cost, Security, Reliability, Operational excellence, Performance — and this is only the Cost one.

Underutilization is measured in percentiles, not a vibe. When Advisor calls a VM underutilized, it has sampled the VM’s metrics (every 30 seconds, aggregated to 30-minute buckets) over a lookback window (7 days by default) and found them under fixed thresholds: a statistical read over days, not one graph. Knowing those thresholds (below) tells you whether to trust each call.

Resize, shut down, and delete are different actions with different risk. A resize moves the VM to a cheaper SKU — the workload keeps running, but the VM reboots. A shutdown here means deallocate (stop and release the compute) — for a barely-used VM, offline until restarted. A delete (orphaned disk, empty plan) is irreversible, which is why you snapshot first. Picking the right action is the whole skill.

Stopping is not deallocating — and only one saves money. “Shut down” from inside the guest OS leaves the VM Stopped but still allocated, and compute keeps billing. Only Stopped (deallocated) — via az vm deallocate or the portal Stop button — stops the compute bill. Beginners shut down from inside Windows, the bill does not move, and they conclude Advisor lied — it did not, they stopped the wrong way.
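To make the distinction concrete, here is a small shell sketch that maps the power-state string from the instance view to its billing consequence. The function name `compute_billing_for` is hypothetical, and the mapping simply restates the rule above; transitional states are reported as unknown.

```shell
# Hypothetical helper: map a VM power state to its compute-billing consequence.
compute_billing_for() {
  case "$1" in
    "VM deallocated") echo "not billing compute" ;;
    "VM running"|"VM stopped") echo "billing compute" ;;
    *) echo "unknown" ;;
  esac
}

# Usage against a real VM (needs az login; RG and VM are assumed to be set):
#   state=$(az vm get-instance-view -g "$RG" -n "$VM" \
#     --query "instanceView.statuses[?starts_with(code,'PowerState')].displayStatus" -o tsv)
#   compute_billing_for "$state"
#   az vm deallocate -g "$RG" -n "$VM"   # the stop that ends compute billing
```

Disks keep billing in every state, so "not billing compute" is not the same as "free".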

Savings figures are honest estimates with two asterisks. They use public retail (pay-as-you-go) rates and ignore any reservations or savings plans you own. With a reservation covering that VM the real saving may be smaller — or a cross-series resize of a reserved VM can even raise effective cost. Treat the number as an upper-bound signal, not a figure to quote to finance verbatim.

How Advisor decides a VM is underutilized

This section turns Advisor from a black box into a tool you trust. It produces two VM cost recommendations — shut down and resize — each with a rule you can use to sanity-check the advice in seconds.

The shutdown rule — “barely used at all”

A shutdown recommendation appears when a VM was almost entirely idle across the window. Advisor looks at CPU and Outbound Network only — memory is ignored, because Microsoft found those two sufficient to identify a truly idle box. The triggers (all must hold over the default 7-day window):

| Signal | Threshold for “shut down” | In plain English |
| --- | --- | --- |
| P95 of max CPU (across all cores) | < 3% | Peak CPU is almost nothing |
| P100 of avg CPU, last 3 days (all cores) | ≤ 2% | Even the busiest moment averaged ~zero |
| Outbound Network utilization | < 2% over 7 days | Barely talking to anything |

A VM that clears all three is doing essentially nothing — a forgotten test box, a decommissioned service nobody turned off. The right action is deallocate (if you might need it) or delete (if you are sure); the stated saving is the full compute cost.
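As a sanity check you can run yourself, the three thresholds can be sketched as one shell function. It assumes you have already computed the percentiles (for example from exported metrics); the function name and argument order are hypothetical, and this is a restatement of the table, not Advisor's actual implementation.

```shell
# Hypothetical helper: apply the three shutdown thresholds to precomputed values.
# Args: P95 of max CPU (%), P100 of avg CPU over the last 3 days (%),
#       outbound network utilization over the window (%).
is_shutdown_candidate() {
  awk -v p95="$1" -v p100="$2" -v net="$3" 'BEGIN {
    # All three conditions must hold, mirroring the table above.
    if (p95 < 3 && p100 <= 2 && net < 2) print "shutdown-candidate"
    else print "keep"
  }'
}

is_shutdown_candidate 2.1 1.5 0.8   # idle on every signal
is_shutdown_candidate 2.1 1.5 4     # CPU idle, but still talking on the network
```

The second call shows why a quiet CPU alone is not enough: a VM that still pushes network traffic fails the rule.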

The resize rule — “right-sized smaller”

A resize is subtler: the VM is doing work, but a cheaper SKU would carry it comfortably. Advisor uses CPU, Memory, and Outbound Network — memory matters now, to fit the load onto less hardware without starving it — then finds a target SKU that keeps headroom:

| Headroom target on the recommended SKU | User-facing workload | Non-user-facing workload |
| --- | --- | --- |
| P95 CPU and Outbound Network | ≤ 40% | ≤ 80% |
| P99 Memory | ≤ 60% | ≤ 80% |

User-facing workloads (classified by CPU pattern) get more headroom so that a spike does not peg the smaller box; batch workloads can run hotter. The candidate SKU must also match on Accelerated Networking and Premium Storage, be available in-region, and be cheaper. Advisor will cross family lines to save: same family (D4s_v5 → D2s_v5), newer version (D3v2 → D2v3), different family (D3v2 → E3v2), or a burstable B-series.
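The headroom table can be expressed as a quick check. The sketch below assumes you have projected what the utilization would be on the candidate SKU (for instance, doubling CPU percentages when halving the vCPU count); `fits_headroom` and its arguments are hypothetical names, and the other conditions (Accelerated Networking, Premium Storage, regional availability, price) are not modeled.

```shell
# Hypothetical helper: does projected utilization on the candidate SKU stay
# within the headroom targets from the table above?
# Args: workload kind (user-facing | non-user-facing),
#       projected P95 CPU %, projected P95 outbound network %, projected P99 memory %.
fits_headroom() {
  awk -v kind="$1" -v cpu="$2" -v net="$3" -v mem="$4" 'BEGIN {
    if (kind == "user-facing") { cpu_net_max = 40; mem_max = 60 }
    else                       { cpu_net_max = 80; mem_max = 80 }
    if (cpu <= cpu_net_max && net <= cpu_net_max && mem <= mem_max) print "fits"
    else print "too-small"
  }'
}

fits_headroom user-facing 35 10 55       # within 40/40/60
fits_headroom user-facing 55 10 55       # CPU would exceed 40%
fits_headroom non-user-facing 55 10 70   # batch may run hotter
```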

When the resize target is a burstable SKU

The B-series is Advisor’s favourite target for the “low average, occasional spike” VM. B-series VMs run at a reduced baseline CPU and bank credits while idle, spending them to burst, which is much cheaper than a SKU sized for the peak. Advisor recommends one only when the average CPU is under the burstable baseline, the P95 is under twice that baseline, the VM does not use Accelerated Networking (B-series lacks it), and the banked credits would cover your spikes. That credit check is why a burstable recommendation is usually trustworthy. See Azure VM Series & Families explained (D, E, F, L, N, M) for the family map.
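The first three burstable conditions are simple enough to pre-check by hand. This sketch is an approximation under stated assumptions: the baseline percentage must come from the B-series documentation for the target size (the value 20 below is purely illustrative), the function name is hypothetical, and the credit-sufficiency check that Advisor performs is not modeled here.

```shell
# Hypothetical pre-check for a burstable target.
# Args: baseline CPU % of the target size, average CPU %, P95 CPU %,
#       whether Accelerated Networking is enabled on the VM (true | false).
burstable_precheck() {
  awk -v base="$1" -v avg="$2" -v p95="$3" -v accel="$4" 'BEGIN {
    if (avg < base && p95 < 2 * base && accel != "true") print "consider-burstable"
    else print "not-burstable"
  }'
}

burstable_precheck 20 8 30 false   # low average, spikes under 2x baseline
burstable_precheck 20 8 45 false   # spikes exceed twice the baseline
```

Passing this pre-check is necessary, not sufficient: the banked credits still have to cover the real spike pattern.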

The lookback window — and why 7 days can mislead

By default Advisor analyzes the last 7 days — fine for steady workloads, misleading for periodic ones: a payroll VM idle for 27 days and slammed on the 28th looks “shut me down” all month, and acting would be a disaster. Widen the lookback so Advisor sees a full cycle. Available windows are 7, 14, 21, 30, 60, or 90 days; recommendations refresh within about 48 hours. For any monthly, seasonal, or batch workload, push it to 30 or 90 days before acting on a shutdown — though a longer window can also hide a recently idled VM, so it is a trade-off.
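One way to choose a window is to take the smallest supported value that covers a full workload cycle. The helper below is a sketch of that heuristic; `pick_lookback_days` is a hypothetical name, and the cycle length in whole days is something you supply from your own knowledge of the workload.

```shell
# Hypothetical helper: smallest supported lookback (7/14/21/30/60/90 days)
# that is at least as long as the workload's cycle, in whole days.
pick_lookback_days() {
  for w in 7 14 21 30 60 90; do
    if [ "$1" -le "$w" ]; then
      echo "$w"
      return 0
    fi
  done
  echo 90   # cycle longer than any window: 90 is the most Advisor offers
}

pick_lookback_days 28    # the payroll VM: prints 30
pick_lookback_days 120   # a quarterly-plus cycle: prints 90
```

For cycles longer than 90 days no window sees the whole pattern, so those VMs are candidates for dismissal with a note rather than a wider lookback.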

Idle and orphaned resources Advisor finds

Beyond VMs, the Cost tab surfaces pure waste — resources that cost money while doing nothing, the easiest wins because there is rarely a reason to keep them.

Unattached managed disks are the classic. When you delete a VM, its disks are not always deleted with it; the disk lingers, attached to no VM, billing for its full provisioned size — an orphaned 1 TB Premium SSD is real money for zero value. Advisor flags these under “Review disks that aren’t attached to a VM”, but deletion is irreversible, so the pattern is snapshot first, confirm, then delete. Azure VM Disk Types explained (Standard, Premium, Ultra) covers what each costs.

Empty App Service plans are next. A plan is the VMs you rent to run web apps, billed whether or not any app runs; during a migration teams delete the apps but forget the plan. Advisor flags “Unused/Empty App Service plan” — delete it after confirming no app or slot is about to land on it (pricing in Azure App Service Plans & Tiers explained).

Reservations and savings plans are a different shape: for steady usage Advisor suggests a 1- or 3-year commitment that discounts pay-as-you-go. It does not shrink anything, it asks you to pre-pay — so commit only for workloads stable for the term, and right-size first, or you lock in the waste.

Here is how the main idle/commitment recommendations differ and what to check before acting:

| Recommendation | What it means | The action | Reversible? | Verify first |
| --- | --- | --- | --- | --- |
| Unattached disk | Bound to no VM | Snapshot, then delete | No | Data truly unneeded? |
| Empty App Service plan | Zero apps | Delete the plan | No | No app/slot about to deploy? |
| Idle Cosmos DB container | No activity 30 days | Lower throughput or delete | Throughput yes; delete no | Rarely-used but required? |
| Buy a reservation | Steady usage qualifies | Commit 1 or 3 years | Limited | Workload stable for the term? |
| Buy a savings plan | Steady compute spend | Commit hourly $ for 1-3 yrs | Limited | Right-sized first? |

Acting on a recommendation: resize vs shut down vs dismiss

Reading a recommendation is half the job; choosing the response is the other half. Every recommendation offers the same responses, and picking the right one separates a saved rupee from an incident.

| Response | What it does | Use it when | Caution |
| --- | --- | --- | --- |
| Resize | Move VM to a cheaper SKU | Works but is oversized | Reboots; verify allowed sizes |
| Shut down (deallocate) | Stop + release compute | Near-idle | Offline until restarted |
| Delete | Remove the resource | Orphaned disk / empty plan | Irreversible; snapshot first |
| Postpone | Hide for a period | “Revisit in 30 days” | Comes back; not a fix |
| Dismiss | Hide indefinitely | Low utilization by design | You stop seeing a real cost too |

Dismiss is a feature, not a cop-out — for the genuine reasons Microsoft lists: the VM is pre-provisioned for upcoming traffic, relies on metrics Advisor doesn’t see (GPU, local IO), is kept on a SKU for testing, must stay homogeneous with the fleet, or is a disaster-recovery standby that must stay idle-but-ready. There, dismissing is correct. What you must not do is dismiss because resizing is inconvenient — that is how waste becomes permanent.

The whole decision flow: low usage by design? Yes → dismiss with a note. No → doing real work? Essentially none → deallocate (delete if dead); some, on too-big a box → resize in a maintenance window.
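The same flow, written as a tiny shell function so it can sit in a review script or runbook. The two inputs are human judgments you supply, not values Advisor returns, and the name `advisor_response` is hypothetical.

```shell
# Hypothetical helper encoding the decision flow above.
# Args: low usage by design? (yes | no), real work observed (none | some).
advisor_response() {
  if [ "$1" = "yes" ]; then
    echo "dismiss with a note"
  elif [ "$2" = "none" ]; then
    echo "deallocate (delete if dead)"
  else
    echo "resize in a maintenance window"
  fi
}

advisor_response yes none   # DR standby: idle on purpose
advisor_response no none    # forgotten test box
advisor_response no some    # working VM on too big a SKU
```

The order of the checks matters: "by design" is asked first so that an intentionally idle standby is never deallocated just because it looks dead.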

Architecture at a glance

Hold this mental model and every step in the lab makes sense. Three layers. At the bottom, your live resources — VMs, scale sets, disks, plans — emit platform metrics (CPU, memory, network) plus configuration, like whether a disk is attached to anything, sampled every 30 seconds. In the middle sits Advisor’s analysis engine: on a recurring cadence it reads each resource’s metrics across the lookback window, runs the percentile rules above, cross-references retail pricing, and writes recommendation objects — one per qualifying resource, carrying the resource ID, action, and estimated saving. This engine is read-only; it observes and advises, never touching your resources.

At the top is where you live — the Cost tab, az advisor recommendation list, or the REST/ARM API. You read the recommendations and take the action: a resize, deallocate, disk delete, reservation purchase. The arrow from Advisor to your resources is not automatic — Advisor proposes, you dispose. Configuration changes (lookback, CPU filter) flow back into the engine and change next cycle’s output. The loop is: resources emit → Advisor analyzes → you act → resources change → next cycle re-measures. Your job is to close that loop on a schedule instead of letting recommendations pile up unread.

Real-world scenario

Northwind Retail runs a mid-sized e-commerce platform on Azure in Central India. Eighteen months of “ship fast, clean up later” left a production subscription with 140-odd resources and a bill drifting from ₹6.8 lakh to ₹9.2 lakh with no obvious cause — no new product, no traffic surge. Finance asked the platform lead, Anjali, where is the extra ₹2.4 lakh going? She did not know either.

She opened the Advisor Cost tab for the first time: 31 cost recommendations, combined estimated saving about ₹1.9 lakh/month. The big three: nine VMs flagged for resize (mostly Standard_D8s_v5 nodes copied from an over-provisioned on-prem spec, at 6-9% P95 CPU), four for shutdown (old staging boxes untouched for weeks), and eleven unattached disks from a Kubernetes migration, including two 2 TB Premium SSDs at ~₹18,000/month each.

She did not act blindly. For the four “shutdown” VMs she first widened the lookback to 30 days, confirming they were dead rather than batch boxes, then deallocated three and deleted one. Of the nine resize candidates, seven were genuinely oversized and got resized (D8s_v5 → D4s_v5, two to B-series), but two she dismissed: the Black Friday buffer, pre-scaled for a sale six weeks out, the textbook “provisioned for upcoming traffic” case. For the eleven orphaned disks she snapshotted the two large ones, confirmed the rest were dead, and deleted all eleven.

Over the next cycle the bill dropped from ₹9.2 lakh to about ₹7.4 lakh — roughly ₹1.8 lakh/month recovered, close to Advisor’s estimate but not identical, because three resized VMs were covered by an existing reservation, so their real saving was smaller than the retail-rate figure (the caveat to expect). Anjali set the rightsizing CPU filter to 10% and put a monthly “review Advisor” reminder on her calendar. The lesson: the waste had been visible the whole time in a free tool nobody had opened, and acting carefully — not blindly — turned a list into ₹21 lakh a year.

Advantages and disadvantages

| Advantages | Disadvantages |
| --- | --- |
| Free, built in: ships with every subscription | Read-only: you must act (or build automation) |
| Evidence-based: real multi-day percentiles | Default 7-day lookback misleads on periodic workloads |
| Estimates a concrete saving per recommendation | Savings are retail-rate, ignore reservations; can overstate |
| Covers the big wins: oversized VMs, orphaned disks, empty plans | Doesn’t see every resource type or form of waste |
| Tunable per subscription (CPU filter, lookback) | Acting blindly can cause outages (DR, spike buffers) |
| Available via portal, CLI, PowerShell, REST/ARM, workbook | Refresh lags config changes by 24-48h |

These matter differently by maturity. For a small team starting on cost discipline the free, evidence-based, concrete-saving advantages are transformative — “no idea” to a ranked action list in five minutes. As you scale, the disadvantages become things you engineer around: widen lookbacks for batch fleets, reconcile savings against reservations, and treat the list as a well-informed first draft, not gospel. Advisor is excellent at finding waste and only as good as your judgment at acting on it.

Hands-on lab

This is the heart of the article. You will create a wasteful pair of resources, then action a rightsizing and an idle-disk cleanup safely — in the portal and az CLI — plus a Bicep snapshot pattern, with validation at each step and a full teardown. It is free-tier-friendly: a tiny B1s VM and a small Standard disk you delete.

Caveat: Advisor needs days of telemetry before it generates a real rightsizing recommendation, so the lab won’t produce one in ten minutes. Instead you (a) read whatever recommendations exist, (b) perform the exact actions one asks for with the right safety checks, and (c) configure the rules. That skill is the point; the recommendation is just the trigger.

Step 0 — Prerequisites and variables

Open Cloud Shell (Bash) or a local terminal with the Azure CLI, confirm you are logged in, and set variables.

az account show --query "{sub:name, id:id}" -o table   # confirm the right subscription
az upgrade --yes 2>/dev/null                            # ensure a recent CLI (optional)

RG=rg-advisor-lab
LOC=centralindia
VM=vm-advisor-demo

Expected output: your subscription name and ID. If it errors with “Please run az login”, run az login first.

Step 1 — Create a resource group and a small VM

az group create --name $RG --location $LOC -o table

az vm create \
  --resource-group $RG \
  --name $VM \
  --image Ubuntu2204 \
  --size Standard_B1s \
  --admin-username azureuser \
  --generate-ssh-keys \
  --public-ip-sku Standard \
  -o table

Expected output: az group create returns "provisioningState": "Succeeded". az vm create takes 1-2 minutes and prints JSON with a publicIpAddress and "powerState": "VM running". You now have a running VM — and, for later, an OS managed disk attached to it.

Validate the VM and capture its current size:

az vm show -g $RG -n $VM --query "{name:name, size:hardwareProfile.vmSize, state:provisioningState}" -o table
az vm get-instance-view -g $RG -n $VM --query "instanceView.statuses[?starts_with(code,'PowerState')].displayStatus" -o tsv

Expect size Standard_B1s and power state VM running.

Step 2 — Read existing Advisor cost recommendations (portal)

  1. In the portal search bar, type Advisor and open it; in the left menu select Cost.
  2. Each row shows the recommendation (e.g. Right-size or shutdown underutilized virtual machines), the impact (High/Medium/Low), the resource(s), and a potential savings figure.
  3. Click one to open it — note the affected resources, the recommended action per resource (resize to a target SKU, or shut down), and the Postpone/Dismiss buttons.

Expected result: you can identify, for one resource, the exact action proposed and the money attached. A brand-new or tiny subscription may show no cost recommendations yet — normal, not an error.

Step 3 — Read the same recommendations via az CLI

The CLI is how you script reviews. Refresh the results and list the Cost category in one call:

# The 'advisor' commands are part of the core CLI. --refresh asks Advisor to
# generate new recommendations before listing, so the call can take a while.
# List only Cost-category recommendations, projected to the useful fields.
az advisor recommendation list --category Cost --refresh \
  --query "[].{resource:impactedValue, problem:shortDescription.problem, impact:impact}" -o table

Expected output: a table of cost recommendations (empty if you have none). impactedValue is the resource, shortDescription.problem the summary (e.g. “Right-size or shutdown underutilized virtual machines”), impact is High/Medium/Low.

To see the estimated savings and target, drill into one recommendation’s raw object:

az advisor recommendation list --category Cost \
  --query "[0].extendedProperties" -o json

Expected output: a JSON bag of properties — for a rightsizing recommendation, fields like savingsAmount, savingsCurrency, annualSavingsAmount, targetSku, and regionId (the same numbers the portal shows). If you have zero cost recommendations it returns null — skip to Step 4.
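For a monthly review it helps to total the estimates. The sketch below sums a tab-separated list of resource and saving pairs; `sum_savings` is a hypothetical helper, the `savingsAmount` field is the one named above, and the total inherits the retail-rate caveat (and assumes all rows share one currency).

```shell
# Hypothetical helper: sum the second tab-separated column (monthly saving).
# Non-numeric values (for example a missing saving) count as 0 in awk.
sum_savings() {
  awk -F '\t' '{ total += $2 } END { printf "%.2f\n", total }'
}

# Usage against real data (needs az login):
#   az advisor recommendation list --category Cost \
#     --query "[].[impactedValue, extendedProperties.savingsAmount]" -o tsv | sum_savings

# Self-contained example with made-up figures:
printf 'vm-a\t120.50\nvm-b\t79.50\n' | sum_savings
```

Treat the result as an upper bound to investigate, not a number to report as realized savings.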

Step 4 — Action a rightsizing safely (the pre-flight checks)

Suppose a recommendation says resize vm-advisor-demo to a smaller SKU. Before any resize, run the checklist — a resize reboots the VM, and not every SKU is on the current cluster.

4a. Check sizes the VM can move to in place:

az vm list-vm-resize-options -g $RG -n $VM --query "[].name" -o table

Expected output: SKUs the VM can resize to in place. If your target is not listed, you must deallocate first — a bigger maintenance action.

4b. Confirm the target SKU isn’t region-restricted:

az vm list-skus --location $LOC --size Standard_B1 --query "[].{name:name, restrictions:restrictions[].reasonCode}" -o table

Expected output: the region’s B-series sizes with an empty [] restrictions column for available ones. A NotAvailableForSubscription reason means pick another SKU.
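Check 4a can be scripted so that a pipeline decides the resize path before touching the VM. This is a minimal sketch: `resize_path_for` is a hypothetical helper that reads size names on standard input (one per line) and looks for an exact match of the target.

```shell
# Hypothetical helper: given allowed sizes on stdin, report how to reach the target.
resize_path_for() {
  # -F fixed string, -x whole-line match, -q quiet: exact SKU name only.
  if grep -Fqx -- "$1"; then
    echo "resize in place"
  else
    echo "deallocate first or pick another size"
  fi
}

# Usage against a real VM (needs az login; RG and VM are assumed to be set):
#   az vm list-vm-resize-options -g "$RG" -n "$VM" --query "[].name" -o tsv \
#     | resize_path_for Standard_B1ms

# Self-contained example with a made-up list:
printf 'Standard_B1s\nStandard_B1ms\n' | resize_path_for Standard_B1ms
```

The whole-line match matters because SKU names share prefixes; a substring match on Standard_B1 would wrongly accept Standard_B1ms.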

Step 5 — Perform the resize (CLI and portal)

For the lab, resize to Standard_B1ms purely to exercise the operation (in production you resize down per the recommendation).

# This REBOOTS the VM. In production, do it in a maintenance window.
az vm resize -g $RG -n $VM --size Standard_B1ms -o table

Expected output: runs ~30-90 seconds and returns VM JSON with "vmSize": "Standard_B1ms"; the VM restarts as part of it.

Portal equivalent: VM → Size → pick the new size → Resize; the portal warns it will restart the VM.

Validate the new size and a healthy restart:

az vm show -g $RG -n $VM --query "hardwareProfile.vmSize" -o tsv      # → Standard_B1ms
az vm get-instance-view -g $RG -n $VM \
  --query "instanceView.statuses[?starts_with(code,'PowerState')].displayStatus" -o tsv   # → VM running

Both must reflect the change before you call it done; in production you’d then watch the app’s metrics for a day to confirm the smaller SKU carries the load.

Step 6 — Create and clean up an idle disk (the safe delete pattern)

Now the idle-resource workflow: create an empty managed disk attached to nothing — the orphan Advisor flags — and delete it safely.

DISK=disk-orphan-demo
az disk create -g $RG -n $DISK --size-gb 8 --sku Standard_LRS -o table

Expected output: a disk with "diskState": "Unattached" — attached to no VM, it bills for its 8 GB every month for zero value, a miniature of the real problem.

Confirm it is unattached — never delete a disk showing Attached:

az disk show -g $RG -n $DISK --query "{name:name, state:diskState, managedBy:managedBy}" -o table

Expected output: diskState = Unattached, managedBy = blank/null. A populated managedBy means a VM owns it — stop and investigate.

Snapshot first — the non-negotiable step before deleting any disk that might hold data:

az snapshot create -g $RG -n ${DISK}-snap --source $DISK --sku Standard_LRS -o table

Expected output: a snapshot with "provisioningState": "Succeeded". It costs pennies and is your insurance — for a real data disk, it lets you delete with confidence.

Now delete the orphaned disk and confirm it is gone:

az disk delete -g $RG -n $DISK --yes -o table
az disk list -g $RG --query "[].name" -o table   # disk-orphan-demo should be absent

Portal equivalent: Disks → select the unattached disk → confirm Disk state: Unattached → optionally Create snapshot → Delete.
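The confirm, snapshot, delete sequence can be wrapped so the delete cannot run unless the guard passes. This is a sketch under assumptions: both function names are hypothetical, the guard treats an empty or "None" owner as no owner, and `safe_delete_disk` is only defined here, not executed.

```shell
# Hypothetical guard: "ok" only when the disk is Unattached AND no VM owns it.
disk_delete_guard() {
  if [ "$1" = "Unattached" ] && { [ -z "$2" ] || [ "$2" = "None" ]; }; then
    echo "ok"
  else
    echo "stop"
  fi
}

# Sketch of the full flow for one disk (needs az login; defined, not run).
safe_delete_disk() {
  rg="$1"; disk="$2"
  state=$(az disk show -g "$rg" -n "$disk" --query diskState -o tsv)
  owner=$(az disk show -g "$rg" -n "$disk" --query managedBy -o tsv)
  if [ "$(disk_delete_guard "$state" "$owner")" != "ok" ]; then
    echo "refusing to delete $disk (state: $state)" >&2
    return 1
  fi
  # Delete only if the snapshot succeeded.
  az snapshot create -g "$rg" -n "${disk}-snap" --source "$disk" --sku Standard_LRS -o none &&
    az disk delete -g "$rg" -n "$disk" --yes
}

disk_delete_guard Unattached ""   # safe to proceed
```

Chaining the delete after the snapshot with `&&` means a failed snapshot leaves the disk untouched.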

Step 7 — Bicep: the snapshot-before-delete pattern as code

For repeatable cleanups, express the safe half — the snapshot — as code, then delete the source disk via pipeline. Save as snapshot.bicep:

@description('Resource ID of the unattached disk to snapshot before deletion')
param sourceDiskId string

@description('Location for the snapshot')
param location string = resourceGroup().location

resource snap 'Microsoft.Compute/snapshots@2023-10-02' = {
  name: 'orphan-disk-snapshot'
  location: location
  sku: {
    name: 'Standard_LRS'   // cheap, redundant-enough for a safety snapshot
  }
  properties: {
    creationData: {
      createOption: 'Copy'
      sourceResourceId: sourceDiskId
    }
  }
}

output snapshotId string = snap.id

Deploy it, passing the disk ID to protect:

DISK_ID=$(az disk show -g $RG -n some-orphan-disk --query id -o tsv)
az deployment group create -g $RG -f snapshot.bicep -p sourceDiskId="$DISK_ID" -o table

Expected output: "provisioningState": "Succeeded" and a snapshotId. A pipeline can now safely az disk delete the source. (Skip the deploy if you already removed the demo disk in Step 6 — this shows the production pattern.)

Step 8 — Configure Advisor’s VM rightsizing rule (portal)

Tune Advisor to stop flagging boxes you keep on purpose.

  1. Open Advisor → Configuration (it opens on the Resources tab).
  2. Select the VM/Virtual Machine Scale Sets right sizing tab.
  3. Tick the subscription(s) to tune, then click Edit.
  4. Set the Average CPU utilization filter — e.g. 10% — so only VMs under it are surfaced, and Apply.

Expected result: a saved-settings confirmation; the page notes it can take up to 24 hours to reflect. This filter changes which recommendations you see, not how they are computed. To change the lookback window (e.g. 30 days for a batch-heavy subscription), use the same settings; recommendations refresh within about 48 hours.

Step 9 — Teardown

Remove everything so the lab costs nothing ongoing. Deleting the resource group takes the VM, disks, public IP, NIC, and any leftover snapshot.

az group delete --name $RG --yes --no-wait

Expected output: the command returns immediately (--no-wait) and the group deletes in the background. Confirm after a minute or two:

az group exists --name $RG    # → false once teardown completes

Validate no stragglers remain: in Cost Analysis, filter to rg-advisor-lab over the last day — it should trend to zero. Delete any snapshot outside this group with az snapshot delete.

You have now read recommendations two ways, performed the resize and idle-cleanup with pre- and post-flight checks, captured the snapshot pattern in Bicep, tuned the rules, and cleaned up — the loop you repeat on real recommendations every month.

Common mistakes & troubleshooting

The failure modes that turn a cost-cleanup into an incident or a wasted afternoon. Symptom → root cause → confirm → fix.

| # | Symptom | Root cause | Confirm (exact check) | Fix |
| --- | --- | --- | --- | --- |
| 1 | Shut down VM, bill did not drop | Stopped but still allocated (stopped from inside the OS) | get-instance-view shows VM stopped, not VM deallocated | az vm deallocate -g <rg> -n <vm> |
| 2 | Resized per Advisor, app erroring under load | Acted without checking it was a spike buffer / DR box | App metrics post-resize; was low usage by design? | Resize back up; dismiss with a note |
| 3 | Deleted a disk, lost needed data | Deleted an orphaned disk without snapshotting first | The disk is gone; no snapshot exists | Restore from snapshot if any; always snapshot first |
| 4 | Monthly batch VM flagged “shut down” | 7-day lookback misses the month-end spike | VM idle 27/30 days | Widen lookback to 30/90 days |
| 5 | Saving did not materialize fully | Estimate is retail-rate, ignores your reservation | Active RI/savings plan covers the VM | Reconcile against reservations before reporting |
| 6 | No cost recommendations at all | Subscription too new/small, or scope filtered | Reader on wrong scope; resources <7 days old | Wait for data; check the subscription filter |
| 7 | Resize fails: “size not available” | Target SKU not on the VM’s cluster/region | Absent from az vm list-vm-resize-options | Deallocate first, or pick an available SKU |
| 8 | Can’t change Advisor rules (greyed out) | Insufficient permissions on the subscription | Configuration control is disabled | Get the required subscription role, retry |
| 9 | Recommendation reappears after “fixing” | You postponed instead of acting | Returns after the postpone window | Action it properly, or dismiss if intentional |
| 10 | Deleted an “empty” plan, a deploy broke | A slot/app was about to deploy onto it | Pipeline targets the deleted plan | Recreate; verify zero intended apps first |

The biggest is #1 — shutting a VM down the wrong way and concluding cost tooling is broken; only Stopped (deallocated) stops the compute bill. The next is #2/#3 — acting without context or a snapshot. Advisor gives the signal; you supply the judgment and the safety net.

Best practices

Security notes

Cost cleanup touches real resources, so apply least-privilege discipline:

Cost & sizing

Advisor itself is free; the cost story is what it helps you save, minus the trivial cost of safety. Rightsizing saves the price difference between SKUs (D8s_v5 → D4s_v5 roughly halves the compute; a B-series move saves more for spiky-but-low load). Deallocating a near-idle VM saves its entire compute cost, but you still pay for its disks until you delete them. Deleting an orphaned disk saves its full provisioned charge (Premium SSDs bill on provisioned, not used, size); deleting an empty plan saves its tier rate. The one cost you add, a snapshot, is a few rupees a month and worth it every time.

A rough picture of typical monthly savings per action (INR; varies by SKU, region, discounts):

| Action | What you save | Rough monthly saving (INR) | Watch-out |
| --- | --- | --- | --- |
| Resize D8s_v5 → D4s_v5 | Half the VM compute | ~₹15,000-20,000/VM | Reboots; verify load fits |
| Resize to B-series burstable | Big drop for spiky-low load | ~₹10,000-25,000/VM | Only if credits cover spikes |
| Deallocate a near-idle VM | Full VM compute | ~₹8,000-40,000/VM | Disks still bill until deleted |
| Delete 1 TB orphaned Premium disk | Full disk cost | ~₹9,000-10,000/disk | Snapshot first (pennies) |
| Delete empty App Service plan | Full plan tier rate | ~₹4,000-15,000/plan | Confirm zero apps/slots |

The headline: do not pay Advisor — let it pay you. The biggest mistake is leaving the recommendations unread; even acting on just the orphaned disks and empty plans (the zero-risk wins) pays back an engineer’s afternoon many times over.

Interview & exam questions

1. What is Azure Advisor, and which category does cost optimization live under? A free, built-in service that analyzes resources and recommends improvements across five categories — Cost, Security, Reliability, Operational excellence, Performance. Cost recommendations live on the Cost tab.

2. What does Advisor measure to call a VM “underutilized”, and over what window? For shutdown, CPU and Outbound Network over a default 7-day lookback (roughly P95 max CPU < 3%, network < 2%); for resize it adds Memory and targets headroom (P95 CPU ≤ 40% on the new SKU for user-facing). Lookback is configurable to 7/14/21/30/60/90 days.

3. A VM is idle 27 days a month, busy on day 28. Default recommendation, and the fix? The default 7-day lookback recommends shutdown — wrongly, since it can’t see the month-end spike. Widen the lookback to 30 or 90 days; do not act on the 7-day view.

4. “Stopped” vs “Stopped (deallocated)”, and why it matters? A VM shut down from inside the OS is Stopped but still allocated, so compute keeps billing. Stopped (deallocated), via az vm deallocate or the portal Stop button, releases the compute so billing stops. A “shut down” recommendation means deallocate.

5. Advisor says ₹50,000/month but the bill drops ₹30,000. Why? Savings use retail rates and ignore reservations/savings plans you own; if a commitment covered that VM the real saving is smaller. Reconcile before reporting.

6. When is it correct to dismiss a rightsizing recommendation? When low utilization is by design: pre-provisioned for upcoming traffic, a DR standby, using metrics Advisor doesn’t see (GPU, local IO), kept on a SKU for testing, or needing homogeneous fleet SKUs. Dismiss with a recorded reason — not because acting is inconvenient.

7. Safe procedure for an “unattached disk” recommendation? Confirm it is Unattached (diskState, blank managedBy), snapshot it, verify the data is unneeded, then delete — deletion is irreversible. Never delete a disk showing Attached.

8. Two checks before resizing a production VM? (a) az vm list-vm-resize-options — is the target on the VM’s current cluster (else deallocate first); (b) az vm list-skus — is it restricted in the region. And a resize reboots the VM.

9. How do you reduce noisy VM recommendations across a subscription? In Advisor → Configuration → VM/VMSS right sizing, set a per-subscription average-CPU filter (e.g. 10%) and adjust the lookback for batch subscriptions. Changes take ~24h (filter) to ~48h (lookback).

10. Advisor recommends a reservation. What first, and the risk of acting blindly? Right-size first — a reservation on an oversized VM locks in the waste. Commit (1/3 years) only for workloads stable for the term; blindly, you risk paying upfront for capacity you’ll stop using.

11. Which role lets someone review Advisor without changing resources? Reader on the scope sees all recommendations but cannot act. Resizing/deleting needs Contributor on the resource; changing Advisor’s configuration needs subscription rights, or the portal greys it out.

These questions map most directly to AZ-900 (Azure Fundamentals), which asks you to describe cost management tools including Azure Advisor, and to the cost-optimization and operational-excellence pillars of the Azure Well-Architected Framework. The hands-on resize/deallocate/disk operations also appear in AZ-104 (Administrator) under managing VMs and cost. A compact cert map:

Question theme | Primary cert | Objective area
What Advisor is, its categories, cost tools | AZ-900 | Describe cost management and governance
Rightsizing rules, lookback, savings caveats | AZ-900 / WAF | Cost optimization pillar
Resize / deallocate / disk cleanup operations | AZ-104 | Manage VMs; optimize cost
Reservations vs savings plans | AZ-900 / AZ-104 | Cost management; commitments

Quick check

  1. Advisor flags a VM for shutdown. Which two metric types did it analyze, and over what default window?
  2. You “shut down” a VM from inside Windows and the bill doesn’t move. Why, and what state actually stops the compute charge?
  3. A VM is idle 27 days a month and busy on day 28. What’s the default recommendation, and the one configuration change that fixes the false positive?
  4. Before deleting an “unattached disk” recommendation, what is the one safety step you must take — and why?
  5. Advisor estimates ₹50,000/month savings but you only see ₹30,000. Give the most likely reason.

Answers

  1. CPU and Outbound Network (memory is not used for shutdown), over the default 7-day lookback (configurable up to 90 days).
  2. The VM is Stopped but still allocated, so compute keeps billing. Only Stopped (deallocated) — via az vm deallocate or the portal Stop button — stops the charge.
  3. It recommends shutdown (the 7-day window can’t see the month-end spike). Fix by widening the lookback to 30 or 90 days so it sees a full cycle.
  4. Snapshot the disk first — deletion is irreversible, so the snapshot is cheap insurance if the data turns out to be needed. Also confirm diskState is Unattached.
  5. Savings are at retail rates and ignore your existing reservations/savings plans — if a commitment covered that VM, the real saving is smaller.
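Answer 4 describes a procedure, and a procedure is easier to follow when the guard is written down. The sketch below covers the first two steps (confirm Unattached, take a snapshot) and then prints the delete command instead of running it, so a person still verifies that the data is unneeded. The function name and the `-pre-delete` snapshot suffix are hypothetical. It checks `diskState` only, so confirm separately that `managedBy` is blank.

```shell
# Sketch: guard + snapshot for an "unattached disk" recommendation.
# Deliberately does NOT delete; deletion is irreversible and left to the operator.

prepare_disk_delete() {
  # $1 = resource group, $2 = disk name
  pd_rg=$1
  pd_disk=$2

  pd_state=$(az disk show --resource-group "$pd_rg" --name "$pd_disk" \
    --query diskState --output tsv) || return 1

  # Never touch a disk that is not Unattached.
  if [ "$pd_state" != "Unattached" ]; then
    echo "Refusing: $pd_disk is '$pd_state', not Unattached." >&2
    return 1
  fi

  # Cheap insurance: snapshot the disk before anyone deletes it.
  az snapshot create --resource-group "$pd_rg" \
    --name "${pd_disk}-pre-delete" --source "$pd_disk" || return 1

  echo "Snapshot taken. After verifying the data is unneeded, run:"
  echo "  az disk delete --resource-group $pd_rg --name $pd_disk --yes"
}
```

Keeping the delete as a printed command is a design choice: the irreversible step stays a conscious human action rather than a side effect of a script.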

Glossary

Next steps

You can now open Advisor’s Cost tab, understand why each recommendation appeared, and action it safely. Build outward:


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
