
Capstone: Design & Build a Production-Ready Azure Landing Zone

This is the capstone of the Azure Zero-to-Hero course. Everything you have learned so far — what a subscription is, how to drive Azure from the CLI, how Microsoft Entra ID and RBAC work — now comes together into one project: designing and building a production-ready Azure landing zone. A landing zone is the pre-built, governed environment that your applications “land” in. Networking, identity, policy guardrails, monitoring, and cost controls are wired up first, so that when an application team shows up, they inherit security and consistency on day one instead of re-inventing it (and getting it wrong) on every project.

We will work the way a real platform team does: start from a business brief, make explicit design decisions, then build in stages — each stage reusing a deeper lesson from the course so you always know where to go for detail. Crucially, this is not a tour of prose. A landing zone is a dense lattice of choices — eight design areas, dozens of policy effects, a handful of CIDRs and SKUs and role IDs that must line up exactly — and the way a senior architect actually holds it is as a set of reference tables they scan under pressure. So this capstone is table-first: every design area carries an option matrix, every failure mode a symptom-cause-confirm-fix row, every tier a side-by-side grid. Read the prose once to understand the why; keep the tables open while you build.

You will end with a small but genuinely real landing zone running in your own free-tier subscription, a set of acceptance criteria to prove it works, and a self-assessment rubric to grade yourself the way an Azure Review Board would. By the end you will be able to walk into a design review, sketch the management-group tree on a whiteboard, justify every subscription boundary, and know — to the exact az command — how to confirm each piece is wired correctly.

What problem this solves

The pain this prevents is the hand-built subscription that nobody can govern. One engineer clicked together a subscription two years ago; since then resources have accreted with no tags, public IPs have sprouted on NICs, secrets have landed in plaintext app settings, monitoring is whatever each team remembered to turn on, and the bill is one undifferentiated number Finance cannot attribute to anyone. When the company grows to ten teams, there is no way to apply a security rule everywhere at once, no way to give a new team an isolated environment without a week of ticket-driven networking, and no way to answer “who spent this?” The landing zone is the antidote: governance, connectivity, identity, and observability are provisioned and inherited before the first workload arrives, so consistency is the default and drift is the exception.

What breaks without it: every new project re-implements (and re-mis-implements) networking and security; a misconfiguration in one workload can reach another because there is no blast-radius boundary; a compliance auditor asks “show me that no resource is public” and the only honest answer is “we’d have to check each one by hand.” Who hits it: every organization past its first few subscriptions — startups scaling to a platform team, enterprises consolidating shadow-IT subscriptions, and anyone preparing for AZ-305, where landing-zone design is the spine of the exam.

To frame the whole field before the deep dive, here are the eight Cloud Adoption Framework (CAF) design areas this capstone builds, the question each answers, the primary Azure construct, and the failure you get if you skip it:

| Design area | The question it answers | Primary construct | Failure if skipped |
| --- | --- | --- | --- |
| Resource organization | Where does everything live and inherit from? | Management groups + subscriptions | Flat tenant; cannot govern at scale |
| Identity & access | Who can do what, and how is it granted? | Entra ID, RBAC, PIM | Standing admins; per-user sprawl |
| Network topology | How do workloads connect and stay isolated? | Hub-spoke VNets, peering, UDR | Flat network; no security boundary |
| Governance | How are rules enforced, not just audited? | Azure Policy initiatives | Drift; rules nobody applies twice |
| Management / monitoring | Where does telemetry go and how is it queried? | Central Log Analytics, Defender | Blind ops; no cross-estate hunting |
| Security baseline | What is the default posture for every workload? | Defender for Cloud, Zero Trust | Inconsistent, weakest-link security |
| Cost management | How is spend attributed and capped? | Tags, budgets, Cost Management | Mystery bill; no per-team chargeback |
| Platform automation / DevOps | How is all of the above shipped reproducibly? | Bicep/Terraform + CI/CD | Portal drift; not reproducible |

Learning objectives

By the end of this capstone you can:

Prerequisites & where this fits

This is the final, Advanced lesson of the Azure Zero-to-Hero course and it assumes the whole course. You should be comfortable with the account model (tenant → management group → subscription → resource group → resource), driving Azure from Cloud Shell with the az CLI, reading JSON output, and the basics of Microsoft Entra ID and RBAC. If any of those feel shaky, work through the earlier lessons first — this capstone links back to them at each stage rather than re-teaching them. The deeper landing-zone series carries each pillar to full production depth: start from Designing an Azure Landing Zone with the Cloud Adoption Framework for the end-to-end blueprint.

Here is the scope boundary stated plainly — what this capstone builds versus what it defers to the deep-dive lessons, so you know where the edges are:

| Topic | In this capstone | Deferred to | Why |
| --- | --- | --- | --- |
| MG hierarchy + sub layout | Yes (design + lab via MG) | Resource organization | Real sub vending needs enrollment |
| RBAC to groups, PIM | Design + reasoning | Identity & access | PIM needs Entra ID P2 |
| Hub-spoke, peering, UDR | Yes (built in lab) | Network topology | Firewall SKU has hourly cost |
| Policy require-tags / deny / DINE | Yes (require-tags in lab) | Governance | DINE remediation needs identity setup |
| Central Log Analytics | Yes (built in lab) | Governance | Ingestion billing at scale |
| Defender for Cloud plans | Design + reasoning | Security baseline | Per-resource pricing |
| Budgets + cost attribution | Design + exercise | FinOps & cost engineering | Chargeback model is org-specific |
| Ship it all as code | Representative IaC | Policy as code | Full pipeline is its own lesson |

Core concepts

Five mental models make every later decision obvious.

Inheritance is the whole point of a hierarchy. A management group (MG) is a container above subscriptions. A policy or RBAC assignment placed on an MG flows down to every subscription beneath it — including subscriptions that do not exist yet. This single property is why you organize at all: assign a guardrail once, and every current and future team is governed automatically. The alternative — configuring each subscription by hand — guarantees drift and toil.

The subscription is the blast-radius and billing boundary. A subscription is the unit of scale, the limit of a misconfiguration’s reach, and the line Finance bills along. You split subscriptions by responsibility, not convenience: shared platform services (connectivity, management, identity) in their own subscriptions owned by the platform team; each application workload-or-environment in its own subscription handed to an app team. One per workload keeps blast radius small and the bill clean.

Connectivity is centralized, workloads are disposable. In hub-spoke, shared network services (firewall, DNS, Bastion, hybrid gateways) live once in a hub VNet; each workload gets a spoke VNet peered to the hub. A route table (UDR) forces spoke egress through the hub firewall so all traffic is inspected and logged in one place. Spokes stay small and replaceable; the security boundary between teams is real.

Governance is preventive, not a quarterly audit. Azure Policy evaluates resources at create/update time and can deny non-compliant ones, audit them, or use DeployIfNotExists (DINE) to deploy a missing configuration (like Log Analytics onboarding). Assigned at an MG, policy makes compliance a property of the platform rather than something a team must remember. Always dry-run a new policy with enforcementMode set to DoNotEnforce and read the compliance results before flipping enforcement on.

Least privilege is granted high, to groups, just-in-time. Entra ID is the control plane. You grant Azure RBAC roles to groups, never individuals, scoped at the MG or subscription level rather than per-resource, so membership — not a hunt through assignments — controls access. Privileged roles (Owner, User Access Administrator) are made eligible, not active, through PIM, so engineers activate them just-in-time with MFA and approval, leaving no standing admin.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:

| Concept | One-line definition | Where it lives | Why it matters to the landing zone |
| --- | --- | --- | --- |
| Management group | Container above subscriptions | Tenant hierarchy | Policy + RBAC inherit down |
| Subscription | Unit of scale / billing / blast radius | Under an MG | The boundary you split on |
| Resource group | Lifecycle container for resources | In a subscription | Deploy/delete as a unit |
| Hub VNet | Shared network services | Connectivity subscription | One place to inspect + connect |
| Spoke VNet | A workload’s network | App subscription | Peered, small, disposable |
| UDR (route table) | Overrides default routing | Attached to a subnet | Forces egress via firewall |
| Azure Policy initiative | Grouped policy definitions | Assigned at a scope | Enforces/audits at scale |
| DINE | Auto-deploys missing config | Policy effect | Onboards monitoring automatically |
| RBAC role assignment | Principal + role + scope | At MG/sub/RG/resource | Who can do what, where |
| PIM | Just-in-time privileged roles | Entra ID P2 | No standing admin |
| Log Analytics workspace | Central telemetry store | Management subscription | Cross-estate queries |
| Budget | Spend threshold with alerts (it notifies, it does not stop spend) | Per subscription/RG | Warns before overrun |

The brief

Our fictional company is Northwind Freight, a mid-size logistics firm moving from a single hand-built subscription (one engineer clicked it together two years ago, nobody remembers what is in it) to a governed Azure foundation. Leadership wants three things, in their words:

  1. “Stop the wild west.” Every resource must be tagged, owned, and monitored. No more orphaned public IPs and no mystery spend.
  2. “Let app teams move fast — safely.” A new project team should get a ready-to-use, isolated environment with guardrails already on, without filing a networking ticket.
  3. “Show me the bill, by team.” Finance needs cost broken down per workload and per environment, with alerts before budgets blow.

Translated into platform language, Northwind needs: a management-group hierarchy for inherited policy and RBAC; separate subscriptions for shared platform services versus application workloads; a hub-spoke network so connectivity is centralized; an identity baseline of least-privilege RBAC granted to groups; policy guardrails that enforce tagging and block risky resources; a monitoring baseline funnelling logs to one place; and cost controls with budgets and alerts. That is exactly an Azure landing zone — and exactly the eight design areas of the Cloud Adoption Framework.

Here is each leadership ask mapped to the design area that satisfies it, the concrete mechanism, and the acceptance signal that proves it is done:

| Leadership ask | CAF area | Mechanism | Acceptance signal |
| --- | --- | --- | --- |
| “Stop the wild west” (tags) | Governance + Cost | Require-tags deny policy at MG | Untagged resource is blocked |
| “Stop the wild west” (no public IPs) | Networking + Governance | Deny-public-IP policy on corp branch | Public IP on a corp NIC is denied |
| “Stop the wild west” (monitored) | Monitoring | DINE onboarding to central LAW | New resource auto-sends diagnostics |
| “Move fast, safely” (isolated env) | Resource org + Networking | One sub per workload + peered spoke | New sub inherits guardrails, peers to hub |
| “Move fast, safely” (no tickets) | Platform automation | Spokes shipped via PR into own sub | App team deploys without platform ticket |
| “Show me the bill, by team” (split) | Cost | Subscription = billing boundary | Cost Management groups by subscription |
| “Show me the bill, by team” (attribute) | Cost + Governance | costCenter/env tags enforced | Spend slices by tag |
| “Show me the bill, by team” (warn) | Cost | Budget alerts at 80% / 100% | Owner notified before overrun |

Design decisions

A landing zone is mostly a set of decisions. Implementation is the easy part once the decisions are explicit and defensible. Here are the seven that matter, with the reasoning a reviewer will expect — and the course lesson that owns each in depth. First, the whole decision set as one table you can take into a review:

| # | Decision | Northwind choice | Chief alternative | Why the choice wins |
| --- | --- | --- | --- | --- |
| 1 | Hierarchy | CAF MG tree | Flat tenant | Inheritance to future subs |
| 2 | Sub split | Platform vs app | One giant sub | Blast radius + clean bill |
| 3 | Network | Hub-spoke | Flat / full-mesh | Central inspection, scales |
| 4 | Identity | Group RBAC + PIM | Per-user Owner | Least privilege, auditable |
| 5 | Governance | Policy at MG scope | Per-sub config | Enforced once, inherits |
| 6 | Monitoring | One central LAW | Per-team workspaces | Cross-estate hunting |
| 7 | Cost | Per-sub budgets + tags | Single bill | Per-team attribution |

1. Management-group hierarchy

Decision: adopt the CAF reference hierarchy rather than a flat tenant. Management groups let you assign policy and RBAC once and inherit everywhere beneath them, including subscriptions that do not exist yet.

Tenant Root Group
└── northwind                    (top-level MG — company guardrails)
    ├── platform                 (shared services)
    │   ├── identity             (Entra Connect, domain services)
    │   ├── management           (Log Analytics, automation, backup)
    │   └── connectivity         (hub VNet, firewall, DNS)
    ├── landingzones             (application workloads)
    │   ├── corp                 (internal — no public ingress)
    │   └── online               (internet-facing)
    ├── sandbox                  (experiments — loose policy)
    └── decommissioned           (quarantine before deletion)

A policy assigned at landingzones (for example, “deny public IP on a NIC”) flows to corp, online, and every future subscription under them. New teams inherit guardrails automatically. Detail: Azure landing zone — resource organization.
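The inheritance rule above can be sketched as a walk up the tree. This is a minimal in-memory model, not how Azure stores assignments: the `PARENT` and `ASSIGNED` dictionaries, the policy labels, and the subscription name `sub-new-team` are illustrative assumptions that mirror the hierarchy shown.

```python
# Hypothetical model of the management-group tree; names mirror the tree above.
PARENT = {
    "northwind": None,
    "platform": "northwind",
    "identity": "platform",
    "management": "platform",
    "connectivity": "platform",
    "landingzones": "northwind",
    "corp": "landingzones",
    "online": "landingzones",
    "sandbox": "northwind",
    "decommissioned": "northwind",
}

# Assignments made directly at a scope (illustrative subset, labels are made up).
ASSIGNED = {
    "northwind": ["require-tags", "allowed-locations"],
    "landingzones": ["deny-public-ip"],
    "corp": ["deny-all-public-ingress"],
}


def effective_policies(scope: str) -> list[str]:
    """Collect assignments from the top MG down to the given scope."""
    chain = []
    node = scope
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    policies = []
    for mg in reversed(chain):  # outermost scope first
        policies.extend(ASSIGNED.get(mg, []))
    return policies


def add_subscription(name: str, parent_mg: str) -> None:
    """A new subscription is one more child node; it inherits immediately."""
    PARENT[name] = parent_mg
```

Calling `add_subscription("sub-new-team", "online")` and then `effective_policies("sub-new-team")` returns the top-level and `landingzones` assignments without any new assignment being made, which is the property the hierarchy exists to provide.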

Each management group in the reference tree has a job. Here is what lives where, what is assigned at each node, and why it exists as its own scope:

| Management group | Purpose | Typical policy assigned here | Typical RBAC here |
| --- | --- | --- | --- |
| northwind (top) | Company-wide guardrails | Require-tags; allowed-locations; audit baseline | Platform team Reader (broad visibility) |
| platform | Shared-service guardrails | Stricter diagnostic + security baseline | Platform team Contributor |
| platform/connectivity | Network services | Deny non-approved network resource types | Network admins Contributor |
| platform/management | Telemetry + automation | DINE Log Analytics onboarding | Ops team Contributor |
| platform/identity | Identity services | Identity-specific compliance | Identity admins Contributor |
| landingzones | App-workload guardrails | Deny-public-IP; enforce HTTPS; DINE Defender | (none broad; set per app sub) |
| landingzones/corp | Internal workloads | Deny all public ingress | App team Contributor on their sub |
| landingzones/online | Internet-facing | Require WAF / Front Door fronting | App team Contributor on their sub |
| sandbox | Experiments | Loose: audit-only, spend cap | Developers Contributor, time-boxed |
| decommissioned | Pre-deletion quarantine | Deny new resource creation | Platform team only |

A subtle but exam-worthy point: the order and scope of assignment matter. The narrower the scope, the more specific the rule should be. Here is how to reason about where to place an assignment:

| Place the assignment at… | When the rule is… | Example | Trade-off |
| --- | --- | --- | --- |
| Tenant Root | Truly universal, rarely | (usually left empty) | Hard to change; affects everything |
| Top MG (northwind) | Company-wide intent | Require-tags, allowed-locations | Broad blast radius if wrong |
| Mid MG (landingzones) | Applies to all workloads | Deny-public-IP | Inherits to corp + online |
| Leaf MG (corp) | Branch-specific | Deny all public ingress | Doesn’t affect online |
| Subscription | One team’s exception | A workload-specific waiver | Doesn’t inherit; per-sub toil |

2. Platform vs application subscriptions

Decision: the subscription is the unit of scale and the blast-radius / billing boundary — so split by responsibility, not convenience. Platform subscriptions (connectivity, management, identity) are owned by the platform team and rarely change. Application (landing-zone) subscriptions are handed one-per-workload-or-environment to app teams.

| Subscription | Lives under | Owned by | Purpose |
| --- | --- | --- | --- |
| sub-connectivity | platform/connectivity | Platform | Hub VNet, Firewall, DNS, gateways |
| sub-management | platform/management | Platform | Log Analytics, automation, backup vault |
| sub-identity | platform/identity | Platform | Entra Connect, domain services |
| sub-corp-prod | landingzones/corp | App team | Internal production workloads |
| sub-online-prod | landingzones/online | App team | Internet-facing production workloads |

This gives Finance a clean per-team bill (subscription = cost boundary) and limits blast radius: a misconfiguration in one app subscription cannot touch another. Detail: Azure landing zone — resource organization.

Why not just use resource groups to separate teams inside one subscription? Because the subscription is the boundary for several things a resource group is not. The comparison that settles the argument:

| Boundary property | Resource group | Subscription | Implication |
| --- | --- | --- | --- |
| Billing / cost rollup | Tag-based only | First-class boundary | Sub = clean per-team bill |
| RBAC inheritance root | Yes (narrow) | Yes (broad) | Sub-level Contributor scopes a team neatly |
| Policy assignment scope | Yes | Yes (+ inherits from MG) | Sub inherits MG guardrails automatically |
| Many Azure quotas/limits | Shared with sub | Per subscription | One team can’t exhaust another’s quota |
| Blast radius of Owner | RG only | Whole subscription | App team Owner can’t reach platform |
| Move between MGs | No | Yes | Reorganize governance without rebuild |

And subscriptions are not free of limits — knowing the real ceilings keeps your design honest. Representative subscription-scope limits to design against (treat as “design well below,” not hard targets to chase):

| Limit | Approximate ceiling | Why it shapes design |
| --- | --- | --- |
| Resource groups per subscription | ~980 | Plenty; not a real constraint |
| Role assignments per subscription | ~4,000 | Favors group-based RBAC over per-user |
| VNets per subscription (default) | ~1,000 (raisable) | Spoke-per-workload scales fine |
| Subscriptions per management group | Large | MG tree, not flat, is the limiter on governance |
| Public IPs per subscription (default) | ~10 standard (raisable) | Deny-public-IP keeps this near zero in corp |
| Azure Policy assignments per scope | ~200 | Group definitions into initiatives |

3. Hub-spoke networking

Decision: centralize shared network services in a hub VNet (firewall, DNS, Bastion, VPN/ExpressRoute gateway) and give each workload a spoke VNet peered to the hub. Force spoke egress through the hub firewall with a route table (UDR). This means one place to inspect and log traffic, one place to attach hybrid connectivity, and spokes that stay small and disposable.
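Why a single 0.0.0.0/0 route is enough to force egress through the hub can be sketched with the route-selection rule Azure applies: the longest matching prefix wins, and on a tie a user-defined route beats a system route. The route list below is an illustrative assumption for a corp spoke subnet, and `10.10.1.4` is a made-up firewall private IP, not a value from this lesson.

```python
import ipaddress

# Tie-break order between equal-length prefixes:
# user-defined routes beat BGP routes, which beat system routes.
SOURCE_RANK = {"user": 0, "bgp": 1, "system": 2}

# Illustrative effective routes for a corp spoke subnet.
# 10.10.1.4 is an assumed firewall private IP.
ROUTES = [
    {"prefix": "10.20.0.0/16", "next_hop": "VnetLocal", "source": "system"},
    {"prefix": "10.10.0.0/16", "next_hop": "VNetPeering", "source": "system"},
    {"prefix": "0.0.0.0/0", "next_hop": "Internet", "source": "system"},
    {"prefix": "0.0.0.0/0", "next_hop": "VirtualAppliance 10.10.1.4", "source": "user"},
]


def next_hop(destination: str, routes=ROUTES) -> str:
    """Pick the longest matching prefix; break ties by route source."""
    ip = ipaddress.ip_address(destination)
    matches = [r for r in routes if ip in ipaddress.ip_network(r["prefix"])]
    best = min(
        matches,
        key=lambda r: (
            -ipaddress.ip_network(r["prefix"]).prefixlen,  # longer prefix first
            SOURCE_RANK[r["source"]],                      # then user > bgp > system
        ),
    )
    return best["next_hop"]
```

With the UDR present, `next_hop("8.8.8.8")` resolves to the firewall; passing `ROUTES[:3]` (no UDR) resolves the same destination straight to `Internet`, which is the "egress not forced through the hub" failure.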

The alternative — a flat VNet shared by everyone, or full-mesh peering between workloads — does not scale and erases the security boundary between teams. Detail: Azure landing zone — network topology. The three topologies compared, so the choice is defensible:

| Topology | How it connects | Pros | Cons | Verdict |
| --- | --- | --- | --- | --- |
| Flat shared VNet | One VNet, all teams | Simplest | No isolation; noisy-neighbor; doesn’t scale | Avoid past a pilot |
| Full-mesh peering | Every VNet peers every VNet | Direct paths | O(n²) peerings; no central inspection | Unmanageable at scale |
| Hub-spoke | Spokes peer only the hub | Central inspection, hybrid, scales | One extra hop; hub is a focal point | The standard |
| Virtual WAN | Microsoft-managed hub | Managed routing, global | Cost; less control | Large/global estates |
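The O(n²) claim for full-mesh peering is simple arithmetic, sketched here with two hypothetical helper functions. Each count is the number of VNet pairs that must be linked; every link needs a peering resource on both sides, so the number of peering objects to manage is double these figures.

```python
def full_mesh_links(vnets: int) -> int:
    """Every VNet pairs with every other VNet: n * (n - 1) / 2 links."""
    return vnets * (vnets - 1) // 2


def hub_spoke_links(spokes: int) -> int:
    """Each spoke pairs only with the hub: one link per spoke."""
    return spokes
```

Ten workload VNets need 45 links in a full mesh but 10 in hub-spoke; at fifty workloads the gap is 1,225 versus 50.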

The hub carries a fixed set of shared services, each in a subnet with a mandatory or conventional name. Get these exact, because Azure validates several of them:

| Hub component | Subnet name (exact) | Typical CIDR | Job |
| --- | --- | --- | --- |
| Azure Firewall | AzureFirewallSubnet | 10.10.1.0/26 | Inspect + log all egress |
| Firewall mgmt (forced tunnel) | AzureFirewallManagementSubnet | 10.10.2.0/26 | Firewall management plane |
| Bastion | AzureBastionSubnet | 10.10.3.0/26 | Browser RDP/SSH, no public NIC |
| VPN/ER gateway | GatewaySubnet | 10.10.4.0/27 | Hybrid connectivity |
| Private DNS resolver inbound | <custom> (delegated) | 10.10.5.0/28 | Hybrid DNS resolution |
| Shared workload | snet-shared | 10.10.6.0/24 | Jump hosts, shared tooling |

The CIDR plan must not overlap, because peered VNets with overlapping ranges cannot route. A clean, non-overlapping allocation for Northwind:

| Network | CIDR | Subnets | Notes |
| --- | --- | --- | --- |
| Hub | 10.10.0.0/16 | firewall, bastion, gateway, dns | Platform-owned |
| Corp spoke | 10.20.0.0/16 | snet-workload, snet-data, snet-pe | No public ingress |
| Online spoke | 10.30.0.0/16 | snet-web, snet-appgw, snet-pe | AppGW + WAF in front |
| Reserved (future) | 10.40.0.0/16 | | Next workload |
| On-prem (hybrid) | 172.16.0.0/16 | | Advertised via gateway |
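A plan like this is worth checking mechanically before anything is deployed. The sketch below uses Python's standard `ipaddress` module to confirm that no two ranges overlap and that every hub subnet sits inside the hub range. The dictionary keys are labels chosen for illustration, and `dns-resolver-inbound` is a placeholder for the custom subnet name the lesson leaves unspecified.

```python
import ipaddress
from itertools import combinations

VNETS = {
    "hub": "10.10.0.0/16",
    "corp-spoke": "10.20.0.0/16",
    "online-spoke": "10.30.0.0/16",
    "reserved": "10.40.0.0/16",
    "on-prem": "172.16.0.0/16",
}

HUB_SUBNETS = {
    "AzureFirewallSubnet": "10.10.1.0/26",
    "AzureFirewallManagementSubnet": "10.10.2.0/26",
    "AzureBastionSubnet": "10.10.3.0/26",
    "GatewaySubnet": "10.10.4.0/27",
    "dns-resolver-inbound": "10.10.5.0/28",  # placeholder name (custom in the plan)
    "snet-shared": "10.10.6.0/24",
}


def overlapping_pairs(ranges: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named ranges whose address space overlaps."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


def outside_parent(parent_cidr: str, subnets: dict[str, str]) -> list[str]:
    """Return the subnets that do not fit inside the parent VNet range."""
    parent = ipaddress.ip_network(parent_cidr)
    return [
        name
        for name, cidr in subnets.items()
        if not ipaddress.ip_network(cidr).subnet_of(parent)
    ]
```

Both `overlapping_pairs(VNETS)` and `outside_parent(VNETS["hub"], HUB_SUBNETS)` come back empty for the Northwind plan; adding a spoke that reuses part of an existing range would surface as a named pair.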

Peering has options that change cost and reachability — set them deliberately, not by accepting defaults:

| Peering setting | Hub→spoke value | Spoke→hub value | Why |
| --- | --- | --- | --- |
| allowVirtualNetworkAccess | true | true | Permit traffic across the peering |
| allowForwardedTraffic | true | true | Let firewall-forwarded traffic transit |
| allowGatewayTransit | true | false | Hub shares its gateway |
| useRemoteGateways | false | true | Spoke uses the hub’s gateway |

Result if mismatched: asymmetric or blocked routing, or a peering stuck in the “Initiated” state.
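One way to keep these settings from drifting is to hold the table as data and diff a deployed peering pair against it. This is a minimal sketch: the `EXPECTED` structure and the `peering_drift` helper are hypothetical, and the `actual` input is assumed to be a dictionary you have already built from whatever tool reports the live settings.

```python
# Expected values transcribed from the peering table above.
EXPECTED = {
    "hub_to_spoke": {
        "allowVirtualNetworkAccess": True,
        "allowForwardedTraffic": True,
        "allowGatewayTransit": True,
        "useRemoteGateways": False,
    },
    "spoke_to_hub": {
        "allowVirtualNetworkAccess": True,
        "allowForwardedTraffic": True,
        "allowGatewayTransit": False,
        "useRemoteGateways": True,
    },
}


def peering_drift(actual: dict) -> list[str]:
    """List every setting that differs from the expected hub-spoke values."""
    drift = []
    for side, settings in EXPECTED.items():
        for key, want in settings.items():
            got = actual.get(side, {}).get(key)  # None when a side is missing
            if got != want:
                drift.append(f"{side}.{key}: expected {want}, got {got}")
    return drift
```

A one-sided peering (only `hub_to_spoke` present) reports all four spoke-side settings as missing, which matches the symptom of a peering that never leaves “Initiated”.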

4. Identity baseline

Decision: Microsoft Entra ID is the control plane. Grant Azure RBAC roles to groups, never individuals, and scope them at the management-group or subscription level rather than per-resource. Privileged roles (Owner, User Access Administrator) are made eligible, not active, through Privileged Identity Management (PIM) so engineers activate them just-in-time with MFA and approval.

Least privilege is the rule: app teams get Contributor on their own subscription and nothing above it; the platform pipeline identity gets Owner only at the management group it manages. This builds directly on the Entra ID fundamentals: tenants, users, groups, RBAC lesson and goes deeper in Azure landing zone — identity & access and Entra RBAC governance.

Who gets which role at which scope — the RBAC plan a reviewer will check line by line:

| Principal (group) | Role | Scope | Standing or PIM |
| --- | --- | --- | --- |
| grp-platform-admins | Owner | platform MG | PIM-eligible |
| grp-platform-engineers | Contributor | platform MG | Standing |
| grp-network-admins | Network Contributor | sub-connectivity | Standing |
| grp-ops | Log Analytics Contributor | sub-management | Standing |
| grp-corp-app-team | Contributor | sub-corp-prod | Standing |
| grp-online-app-team | Contributor | sub-online-prod | Standing |
| grp-security | Security Reader | northwind MG | Standing |
| grp-billing | Cost Management Reader | northwind MG | Standing |
| Any human | Owner / UAA | any | PIM-only, JIT |
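The two rules behind this plan (assign to groups, never hold Owner or User Access Administrator as a standing role) are easy to lint before a review. The sketch below is an assumption-laden illustration: it treats the `grp-` name prefix as the marker of a group, which is only the naming convention used in the table, and the `PLAN` tuples are a hand transcription rather than output from Azure.

```python
PRIVILEGED_ROLES = {"Owner", "User Access Administrator"}

# (principal, role, scope, mode) rows transcribed from the plan above.
PLAN = [
    ("grp-platform-admins", "Owner", "platform MG", "pim"),
    ("grp-platform-engineers", "Contributor", "platform MG", "standing"),
    ("grp-network-admins", "Network Contributor", "sub-connectivity", "standing"),
    ("grp-ops", "Log Analytics Contributor", "sub-management", "standing"),
    ("grp-corp-app-team", "Contributor", "sub-corp-prod", "standing"),
    ("grp-online-app-team", "Contributor", "sub-online-prod", "standing"),
    ("grp-security", "Security Reader", "northwind MG", "standing"),
    ("grp-billing", "Cost Management Reader", "northwind MG", "standing"),
]


def rbac_findings(plan) -> list[str]:
    """Flag assignments to individuals and standing privileged roles."""
    findings = []
    for principal, role, scope, mode in plan:
        # Assumption: groups follow the grp- naming convention from the table.
        if not principal.startswith("grp-"):
            findings.append(f"{principal}: assign {role} to a group, not an individual")
        if role in PRIVILEGED_ROLES and mode != "pim":
            findings.append(f"{principal}: {role} at {scope} must be PIM-eligible, not standing")
    return findings
```

The Northwind plan produces no findings; a standing Owner granted to a named user produces two, one for each rule it breaks.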

The built-in roles you actually use here, what they grant, and the trap each one carries:

| Role | Grants | Use for | Trap |
| --- | --- | --- | --- |
| Owner | Full access + manage access | Almost never standing | Can grant itself anything |
| Contributor | Full manage, not RBAC | App teams on their sub | Cannot assign roles (by design) |
| Reader | View only | Auditors, security | Reader can still see that secrets exist |
| User Access Administrator | Manage RBAC only | Break-glass via PIM | Privilege-escalation vector |
| Network Contributor | Manage network resources | Network admins | Scope tightly to connectivity |
| Log Analytics Contributor | Manage workspaces + data | Ops | Can read all ingested logs |
| Key Vault Secrets User | Read secret values | Workload identities | Grant per-vault, not broad |

PIM turns standing privilege into just-in-time. The settings that make it real:

| PIM control | Recommended setting | Why |
| --- | --- | --- |
| Activation requires MFA | On | Proves the human, not a stolen token |
| Activation requires justification | On | Audit trail of why |
| Activation requires approval | On for Owner/UAA | Two-person control on top privilege |
| Maximum activation duration | 1–4 hours | Privilege expires automatically |
| Eligible vs active | Eligible by default | No standing admin |
| Access reviews | Quarterly | Catch stale eligibility |

Deeper still, privileged-role elevation for resources is its own discipline — see PIM for Azure resources: JIT elevation.

5. Policy guardrails

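Decision: governance is preventive, not a quarterly audit. Assign Azure Policy initiatives at the management-group scope so they inherit. The three guardrails Northwind needs first are require-tags (every resource carries costCenter, owner, and env), deny-public-IP on NICs in the corp branch, and DINE onboarding to the central Log Analytics workspace.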

Always dry-run a new initiative in DoNotEnforce mode first, read the compliance results, then flip enforcement on. Detail: Azure landing zone — governance and, for shipping policy through CI/CD, Azure Policy as code and Azure Policy & governance at scale.

The policy effects are the heart of governance. Each behaves differently at evaluation time — know exactly what each does and when to reach for it:

| Effect | What it does | Needs identity? | Blocks deploy? | Use for |
| --- | --- | --- | --- | --- |
| Deny | Rejects non-compliant create/update | No | Yes | Hard guardrails (no public IP) |
| Audit | Flags non-compliance, allows it | No | No | Visibility before enforcing |
| Append | Adds fields to a resource | No | No | Force a tag value, add a setting |
| Modify | Adds/updates/removes properties | Yes | No | Remediate tags at scale |
| DeployIfNotExists | Deploys a related resource if missing | Yes | No | Onboard LAW/Defender |
| AuditIfNotExists | Audits if a related resource is missing | No | No | “Is diagnostics configured?” |
| Disabled | Turns the policy off | No | No | Temporarily park a rule |
| DenyAction | Blocks a specific action (e.g. delete) | No | Yes (action) | Protect against deletion |

The three Northwind guardrails in detail — definition, scope, parameters, and the failure each prevents:

| Guardrail | Built-in definition (intent) | Scope | Key parameter | Prevents |
| --- | --- | --- | --- | --- |
| Require costCenter tag | “Require a tag on resources” | northwind MG | tagName=costCenter | Untagged, unattributable spend |
| Require owner + env | Same definition, ×2 assignments | northwind MG | tagName=owner / env | Orphaned resources with no environment label |
| Deny public IP on NIC | “Network interfaces should not have public IPs” | landingzones/corp | (none) | Internet-exposed internal VMs |
| Allowed locations | “Allowed locations” | northwind MG | region allow-list | Data landing in wrong geography |
| DINE Log Analytics | “Configure … to send logs to LAW” | landingzones | workspace ID | Monitoring drift |
| DINE Defender plans | “Configure Defender plan” | northwind MG | plan + tier | Security-coverage gaps |

Enforcement mode is the safety valve. The two modes and how to use the rollout:

| enforcementMode | Behavior | When to use |
| --- | --- | --- |
| DoNotEnforce (Disabled) | Evaluates compliance, does not block or remediate | Always first; read what would be denied |
| Default (Enabled) | Fully enforces (deny blocks, DINE remediates) | After you’ve reviewed DoNotEnforce results |
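The difference between the two modes can be shown with a toy model of a require-tags deny policy. This is a sketch of the behavior described in the table, not the Azure Policy engine: `evaluate_create` and its return shape are hypothetical, and the required tag names come from the guardrail table.

```python
REQUIRED_TAGS = ("costCenter", "owner", "env")


def evaluate_create(tags: dict, enforcement_mode: str = "Default") -> dict:
    """Model a require-tags deny policy evaluating one resource create request."""
    if enforcement_mode not in ("Default", "DoNotEnforce"):
        raise ValueError(f"unknown enforcement mode: {enforcement_mode}")
    missing = [name for name in REQUIRED_TAGS if name not in tags]
    compliant = not missing
    # DoNotEnforce still records compliance but never applies the deny effect.
    blocked = (not compliant) and enforcement_mode == "Default"
    return {"compliant": compliant, "missing": missing, "blocked": blocked}
```

Running the same untagged request through both modes shows the rollout pattern: in `DoNotEnforce` it is reported as non-compliant but allowed, which is the list you review; in `Default` the identical request is blocked.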

6. Monitoring baseline

Decision: one central Log Analytics workspace in the management subscription. Every subscription’s diagnostic settings, Defender for Cloud, and Activity Logs funnel into it. Centralizing means security can query across the whole estate, and DINE policy can enforce onboarding automatically. Detail: Azure landing zone — governance and the Azure Monitor deep dive.

What flows into the central workspace, from where, and the mechanism that puts it there:

| Telemetry | Source | Mechanism | Why central |
| --- | --- | --- | --- |
| Resource diagnostics (metrics/logs) | Every resource | Diagnostic settings (DINE-enforced) | Query all resources together |
| Activity log | Each subscription | Diagnostic setting at sub scope | “Who changed what” across estate |
| Defender for Cloud alerts | All subscriptions | Defender export to LAW | Single security pane |
| Entra sign-in / audit logs | Tenant | Entra diagnostic settings | Correlate identity with resource events |
| VM guest logs/perf | VMs | Azure Monitor Agent + DCR | Host-level visibility |
| Network flow logs | NSGs / firewall | Flow logs → LAW | Traffic forensics |

A workspace is not free or infinitely retained — the knobs that drive both behavior and bill:

| Workspace setting | Default | Options | Drives |
| --- | --- | --- | --- |
| Pricing tier | Pay-as-you-go (PerGB2018) | Commitment tiers (100GB/day…) | Per-GB cost at volume |
| Retention | 30 days | 30–730 days (then Archive) | Storage cost + query window |
| Data collection rule (DCR) | none | Scope what each resource sends | Volume + noise |
| Table-level retention | inherits workspace | Per-table override | Keep security logs longer, cheaply |
| Daily cap | none | Cap GB/day | Runaway-ingestion insurance |
| Access mode | resource-context | workspace-context | Who can read which logs |

7. Cost controls

Decision: a budget with alerts per subscription, plus mandatory tags so Cost Management can slice spend by costCenter and env. Alerts fire at 80% and 100% of budget to the subscription owner before the month closes. This answers Northwind’s “show me the bill, by team” directly. For the full discipline, see Azure FinOps & cost engineering and the reservations & savings-plan strategy.

The cost-control mechanisms, what each does, and when it fires:

| Mechanism | What it does | Granularity | Action |
| --- | --- | --- | --- |
| Budget (Cost) | Tracks actual spend vs amount | Sub / RG / tag | Alert at thresholds |
| Budget (forecast) | Projects month-end spend | Sub / RG | Alert before overrun |
| Cost allocation tags | Slice spend by team/env | Resource | Reporting, chargeback |
| Cost Management views | Group/filter spend | Any dimension | Analysis, anomaly spotting |
| Action group on budget | Email/webhook/automation | Per budget | Notify owner; trigger runbook |
| Anomaly alerts | Detect unusual spend | Subscription | Catch surprises early |
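The 80% and 100% alert logic is plain threshold arithmetic, sketched below with two hypothetical helpers. The forecast function is a deliberately naive straight-line projection used only to illustrate why a forecast alert can fire before an actual-spend alert; it is an assumption, not the model Cost Management uses.

```python
def fired_thresholds(spend: float, budget: float, thresholds=(80, 100)) -> list[int]:
    """Return the percentage thresholds that current spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare spend * 100 against threshold * budget to avoid a division.
    return [t for t in thresholds if spend * 100 >= t * budget]


def linear_forecast(spend_to_date: float, day: int, days_in_month: int) -> float:
    """Naive month-end projection: assume the daily run rate stays constant."""
    if not 1 <= day <= days_in_month:
        raise ValueError("day must fall inside the month")
    return spend_to_date * days_in_month / day
```

A subscription with a 1,000 budget that has spent 400 by day 10 of a 30-day month has fired no actual-spend threshold, yet its straight-line forecast of 1,200 already crosses both, which is the early warning a forecast budget is meant to give.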

The tag taxonomy is load-bearing — these are the tags every resource must carry, why, and what enforces them:

| Tag | Example value | Purpose | Enforced by |
| --- | --- | --- | --- |
| costCenter | logistics | Charge spend to a budget | Require-tags deny policy |
| owner | app-team | Who to call; cleanup target | Require-tags deny policy |
| env | prod / dev | Separate prod vs non-prod spend | Require-tags deny policy |
| workload | checkout-api | Per-app rollup | Convention (audit policy) |
| dataClass | confidential | Drive security/retention | Convention (audit policy) |
| expiry | 2026-12-31 | Auto-cleanup of sandbox | Sandbox automation |
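Why the tags are load-bearing becomes obvious once spend is grouped by them. The sketch below assumes cost records have already been exported into a list of dictionaries with `cost` and `tags` keys; that record shape and the `spend_by_tag` helper are illustrative assumptions, not a Cost Management export format.

```python
from collections import defaultdict


def spend_by_tag(records: list[dict], tag: str) -> dict[str, float]:
    """Total cost per value of one tag; untagged spend gets its own bucket."""
    totals = defaultdict(float)
    for record in records:
        key = record.get("tags", {}).get(tag, "(untagged)")
        totals[key] += record["cost"]
    return dict(totals)
```

Any spend that lands in the `(untagged)` bucket is spend Finance cannot attribute, which is exactly what the require-tags deny policy is there to keep at zero.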

Architecture at a glance

The diagram traces the landing zone as governance and traffic actually flow through it, left to right. Start at the governance plane: the Entra tenant (with PIM-eligible privileged groups) and the management-group hierarchy carry a single policy initiative — require-tags, deny-public-IP, DINE — that inherits downward. That inheritance lands on the subscriptions zone, where the platform subscriptions (connectivity, management, identity) are separated from the application subscriptions (corp-prod, online-prod); the split is the blast-radius and billing boundary. The platform’s connectivity subscription owns the connectivity hub — Azure Firewall enforcing forced-tunnel egress via a 0.0.0.0/0 UDR, Private DNS with linked privatelink zones, and Bastion for public-NIC-free RDP/SSH. Each workload spoke (corp 10.20/16, online 10.30/16 behind App Gateway + WAF) peers to the hub, and the key arrow loops back: spoke egress returns through the firewall before leaving. Finally everything — every spoke’s diagnostics and every subscription’s Activity log — reports into the observe & cost zone: the central Log Analytics workspace, Defender plans onboarded by DINE, and per-subscription budgets keyed on the costCenter tag.

Read the five numbered badges as the control points where this most often goes wrong, and the legend narrates each as symptom · confirm · fix: a deny policy too broad (1) blocking legitimate deploys across every child sub; the wrong subscription split (2) collapsing the blast-radius boundary; egress not forced through the hub (3) when a spoke has no UDR; one-sided peering (4) that never reaches Connected; and monitoring drift (5) when DINE has no remediation. The whole method of operating a landing zone is in that left-to-right path plus those five checks — inheritance flows down, traffic is centralized and inspected, telemetry converges, and each numbered hop is a thing you can confirm with one az command.

Figure: Northwind Freight Azure landing-zone architecture. A governance plane (Entra tenant with PIM, a management-group hierarchy, and a require-tags/deny-public-IP/DINE policy initiative) inherits downward into a subscriptions zone that splits platform subscriptions (connectivity, management, identity) from application subscriptions (corp-prod, online-prod). A connectivity hub VNet 10.10.0.0/16 contains Azure Firewall with forced-tunnel UDR egress, Private DNS with linked privatelink zones, and Bastion. Workload spokes (corp 10.20.0.0/16 with no ingress, online 10.30.0.0/16 behind Application Gateway and WAF) peer to the hub, with egress looping back through the firewall. An observe-and-cost zone in the management subscription holds the central Log Analytics workspace, Defender plans onboarded via DINE, and per-subscription budgets keyed on the costCenter tag. Five numbered badges mark a too-broad deny policy, a wrong subscription split, egress not forced through the hub, one-sided VNet peering, and monitoring drift.

Staged build plan

You do not build a landing zone in one giant deployment — you build it in stages, validating each before the next. Here is the plan; each stage names the deeper lesson to open if you need more than the snippet. The hands-on lab that follows builds a free-tier slice of stages 1, 3, 4, and 5 end to end.

Stage | What you build | Reuse lesson
0. Foundations | Account, Cloud Shell, CLI context | Earlier course lessons + CAF blueprint
1. Resource organization | Management groups + subscription layout | Resource organization
2. Identity | RBAC to groups, PIM for privileged roles | Identity & access
3. Networking | Hub VNet, firewall subnet, spoke peering, UDR | Network topology
4. Governance | Required-tags + deny-public-IP + DINE policy | Governance
5. Monitoring | Central Log Analytics + diagnostic settings | Governance
6. Security | Defender for Cloud plans + Zero Trust posture | Security baseline
7. Cost | Budgets + alerts per subscription | FinOps & cost engineering
8. Automation | Wrap it all in IaC + a pipeline | Policy as code

Each stage has a definition of done and the one command that proves it — a checklist you can literally tick:

Stage | Definition of done | Proof command (shape)
1 | MG exists; child inherits a parent policy | az account management-group show
2 | RBAC granted to a group at sub scope | az role assignment list --assignee <group>
3 | Hub+spoke peered Connected both ways | az network vnet peering list --query "[].peeringState"
4 | A deny policy actually blocks a bad resource | attempt create → expect RequestDisallowedByPolicy
5 | Central LAW exists; a resource sends diagnostics | az monitor diagnostic-settings list
6 | Defender plan enabled on the subscription | az security pricing show -n VirtualMachines
7 | Budget with alert configured | az consumption budget list
8 | The above deploys from a pipeline, PR-gated | pipeline run is green
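
The proof commands in this checklist print values, while a build script needs a pass or a fail. As a minimal sketch, you might wrap the comparison in two small helpers. The names expect and all_connected are made up for illustration and are not part of the az CLI; the code assumes a POSIX-style shell such as the Bash in Cloud Shell.

```shell
# expect LABEL EXPECTED ACTUAL: print PASS or FAIL, return non-zero on mismatch.
expect() {
  if [ "$2" = "$3" ]; then
    echo "PASS  $1"
  else
    echo "FAIL  $1 (expected '$2', got '$3')"
    return 1
  fi
}

# all_connected STATE...: succeed only if at least one state was passed
# and every one of them is exactly "Connected".
all_connected() {
  [ "$#" -gt 0 ] || return 1
  for state in "$@"; do
    [ "$state" = "Connected" ] || return 1
  done
}
```

With a live subscription you would feed these real output, for example all_connected $(az network vnet peering list -g <rg> --vnet-name <vnet> --query "[].peeringState" -o tsv). Treating an empty list as a failure is a deliberate choice here: no peerings at all should not count as "all Connected".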

Representative IaC for the core pieces

You will use a mix in real life: Bicep for Azure-native resources and tenant-scoped objects (management groups, policy), Terraform when you want one tool across clouds, and az for glue and verification. The trade-off, so you pick deliberately:

Tool | Best at | Scope strength | Weak at | Use in landing zone for
Bicep | Azure-native, tenant/MG scope | First-class MG, sub, policy | Multi-cloud | MGs, policy, platform resources
Terraform | Multi-cloud, large modules | Mature state, modules | Newest Azure features lag | Cross-cloud orgs; via AVM
az CLI | Glue, one-offs, verification | Imperative, scriptable | Not declarative/idempotent | Bootstrap, validation, teardown
ARM JSON | Underlying engine | What Bicep compiles to | Verbose by hand | (rarely authored directly now)

Here are representative snippets for each core piece.

Management group (Bicep, tenant scope):

targetScope = 'tenant'

resource northwind 'Microsoft.Management/managementGroups@2023-04-01' = {
  name: 'northwind'
  properties: { displayName: 'Northwind Freight' }
}

resource landingzones 'Microsoft.Management/managementGroups@2023-04-01' = {
  name: 'landingzones'
  properties: {
    displayName: 'Landing Zones'
    details: { parent: { id: northwind.id } }
  }
}

Hub VNet + firewall subnet (Terraform):

resource "azurerm_virtual_network" "hub" {
  name                = "vnet-hub-eus"
  resource_group_name = azurerm_resource_group.connectivity.name
  location            = "eastus"
  address_space       = ["10.10.0.0/16"]
}

resource "azurerm_subnet" "firewall" {
  name                 = "AzureFirewallSubnet" # exact name is mandatory
  resource_group_name  = azurerm_resource_group.connectivity.name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.10.1.0/26"]
}
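
The address plan only works if the numbers line up: the firewall subnet must sit inside the hub, Azure Firewall needs that subnet to be /26 or larger, and peered VNets must not overlap. A rough sketch of checking this before you deploy, in plain shell arithmetic, follows. The function names are hypothetical helpers, and the code assumes a POSIX-style shell with 64-bit arithmetic and well-formed IPv4 CIDRs.

```shell
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_range CIDR: print "first last" as integers.
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  len=${1#*/}
  size=$(( 1 << (32 - len) ))
  first=$(( base / size * size ))   # align to the network boundary
  echo "$first $(( first + size - 1 ))"
}

# cidr_overlaps A B: succeed if the two ranges share any address.
cidr_overlaps() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# cidr_contains OUTER INNER: succeed if INNER lies entirely within OUTER.
cidr_contains() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$3" ] && [ "$4" -le "$2" ]
}

# The Northwind plan from this lesson.
hub=10.10.0.0/16; fw=10.10.1.0/26; corp=10.20.0.0/16; online=10.30.0.0/16
cidr_contains "$hub" "$fw" && echo "firewall subnet sits inside the hub"
[ "${fw#*/}" -le 26 ] && echo "firewall subnet is /26 or larger"
cidr_overlaps "$hub" "$corp" || echo "hub and corp do not overlap"
cidr_overlaps "$corp" "$online" || echo "corp and online do not overlap"
```

Running a check like this in the pipeline, before terraform plan, catches an overlapping spoke while it is still a one-line fix rather than a re-addressing exercise.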

Required-tags policy assignment (Bicep, management-group scope):

targetScope = 'managementGroup'

resource requireCostCenter 'Microsoft.Authorization/policyAssignments@2024-04-01' = {
  name: 'require-tag-costcenter'
  properties: {
    displayName: 'Require costCenter tag on resources'
    // built-in: "Require a tag on resources"
    policyDefinitionId: tenantResourceId(
      'Microsoft.Authorization/policyDefinitions',
      '871b6d14-10aa-478d-b590-94f262ecfa99')
    parameters: { tagName: { value: 'costCenter' } }
    enforcementMode: 'Default'
  }
}

Forced-tunnel route table (Bicep, push spoke egress to the firewall):

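// Region for the route table; defaults to the resource group's region.
param location string = resourceGroup().location

// The route table only takes effect once it is associated with each spoke subnet.
resource udr 'Microsoft.Network/routeTables@2023-11-01' = {
  name: 'rt-spoke-forcedtunnel'
  location: location
  properties: {
    routes: [
      {
        name: 'default-to-firewall'
        properties: {
          addressPrefix: '0.0.0.0/0'
          nextHopType: 'VirtualAppliance'
          nextHopIpAddress: '10.10.1.4' // Azure Firewall private IP
        }
      }
    ]
  }
}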
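
The next-hop address 10.10.1.4 is not arbitrary. Azure reserves the first four addresses of every subnet (and the last), so the first assignable address in AzureFirewallSubnet 10.10.1.0/26 is .4, which is where the firewall lands when it is the first thing deployed there. A small sketch of that arithmetic follows; first_usable_ip is a hypothetical helper, and it assumes the CIDR you pass is already aligned to its network address. In real IaC, read the private IP from the firewall resource rather than computing or hard-coding it.

```shell
# first_usable_ip CIDR: print the first address Azure can assign in the subnet
# (network address + 4, because the first four addresses are reserved).
first_usable_ip() {
  addr=${1%/*}
  old_ifs=$IFS; IFS=.
  set -- $addr         # split the dotted quad into $1..$4
  IFS=$old_ifs
  n=$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 + 4 ))
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

first_usable_ip 10.10.1.0/26   # prints 10.10.1.4
```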

Log Analytics workspace (az):

az monitor log-analytics workspace create \
  --resource-group rg-management \
  --workspace-name law-northwind-central \
  --location eastus \
  --sku PerGB2018 \
  --retention-time 30

Real-world scenario

Northwind Freight kicked off the landing-zone build with the four-engineer platform team and a hard deadline: the first application team — the online checkout workload — was promised an isolated, governed subscription in three weeks. The legacy estate was a single subscription, sub-legacy-allinone, holding 140 resources: untagged VMs, three orphaned public IPs, a SQL database reachable from the internet, and a monthly bill of about ₹9.2 lakh that Finance could not split by team. The CTO’s instruction was the brief verbatim: stop the wild west, let app teams move fast safely, and show the bill by team.

Week one was design and the governance plane. The team stood up the MG hierarchy (northwind → platform / landingzones / sandbox / decommissioned) and assigned three policies at northwind in DoNotEnforce first — require costCenter, require owner, allowed-locations (India regions only). The dry run immediately paid off: the compliance view showed 111 of 140 legacy resources non-compliant on tags. Had they enforced deny on day one, the legacy team’s own redeploys would have been blocked mid-flight. Instead they ran a Modify remediation to backfill costCenter from a spreadsheet, re-checked compliance, then flipped require-tags to Default. The deny-public-IP policy went onto the landingzones/corp branch only — deliberately not on online, which legitimately needed a WAF-fronted public entry point.

Week two was connectivity and the first real failure. The team built the hub (10.10.0.0/16) with AzureFirewallSubnet, Bastion, and a gateway subnet, then peered the new corp spoke (10.20.0.0/16). The corp test VM could not reach the internet at all: every outbound call timed out. The reflex was to hunt for a wrong firewall rule, and an hour vanished there. The actual cause was a missing one, a variant of the badge-3 control point on the diagram: the UDR forcing 0.0.0.0/0 to the firewall was attached, but no firewall network rule yet allowed the traffic. az network nic show-effective-route-table showed the next hop was the firewall, and the firewall logs showed the denies. A separate test against a second spoke then revealed badge 4: the online spoke’s peering read Initiated, not Connected, because it had been created on only one side, which az network vnet peering list --query "[].peeringState" confirmed. Both causes were obvious in seconds once the team stopped guessing and ran the confirm commands.

Week three delivered the online team’s subscription. Because the guardrails lived at the MG, the new sub-online-prod inherited require-tags, allowed-locations, and DINE Log Analytics onboarding the moment it was created and moved under landingzones/online — zero extra configuration. The app team got Contributor on their subscription and nothing above it, deployed their spoke and App Gateway via a PR into their own repo, and were serving traffic in two days without filing a single networking ticket. DINE auto-onboarded every resource they created to the central law-northwind-central workspace, so security had estate-wide visibility from minute one. A per-subscription budget of ₹1.5 lakh with alerts at 80% and 100% gave Finance the per-team line they had asked for.

The outcome, after one quarter: 100% tag compliance on all new resources, zero public IPs in the corp branch (the policy blocked three attempts during the migration — each a VM someone tried to give a public IP “just to test”), and a Cost Management view that finally sliced the bill by costCenter. The legacy subscription was drained workload-by-workload into governed app subscriptions and moved to decommissioned. The lesson the team wrote on the wall: “Inheritance and dry-run are the whole game. Assign the guardrail once at the management group, prove it in DoNotEnforce, then enforce — and when the network breaks, run the confirm command before you touch a rule.”

The build as a timeline, because the order of moves is the lesson:

Week | Goal | Key action | Failure hit | Resolution
1 | Governance plane | Policies at MG in DoNotEnforce | 111/140 legacy untagged | Modify-remediate, then enforce
1 | Tag attribution | Require costCenter/owner | Would have blocked legacy redeploys | Dry-run caught it first
2 | Connectivity | Hub + corp spoke + UDR | Corp VM no internet (badge 3) | Add firewall rule for the route
2 | Peering | Online spoke peering | State Initiated (badge 4) | Create peering both directions
3 | App onboarding | New sub-online-prod | (none; inherited cleanly) | Guardrails applied automatically
3 | Cost | Per-sub budget + alerts | (none) | Finance gets per-team bill
+1 qtr | Decommission legacy | Drain to governed subs | (none) | Move legacy to decommissioned

Advantages and disadvantages

The governed-landing-zone model both prevents an entire class of production problems and adds real upfront complexity. Weigh it honestly:

Advantages (why this model helps you) | Disadvantages (why it costs you)
Guardrails inherit: assign once at an MG, every current and future sub is governed | Upfront design + IaC effort before the first workload ships any value
Subscriptions are clean blast-radius and billing boundaries, so a misconfig stays contained | More subscriptions = more boundaries to manage (mitigated by automation)
App teams get isolated, pre-governed environments and move fast without tickets | Platform team becomes a dependency; needs to scale with the org
Policy makes compliance a property of the platform, not a memory test | A too-broad deny can block legitimate deploys across many subs at once
Central monitoring lets security hunt across the whole estate from one workspace | Centralized telemetry can get expensive at volume without DCR scoping
Hub-spoke gives one place to inspect, log, and attach hybrid connectivity | The hub is a focal point: an extra hop and a thing that must stay up
Everything as code is reproducible, reviewable, and auditable | Steeper skill bar; the team must know Bicep/Terraform and pipelines

The model is right for any organization past a handful of subscriptions, anyone with compliance obligations, and any platform team that wants to onboard app teams repeatably. It is overkill for a single hobby subscription or a one-week pilot. The disadvantages are all manageable — they are the reason platform automation (subscription vending) and the Well-Architected Framework exist — but only if you know they exist, which is the point of building the design deliberately rather than letting it accrete.

Hands-on lab — build a free-tier landing-zone slice

You will build a real, working slice of the landing zone using the az CLI in Azure Cloud Shell — no installs. We keep it inside a single subscription so it stays free-tier-friendly (a management group, resource groups, two VNets with peering, a tagging policy, and a Log Analytics workspace cost nothing or pennies). Everything goes into resource groups you delete at the end.

Note on scope: creating real platform and application subscriptions needs an enrollment you may not have on a personal account, so the lab models the hierarchy with a management group and models platform-vs-app separation with resource groups + tags. The commands are identical in shape to the real thing.

1. Set context. Open Azure Cloud Shell, pick Bash, and confirm where you are:

az account show --output table
SUB_ID=$(az account show --query id -o tsv)
echo "Working in subscription: $SUB_ID"

2. Create a management group (stage 1). This needs no enrollment and is free:

az account management-group create \
  --name northwind-demo \
  --display-name "Northwind Freight (demo)"

Expected: JSON describing the new group. (It can take a minute to appear in the portal — that is normal.)

3. Create platform and application resource groups (modelling the subscription split), each tagged for cost attribution:

az group create -n rg-connectivity -l eastus \
  --tags costCenter=platform owner=platform-team env=shared

az group create -n rg-management -l eastus \
  --tags costCenter=platform owner=platform-team env=shared

az group create -n rg-corp-prod -l eastus \
  --tags costCenter=logistics owner=app-team env=prod

4. Build the hub VNet and a spoke, then peer them (stage 3):

# Hub network in the connectivity RG
az network vnet create -g rg-connectivity -n vnet-hub \
  --address-prefix 10.10.0.0/16 \
  --subnet-name AzureFirewallSubnet --subnet-prefix 10.10.1.0/26

# Spoke network in the corp app RG
az network vnet create -g rg-corp-prod -n vnet-corp-spoke \
  --address-prefix 10.20.0.0/16 \
  --subnet-name snet-workload --subnet-prefix 10.20.1.0/24

# Get resource IDs for peering
HUB_ID=$(az network vnet show -g rg-connectivity -n vnet-hub --query id -o tsv)
SPOKE_ID=$(az network vnet show -g rg-corp-prod -n vnet-corp-spoke --query id -o tsv)

# Peer both directions
az network vnet peering create -g rg-corp-prod -n spoke-to-hub \
  --vnet-name vnet-corp-spoke --remote-vnet "$HUB_ID" \
  --allow-vnet-access

az network vnet peering create -g rg-connectivity -n hub-to-spoke \
  --vnet-name vnet-hub --remote-vnet "$SPOKE_ID" \
  --allow-vnet-access

5. Assign a tagging guardrail (stage 4). Assign the built-in Require a tag on resources policy, scoped to the corp resource group, requiring costCenter:

az policy assignment create \
  --name require-costcenter \
  --display-name "Require costCenter tag" \
  --scope "/subscriptions/$SUB_ID/resourceGroups/rg-corp-prod" \
  --policy "871b6d14-10aa-478d-b590-94f262ecfa99" \
  --params '{ "tagName": { "value": "costCenter" } }'

6. Create the central Log Analytics workspace (stage 5):

az monitor log-analytics workspace create \
  --resource-group rg-management \
  --workspace-name law-northwind-demo \
  --location eastus \
  --sku PerGB2018 --retention-time 30

7. Validate. Prove the slice exists and is wired correctly:

# Management group present
az account management-group show --name northwind-demo -o table

# Peering shows "Connected" both ways
az network vnet peering list -g rg-corp-prod --vnet-name vnet-corp-spoke \
  --query "[].{name:name, state:peeringState}" -o table

# Policy assignment present at the corp RG
az policy assignment list \
  --scope "/subscriptions/$SUB_ID/resourceGroups/rg-corp-prod" \
  --query "[].displayName" -o table

# Workspace provisioned
az monitor log-analytics workspace show \
  -g rg-management -n law-northwind-demo \
  --query provisioningState -o tsv

Expected: the peering state reads Connected for spoke-to-hub, the policy displayName appears, and the workspace provisioningState is Succeeded. You now have, in miniature, every pillar of the landing zone: hierarchy, platform/app separation, hub-spoke, a guardrail, and central monitoring.
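
Stage 4's definition of done is a create that the policy refuses, which the validation above does not exercise. A hedged sketch of that negative test follows. It assumes the assignment has had a few minutes to take effect, the name vnet-untagged is hypothetical, and is_policy_denial is an illustrative helper rather than an az feature. The az call is left as a comment because it needs a live subscription.

```shell
# is_policy_denial TEXT: succeed if the CLI error text is a policy denial.
is_policy_denial() {
  case "$1" in
    *RequestDisallowedByPolicy*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sketch of the negative test (capture stderr only, discard stdout):
#   err=$(az network vnet create -g rg-corp-prod -n vnet-untagged 2>&1 >/dev/null)
#   if is_policy_denial "$err"; then
#     echo "PASS  untagged create was denied"
#   else
#     echo "FAIL  untagged create was not denied: $err"
#   fi
```

If the create succeeds because the assignment has not propagated yet, the stray VNet lives in rg-corp-prod and is removed with that resource group during cleanup.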

8. Cleanup. Delete the resource groups (this removes the VNets, peering, policy assignment scoped to the RG, and workspace), then the management group:

az group delete -n rg-corp-prod --yes --no-wait
az group delete -n rg-connectivity --yes --no-wait
az group delete -n rg-management --yes --no-wait
az account management-group delete --name northwind-demo

Cost note: Empty VNets, peering, a tagging policy, a management group, and a workspace with no ingested data are free or a few pennies; deleting the resource groups the same day keeps this comfortably in free-tier territory.

The lab steps mapped to what each proves and its real-world analogue:

Step | What you did | What it proves | Real-world analogue
2 | Create a management group | Hierarchy is free and enrollment-light | The MG tree that inherits guardrails
3 | Tagged platform/app RGs | Cost attribution starts with tags | Platform-vs-app subscription split
4 | Hub + spoke + bidirectional peering | Peering needs both sides | Hub-spoke connectivity
5 | Require-tags policy at RG scope | Governance is an assignment | Deny guardrails at MG scope
6 | Central Log Analytics workspace | One place for telemetry | The estate-wide monitoring sink
7 | Validate everything | Confirm commands beat guessing | The acceptance test
8 | Delete the resource groups | Clean teardown is part of IaC | Decommission discipline

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. These are the failures that actually bite a platform team building a landing zone. First as a scannable table you can read mid-incident, then the expanded reasoning for the ones that hurt most.

# | Symptom | Root cause | Confirm (exact cmd / portal path) | Fix
1 | New MG not visible after create | Propagation delay; or you lack Management Group Contributor at root | az account management-group list -o table; check role at root | Wait ~1 min; ensure tenant-level MG permission is enabled (first MG op)
2 | A deny policy blocks a resource you expected to succeed | Deny broader than intended (e.g. deny-public-IP catching an LB) | Read policyEvaluationDetails in the 403; az policy state list --filter "complianceState eq 'NonCompliant'" | Narrow with notIn/excluded scopes; re-test in DoNotEnforce
3 | VNet peering stuck Initiated | Peering created on only one side | az network vnet peering list --query "[].peeringState" | Create the peering in both directions
4 | Spoke VM has no internet after adding the firewall | UDR sends 0.0.0.0/0 to firewall, but no firewall rule allows it | az network nic show-effective-route-table; firewall logs show deny | Add a firewall network/application rule (or remove UDR while testing)
5 | DINE policy never onboards new resources | Assignment identity lacks the role at scope; no remediation task for existing resources | az policy remediation list; check the assignment’s MI role | Grant the MI its role; az policy remediation create
6 | az policy assignment create fails on --params | JSON quoting mangled in the shell | Re-run with the JSON in a file | Pass --params @file.json; or use Cloud Shell
7 | New subscription not governed | Sub not moved under the governed MG | az account management-group subscription show | Move the sub under the correct MG; policy inherits
8 | RBAC change “did nothing” | Granted at the wrong scope, or to a user rather than the group | az role assignment list --assignee <id> --all | Grant to the group at MG/sub scope
9 | Resource resolves to public IP despite Private Endpoint | Private DNS zone not linked to the VNet | nslookup <host> returns public IP | Link privatelink.* zone; register the A record
10 | Allowed-locations policy blocks a global resource | Global resources report location: global | Policy error names the location | Exclude global resource types / use the built-in’s location list
11 | Budget alert never fires | Threshold/contact misconfigured, or spend genuinely under | az consumption budget show; check action group | Fix threshold/contact; budgets evaluate on a delay
12 | Can’t assign Owner to the pipeline | Contributor can’t grant RBAC (by design) | Pipeline identity is Contributor, not UAA/Owner | Use a PIM-eligible Owner or a scoped UAA for the bootstrap

The expanded form, with the full reasoning for the entries that bite hardest:

2. A deny policy blocks a resource you expected to succeed. Root cause: The deny is broader than intended (deny-public-IP catching a Load Balancer frontend, or allowed-locations catching a global resource), and because it’s assigned at an MG it blocks across every child subscription at once. Confirm: Read policyEvaluationDetails in the deployment error (it names the assignment and definition); or run az policy state list --filter "complianceState eq 'NonCompliant'" to see what’s tripping. Fix: Decide whether the policy is correct (the resource genuinely violates intent) or too broad. If too broad, narrow it with parameters/exclusions (notIn, excluded scopes) and validate in DoNotEnforce before re-enabling. Avoid blanket exemptions, which erode the guardrail.

3. VNet peering stuck in Initiated. Root cause: Peering is directional — a link created on the spoke alone leaves the relationship half-built, so traffic never flows. Confirm: az network vnet peering list -g <rg> --vnet-name <vnet> --query "[].{name:name,state:peeringState}" shows Initiated, not Connected. Fix: Create the peering in both directions (hub→spoke and spoke→hub). Both links must exist for the state to become Connected.

4. Spoke VM cannot reach the internet after you add the firewall. Root cause: The UDR default route (0.0.0.0/0 → firewall private IP) is attached and forces all egress to the firewall, but the firewall has no rule allowing that traffic, so it’s dropped — exactly the badge-3 control point. Confirm: az network nic show-effective-route-table --name <nic> -g <rg> shows next hop = the firewall; the firewall’s logs (in the central LAW) show the connection denied. Fix: Add an Azure Firewall network or application rule permitting the required egress, or remove the UDR while you isolate connectivity. See Azure Firewall forced tunneling & hub-spoke routing.

5. DINE policy never onboards new resources. Root cause: A DeployIfNotExists policy needs a managed identity with the right role at the right scope, plus a remediation task to act on resources that existed before the assignment. If the identity lacks its role, nothing is deployed for new or existing resources; if only the remediation task is missing, new resources are onboarded but existing ones stay non-compliant. Confirm: az policy remediation list is empty; the assignment’s managed identity has no role assignment at the target scope. Fix: Grant the assignment’s managed identity the role it needs (e.g. Log Analytics Contributor), then trigger remediation (az policy remediation create) for the existing resources.

7. A new subscription isn’t governed. Root cause: Creating a subscription does not place it under a governed MG. It lands under the tenant root group (or the tenant’s configured default MG), so until you move it, it inherits only what is assigned there and none of the guardrails further down the tree. Confirm: az account management-group subscription show --name <mg> --subscription <sub> doesn’t find it; the sub shows no inherited policy. Fix: Move the subscription under the correct MG (az account management-group subscription add); the assignments inherit as soon as the move completes, though the compliance view can take a while to catch up. This is what subscription vending automates.
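
Row 6 of the table (mangled --params quoting) has no expanded entry, so here is the fix in sketch form: write the parameter JSON to a file and hand az the file. The helper name write_tag_params and the file path are assumptions for illustration; the policy ID and scope are the ones used in the lab.

```shell
# write_tag_params FILE TAG: write the policy parameter JSON to FILE.
write_tag_params() {
  cat > "$1" <<EOF
{ "tagName": { "value": "$2" } }
EOF
}

PARAMS_FILE="${TMPDIR:-/tmp}/require-costcenter.params.json"
write_tag_params "$PARAMS_FILE" costCenter
cat "$PARAMS_FILE"

# Then pass the file instead of an inline string (not run here):
#   az policy assignment create --name require-costcenter \
#     --scope "/subscriptions/$SUB_ID/resourceGroups/rg-corp-prod" \
#     --policy "871b6d14-10aa-478d-b590-94f262ecfa99" \
#     --params "@$PARAMS_FILE"
```

Keeping the JSON in a file also means it can live in the repo and go through the same PR review as the rest of the platform code.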

Best practices

Security notes

The landing zone is your security baseline, so treat it that way. Grant RBAC to groups, scoped high, least-privilege; make privileged roles eligible via PIM, not standing. Keep corp workloads private by policy (deny public IPs; reach them through Bastion or the firewall, never a public NIC). Force all egress through the hub firewall with a UDR so traffic is inspected and logged in one place. Turn on Microsoft Defender for Cloud and enforce its onboarding with DINE policy so coverage cannot drift. Funnel every diagnostic and Activity Log into the central Log Analytics workspace so security can hunt across the whole estate. And never embed secrets in IaC — use a pipeline identity with workload identity federation rather than a long-lived service-principal secret. This is the Zero Trust multilayer model applied to the platform itself; deepen it with Azure landing zone — security.

The security controls the landing zone bakes in, what each defends against, and the policy/mechanism that enforces it:

Control | Mechanism | Defends against | Enforced by
No standing admin | PIM-eligible Owner/UAA | Stolen-token lateral movement | PIM + access reviews
Group, least-privilege RBAC | Contributor at sub scope | Privilege sprawl/escalation | RBAC plan + audit
Private corp workloads | Deny-public-IP policy | Internet-exposed internal VMs | Azure Policy (deny)
Inspected, logged egress | Firewall + forced-tunnel UDR | Exfiltration, blind traffic | UDR + firewall rules
Secretless config | Managed identity + KV references | Secrets in plaintext/IaC | Key Vault + MI
No-drift monitoring | DINE LAW + Defender onboarding | Coverage gaps | Azure Policy (DINE)
Data in approved geos | Allowed-locations policy | Sovereignty/compliance breach | Azure Policy (deny)
Reproducible, reviewed changes | IaC + PR-gated pipeline | Unaudited portal drift | CI/CD + branch policy

Cost & sizing

What drives the landing-zone bill is not the governance scaffolding: management groups, policy, and budgets are free, and peering is billed only per GB transferred. The cost is the shared platform services you run continuously (firewall, gateways, Bastion) plus telemetry ingestion. Right-sizing is mostly about whether you actually need each shared service yet, and scoping what you log. Rough INR figures (production-grade, vary by region and usage):

Component | What you pay for | Rough INR / month | Free-tier note
Management groups + policy | Nothing | ₹0 | Always free
VNet peering | Per-GB transferred (intra-region small) | ₹0–low | Empty peering is free
Azure Firewall (Standard) | Hourly + per-GB processed | ~₹35,000–45,000 | No free tier; the big platform cost
Azure Bastion (Basic) | Hourly | ~₹12,000–14,000 | No free tier for Basic; billed while deployed (it cannot be deallocated, only deleted)
VPN Gateway (VpnGw1) | Hourly | ~₹12,000–15,000 | Only if hybrid is needed
Log Analytics ingestion | Per-GB ingested + retention | ~₹20,000 per 100 GB/month | First 5 GB/month per billing account is free; scope with DCRs
Defender for Cloud | Per-resource per plan | varies by estate | Free CSPM tier; paid plans per resource
Budgets / Cost Management | Nothing | ₹0 | Always free
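
To see what the always-on rows add up to before any workload runs, you can total them. The sketch below uses only the rough figures from the table, plus the scenario's ₹1.5 lakh budget for the alert thresholds; add_up and pct_of are throwaway helper names, and the result is an order-of-magnitude estimate, not a quote.

```shell
# add_up N...: sum integer amounts (INR per month).
add_up() {
  total=0
  for n in "$@"; do total=$(( total + n )); done
  echo "$total"
}

# pct_of AMOUNT PCT: integer percentage of a budget, for alert thresholds.
pct_of() { echo $(( $1 * $2 / 100 )); }

# Firewall Standard + Bastion Basic + VpnGw1 + 100 GB/month of ingestion,
# using the low and high ends of the rough figures in the table.
low=$(add_up 35000 12000 12000 20000)
high=$(add_up 45000 14000 15000 20000)
echo "Always-on platform baseline: INR $low to $high per month"

# Alert thresholds for the scenario's INR 1.5 lakh (150000) budget.
echo "Alert at INR $(pct_of 150000 80) and INR $(pct_of 150000 100)"
```

The baseline comes out at roughly ₹79,000 to ₹94,000 a month, which is why the decisions in the next table focus on whether each shared service is needed yet.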

The right-sizing decisions that actually move the bill:

Decision | Cheaper choice | When it’s safe | Trade-off
Firewall vs NSG-only egress | NSG + NAT Gateway | Small estate, no L7 inspection need | Lose central app-layer inspection
Bastion always-on vs on-demand | Delete when idle, redeploy when needed | Dev/test, rare access | Redeploy delay of several minutes
Per-GB LAW vs commitment tier | Commitment tier | Sustained > 100 GB/day | Pay for committed volume
Log everything vs DCR-scoped | DCR-scoped | Always | Missing data if you under-scope
Standard vs Premium Firewall | Standard | No IDPS/TLS-inspection need | Lose IDPS/TLS inspection
Run all platform services now | Add as needed | Always; start minimal | Retro-fit effort later

For the full discipline — reservations, savings plans, hybrid benefit, anomaly alerts — see Azure FinOps & cost engineering and reservations & savings-plan strategy.

Interview & exam questions

1. Walk me through how you would design an Azure landing zone for a company moving off a single subscription. Start from the business drivers, then the eight CAF design areas. Adopt a management-group hierarchy (platform / landing zones / sandbox / decommissioned) for inherited policy and RBAC; split subscriptions by responsibility (platform vs application) so each workload is its own blast-radius and billing boundary; centralize networking in a hub with peered spokes and forced-tunnel egress; grant least-privilege RBAC to groups with PIM for privileged roles; enforce tagging and deny risky resources via policy at MG scope; funnel all telemetry into one Log Analytics workspace; and add per-subscription budgets. Deliver it all as IaC through a PR-gated pipeline.

2. Why management groups instead of just applying policy per subscription? Inheritance and scale. One assignment at an MG flows to every current and future subscription beneath it, so governance is automatic for new teams and you have a single place to change a guardrail — versus drift and toil when each subscription is configured by hand.

3. A deny policy is blocking a legitimate deployment. How do you debug it? Read policyEvaluationDetails in the error to find which assignment and definition denied it. Decide whether the policy is correct (the resource genuinely violates intent) or too broad. If too broad, narrow it with parameters/exclusions (notIn, excluded scopes) and validate in DoNotEnforce before re-enabling. Avoid blanket exemptions — they erode the guardrail.

4. Platform team owns the hierarchy and hub; app teams own workloads. How do you structure that so app teams move fast without breaking governance? Two ownership layers in IaC: the platform repo owns MGs, policy, and the hub; app teams own their spokes and workloads and ship via PR into their own subscription, where guardrails already inherit. App teams get Contributor on their subscription and no rights above it, so they cannot weaken platform policy. This is “subscription democratization.”

5. How do you keep monitoring from drifting as teams add resources? Enforce it with DeployIfNotExists policy that auto-onboards new resources to the central Log Analytics workspace and Defender for Cloud, with remediation tasks for existing ones. Coverage becomes a property of the platform, not something a team must remember.

6. How does this design answer “show me the bill by team”? Subscriptions are the billing boundary (one per workload/env), and a require-tags policy guarantees every resource carries costCenter/owner/env. Cost Management then slices spend by subscription and tag, and per-subscription budgets with alerts warn owners before they overrun.

7. What forces a spoke’s outbound traffic through the hub firewall, and what’s the classic failure? A route table (UDR) whose default route (0.0.0.0/0) has next hop = the firewall’s private IP, attached to the spoke subnet (“forced tunneling”). The classic failure is attaching the UDR but forgetting a firewall rule to allow the traffic — egress is then dropped, looking like a connectivity bug. Confirm with show-effective-route-table and the firewall logs.

8. Difference between Deny, Audit, and DeployIfNotExists policy effects? Deny rejects a non-compliant create/update at evaluation time (a hard guardrail). Audit flags non-compliance but allows it (visibility without blocking). DeployIfNotExists deploys a related resource when it’s missing (e.g. Log Analytics onboarding) and needs a managed identity plus a remediation task to act on existing resources.

9. Why grant RBAC to groups and scope it high rather than to individuals per-resource? Group-based, high-scope, least-privilege RBAC is auditable and manageable: you add/remove a person from a group instead of hunting per-resource assignments, you avoid privilege creep, and combined with PIM you avoid standing admin rights.

10. What is PIM and which roles do you put behind it? Privileged Identity Management makes roles eligible rather than active, so a human activates them just-in-time with MFA, justification, and (for the top roles) approval, for a bounded duration. Put Owner and User Access Administrator behind it always; consider it for any role that can change access or delete platform resources.

11. Your VNet peering shows Initiated, not Connected. What’s wrong? Peering is directional and was created on only one side. Create the peering in both directions (hub→spoke and spoke→hub); both links must exist before the state becomes Connected.

12. How do you onboard a new application team without a networking ticket? Vend them a subscription already under the governed landingzones MG (so it inherits guardrails), grant their group Contributor on just that subscription, and let them deploy their spoke + workload via a PR into their own repo. Because policy, DINE monitoring, and the peering pattern are platform-owned and inherited, the team is productive in days with no manual networking step.

These map most directly to AZ-305: Designing Microsoft Azure Infrastructure Solutions, with reinforcement of AZ-104 (Administrator). The cert mapping for revision:

| Question theme | Primary cert | Exam objective area |
|---|---|---|
| MG hierarchy, governance strategy | AZ-305 | Design governance & identity |
| Policy effects, require-tags, deny | AZ-305 / AZ-104 | Design/implement governance |
| Hub-spoke, peering, forced tunnel | AZ-305 / AZ-104 | Design/implement networking |
| RBAC strategy, PIM | AZ-305 / AZ-500 | Design identity & access |
| Central monitoring, DINE onboarding | AZ-305 / AZ-104 | Design/implement monitoring |
| Budgets, tags, cost attribution | AZ-305 | Recommend cost solutions |
| IaC, pipelines, reproducibility | AZ-305 / AZ-400 | Design platform automation |

Quick check

  1. Why assign policy and RBAC at a management group rather than on each subscription?
  2. What is the difference between a platform subscription and an application (landing-zone) subscription?
  3. In hub-spoke, what forces a spoke’s outbound traffic through the hub firewall — and what’s the one thing people forget to add alongside it?
  4. Your new spoke peering reads Initiated. What did you miss, and what command confirms it?
  5. Why dry-run a new policy initiative in DoNotEnforce mode before enforcing it?

Answers

  1. Because management groups inherit — a single assignment flows to every current and future subscription beneath them, so new teams are governed automatically instead of someone remembering to re-apply guardrails each time.
  2. Platform subscriptions hold shared services (connectivity, management, identity) owned by the platform team and changing rarely; application subscriptions are handed one-per-workload to app teams and are the blast-radius/billing boundary for that workload.
  3. A route table (UDR) whose default route (0.0.0.0/0) has next hop = the firewall’s private IP, attached to the spoke subnet (forced tunneling). People forget the matching firewall rule to allow that egress, so traffic is dropped and it looks like a connectivity bug — confirm with az network nic show-effective-route-table.
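  4. You created the peering on only one side; peering is directional. Create it in both directions, and confirm with az network vnet peering list -g <resource-group> --vnet-name <vnet> --query "[].peeringState", run once against the hub VNet and once against the spoke VNet. Both links must read Connected.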
  5. DoNotEnforce evaluates compliance without blocking anything, so you can read what would be denied and fix scope or exclusions before a too-broad deny breaks legitimate deployments (including your own platform bootstrap).
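The dry run in answer 5 might be sketched as follows. The function names are hypothetical helpers, and the assignment name, initiative (name or full resource ID), and management-group name are placeholders.

```shell
# Sketch only: assignment, initiative, and management-group names are placeholders.

# Build the resource ID form of a management-group scope.
mg_scope() {
  printf '/providers/Microsoft.Management/managementGroups/%s' "$1"
}

# Assign an initiative in DoNotEnforce mode: compliance is evaluated,
# but nothing is blocked.
dry_run_initiative() {
  local name="$1"
  local initiative="$2"
  local mg="$3"

  az policy assignment create \
    --name "$name" \
    --policy-set-definition "$initiative" \
    --scope "$(mg_scope "$mg")" \
    --enforcement-mode DoNotEnforce
}

# List the resources the assignment currently marks non-compliant, which is
# what to review before switching enforcement on.
list_non_compliant() {
  local name="$1"
  local mg="$2"

  az policy state list \
    --management-group "$mg" \
    --filter "policyAssignmentName eq '${name}' and complianceState eq 'NonCompliant'" \
    --query "[].resourceId" -o tsv
}

# Usage (placeholders):
#   dry_run_initiative lz-guardrails <initiative-name-or-id> landingzones
#   list_non_compliant lz-guardrails landingzones
```

Compliance results can take a while to appear after a new assignment, so an empty list immediately after assigning is not yet evidence that nothing would be denied. Once the list looks right, the same assignment can be switched to the Default enforcement mode.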

Exercise

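Extend the lab into the cost pillar. Using the rg-corp-prod resource group from the lab (or recreate it), create a budget so spend is tracked against a threshold. The command below creates the budget itself; the alert that notifies an owner before spend exceeds the threshold is a notification you then add to that budget (for example in the portal under Cost Management):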

az consumption budget create \
  --budget-name corp-prod-monthly \
  --amount 10 \
  --category Cost \
  --time-grain Monthly \
  --start-date 2026-06-01 --end-date 2026-12-31 \
  --resource-group rg-corp-prod

Then answer in two or three sentences: how does requiring the costCenter tag (from the lab) combine with this budget to satisfy Northwind’s “show me the bill, by team” requirement? Clean up afterward.
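For the verify and clean-up steps, a minimal sketch using the same az consumption command group. The function names are hypothetical helpers; the defaults simply mirror the budget and resource group names used in the exercise.

```shell
# Sketch only: defaults mirror the exercise's budget and resource group names.

# Confirm the budget exists and inspect its settings.
show_budget() {
  az consumption budget show \
    --budget-name "${1:-corp-prod-monthly}" \
    --resource-group "${2:-rg-corp-prod}"
}

# Remove the budget when you are done with the exercise.
delete_budget() {
  az consumption budget delete \
    --budget-name "${1:-corp-prod-monthly}" \
    --resource-group "${2:-rg-corp-prod}"
}

# Usage:
#   show_budget
#   delete_budget
```

Deleting the budget does not remove the resource group or anything in it, so clean up the lab resources separately if you recreated them for this exercise.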

Capstone deliverables & self-assessment rubric

To call the capstone “done,” produce these deliverables:

Acceptance criteria — the build passes if all are true:

Self-assessment rubric — grade each area 0–3 and aim for 2+ everywhere before you consider yourself “hero” level:

| Area | 0 (Not done) | 1 (Started) | 2 (Solid) | 3 (Production-grade) |
|---|---|---|---|---|
| Resource organization | Flat / ad-hoc | MGs exist, no plan | CAF hierarchy, platform/app split | Subscription vending automated |
| Networking | Single flat VNet | Hub + spoke exist | Peered + UDR egress | Firewall rules, DNS, Bastion, hybrid |
| Identity | Per-user Owner | Some groups used | Group RBAC, scoped high | PIM JIT, least privilege everywhere |
| Governance | No policy | A few audits | Required-tags + deny assigned | DINE auto-remediation, shipped as code |
| Monitoring | Nothing central | Workspace exists | Diagnostics flow in | Defender + alerts + workbooks |
| Security | Defaults | Some hardening | Private corp + Defender on | Zero Trust, secretless, reviewed |
| Cost | No tags/budgets | Tags inconsistent | Tags + budgets + alerts | Per-team chargeback, anomaly alerts |
| Automation | Portal-built | Some scripts | Bicep/Terraform for all | Pipeline-deployed, PR-gated |

Glossary

Next steps

Congratulations — that is the Azure Zero-to-Hero capstone. The natural next lesson is the course finale on getting hired and certified: Azure Interview & Certification Prep: Scenarios + AZ-104/AZ-305 Roadmap.

To take any single pillar from this capstone to full production depth, build on the KloudVin landing-zone series:
