Capstone: Design & Build a Production-Ready Azure Landing Zone

This is the capstone of the Azure Zero-to-Hero course. Everything you have learned so far — what a subscription is, how to drive Azure from the CLI, how Microsoft Entra ID and RBAC work — now comes together into one project: designing and building a production-ready Azure landing zone. A landing zone is the pre-built, governed environment that your applications “land” in. Networking, identity, policy guardrails, monitoring, and cost controls are wired up first, so that when an application team shows up, they inherit security and consistency on day one instead of re-inventing it (and getting it wrong) on every project.

We will work the way a real platform team does: start from a business brief, make explicit design decisions, then build in stages — each stage reusing a deeper lesson from the course so you always know where to go for detail. Crucially, this is not a tour of prose. A landing zone is a dense lattice of choices — eight design areas, dozens of policy effects, a handful of CIDRs and SKUs and role IDs that must line up exactly — and the way a senior architect actually holds it is as a set of reference tables they scan under pressure. So this capstone is table-first: every design area carries an option matrix, every failure mode a symptom-cause-confirm-fix row, every tier a side-by-side grid. Read the prose once to understand the why; keep the tables open while you build.

You will end with a small but genuinely real landing zone running in your own free-tier subscription, a set of acceptance criteria to prove it works, and a self-assessment rubric to grade yourself the way an Azure Review Board would. By the end you will be able to walk into a design review, sketch the management-group tree on a whiteboard, justify every subscription boundary, and know — to the exact az command — how to confirm each piece is wired correctly.

What problem this solves

The pain this prevents is the hand-built subscription that nobody can govern. One engineer clicks together a subscription two years ago; resources accrete with no tags, public IPs sprout on NICs, secrets land in plaintext app settings, monitoring is whatever each team remembered to turn on, and the bill is one undifferentiated number Finance cannot attribute to anyone. When the company grows to ten teams, there is no way to apply a security rule everywhere at once, no way to give a new team an isolated environment without a week of ticket-driven networking, and no way to answer “who spent this?” The landing zone is the antidote: governance, connectivity, identity, and observability are provisioned and inherited before the first workload arrives, so consistency is the default and drift is the exception.

What breaks without it: every new project re-implements (and re-mis-implements) networking and security; a misconfiguration in one workload can reach another because there is no blast-radius boundary; a compliance auditor asks “show me that no resource is public” and the only honest answer is “we’d have to check each one by hand.” Who hits it: every organization past its first few subscriptions — startups scaling to a platform team, enterprises consolidating shadow-IT subscriptions, and anyone preparing for AZ-305, where landing-zone design is the spine of the exam.

To frame the whole field before the deep dive, here are the eight Cloud Adoption Framework (CAF) design areas this capstone builds, the question each answers, the primary Azure construct, and the failure you get if you skip it:

Design area	The question it answers	Primary construct	Failure if skipped
Resource organization	Where does everything live and inherit from?	Management groups + subscriptions	Flat tenant; cannot govern at scale
Identity & access	Who can do what, and how is it granted?	Entra ID, RBAC, PIM	Standing admins; per-user sprawl
Network topology	How do workloads connect and stay isolated?	Hub-spoke VNets, peering, UDR	Flat network; no security boundary
Governance	How are rules enforced, not just audited?	Azure Policy initiatives	Drift; rules nobody applies twice
Management / monitoring	Where does telemetry go and how is it queried?	Central Log Analytics, Defender	Blind ops; no cross-estate hunting
Security baseline	What is the default posture for every workload?	Defender for Cloud, Zero Trust	Inconsistent, weakest-link security
Cost management	How is spend attributed and capped?	Tags, budgets, Cost Management	Mystery bill; no per-team chargeback
Platform automation / DevOps	How is all of the above shipped reproducibly?	Bicep/Terraform + CI/CD	Portal drift; not reproducible

Learning objectives

By the end of this capstone you can:

Translate a business brief into a concrete Azure landing-zone design across all eight CAF areas: resource organization, identity, networking, governance, monitoring, security, cost, and platform automation.
Justify the core decisions — why a management-group hierarchy, why platform vs application subscriptions, why hub-spoke — against the trade-off tables a reviewer will expect.
Build the foundation with Bicep, Terraform, and az, deploying a management group, resource groups, a hub VNet, a policy assignment, and a Log Analytics workspace.
Apply policy guardrails (required tags, deny public IPs, DeployIfNotExists) and a monitoring + cost baseline so the platform stays compliant on its own.
Diagnose the classic landing-zone failures — a too-broad deny policy, one-sided peering, egress that bypasses the firewall, DINE that never remediates — with the exact command that confirms each.
Verify the result against explicit acceptance criteria and grade it with a self-assessment rubric.
Know exactly which course lesson to open for any single design area when you build the full thing for real.

Prerequisites & where this fits

This is the final, Advanced lesson of the Azure Zero-to-Hero course and it assumes the whole course. You should be comfortable with the account model (tenant → management group → subscription → resource group → resource), driving Azure from Cloud Shell with the az CLI, reading JSON output, and the basics of Microsoft Entra ID and RBAC. If any of those feel shaky, work through the earlier lessons first — this capstone links back to them at each stage rather than re-teaching them. The deeper landing-zone series carries each pillar to full production depth: start from Designing an Azure Landing Zone with the Cloud Adoption Framework for the end-to-end blueprint.

Here is the scope boundary stated plainly — what this capstone builds versus what it defers to the deep-dive lessons, so you know where the edges are:

Topic	In this capstone	Deferred to	Why
MG hierarchy + sub layout	Yes (design + lab via MG)	Resource organization	Real sub vending needs enrollment
RBAC to groups, PIM	Design + reasoning	Identity & access	PIM needs Entra ID P2
Hub-spoke, peering, UDR	Yes (built in lab)	Network topology	Firewall SKU has hourly cost
Policy require-tags / deny / DINE	Yes (require-tags in lab)	Governance	DINE remediation needs identity setup
Central Log Analytics	Yes (built in lab)	Governance	Ingestion billing at scale
Defender for Cloud plans	Design + reasoning	Security baseline	Per-resource pricing
Budgets + cost attribution	Design + exercise	FinOps & cost engineering	Chargeback model is org-specific
Ship it all as code	Representative IaC	Policy as code	Full pipeline is its own lesson

Core concepts

Five mental models make every later decision obvious.

Inheritance is the whole point of a hierarchy. A management group (MG) is a container above subscriptions. A policy or RBAC assignment placed on an MG flows down to every subscription beneath it — including subscriptions that do not exist yet. This single property is why you organize at all: assign a guardrail once, and every current and future team is governed automatically. The alternative — configuring each subscription by hand — guarantees drift and toil.

The subscription is the blast-radius and billing boundary. A subscription is the unit of scale, the limit of a misconfiguration’s reach, and the line Finance bills along. You split subscriptions by responsibility, not convenience: shared platform services (connectivity, management, identity) in their own subscriptions owned by the platform team; each application workload-or-environment in its own subscription handed to an app team. One per workload keeps blast radius small and the bill clean.

Connectivity is centralized, workloads are disposable. In hub-spoke, shared network services (firewall, DNS, Bastion, hybrid gateways) live once in a hub VNet; each workload gets a spoke VNet peered to the hub. A route table (UDR) forces spoke egress through the hub firewall so all traffic is inspected and logged in one place. Spokes stay small and replaceable; the security boundary between teams is real.

Governance is preventive, not a quarterly audit. Azure Policy evaluates resources at create/update time and can deny non-compliant ones, audit them, or use DeployIfNotExists (DINE) to deploy a missing configuration (like Log Analytics onboarding). Assigned at an MG, policy makes compliance a property of the platform rather than something a team must remember. Always dry-run a new policy (enforcement mode DoNotEnforce) and read the compliance results before flipping enforcement on.

Least privilege is granted high, to groups, just-in-time. Entra ID is the control plane. You grant Azure RBAC roles to groups, never individuals, scoped at the MG or subscription level rather than per-resource, so membership — not a hunt through assignments — controls access. Privileged roles (Owner, User Access Administrator) are made eligible, not active, through PIM, so engineers activate them just-in-time with MFA and approval, leaving no standing admin.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters to the landing zone
Management group	Container above subscriptions	Tenant hierarchy	Policy + RBAC inherit down
Subscription	Unit of scale / billing / blast radius	Under an MG	The boundary you split on
Resource group	Lifecycle container for resources	In a subscription	Deploy/delete as a unit
Hub VNet	Shared network services	Connectivity subscription	One place to inspect + connect
Spoke VNet	A workload’s network	App subscription	Peered, small, disposable
UDR (route table)	Overrides default routing	Attached to a subnet	Forces egress via firewall
Azure Policy initiative	Grouped policy definitions	Assigned at a scope	Enforces/audits at scale
DINE	Auto-deploys missing config	Policy effect	Onboards monitoring automatically
RBAC role assignment	Principal + role + scope	At MG/sub/RG/resource	Who can do what, where
PIM	Just-in-time privileged roles	Entra ID (requires P2)	No standing admin
Log Analytics workspace	Central telemetry store	Management subscription	Cross-estate queries
Budget	Spend cap with alerts	Per subscription/RG	Warns before overrun

The brief

Our fictional company is Northwind Freight, a mid-size logistics firm moving from a single hand-built subscription (one engineer clicked it together two years ago, nobody remembers what is in it) to a governed Azure foundation. Leadership wants three things, in their words:

“Stop the wild west.” Every resource must be tagged, owned, and monitored. No more orphaned public IPs and no mystery spend.
“Let app teams move fast — safely.” A new project team should get a ready-to-use, isolated environment with guardrails already on, without filing a networking ticket.
“Show me the bill, by team.” Finance needs cost broken down per workload and per environment, with alerts before budgets blow.

Translated into platform language, Northwind needs: a management-group hierarchy for inherited policy and RBAC; separate subscriptions for shared platform services versus application workloads; a hub-spoke network so connectivity is centralized; an identity baseline of least-privilege RBAC granted to groups; policy guardrails that enforce tagging and block risky resources; a monitoring baseline funnelling logs to one place; and cost controls with budgets and alerts. That is exactly an Azure landing zone — and exactly the eight design areas of the Cloud Adoption Framework.

Here is each leadership ask mapped to the design area that satisfies it, the concrete mechanism, and the acceptance signal that proves it is done:

Leadership ask	CAF area	Mechanism	Acceptance signal
“Stop the wild west” (tags)	Governance + Cost	Require-tags deny policy at MG	Untagged resource is blocked
“Stop the wild west” (no public IPs)	Networking + Governance	Deny-public-IP policy on corp branch	Public IP on a corp NIC is denied
“Stop the wild west” (monitored)	Monitoring	DINE onboarding to central LAW	New resource auto-sends diagnostics
“Move fast, safely” (isolated env)	Resource org + Networking	One sub per workload + peered spoke	New sub inherits guardrails, peers to hub
“Move fast, safely” (no tickets)	Platform automation	Spokes shipped via PR into own sub	App team deploys without platform ticket
“Show me the bill, by team” (split)	Cost	Subscription = billing boundary	Cost Management groups by subscription
“Show me the bill, by team” (attribute)	Cost + Governance	costCenter/env tags enforced	Spend slices by tag
“Show me the bill, by team” (warn)	Cost	Budget alerts at 80% / 100%	Owner notified before overrun

Design decisions

A landing zone is mostly a set of decisions. Implementation is the easy part once the decisions are explicit and defensible. Here are the seven that matter, with the reasoning a reviewer will expect — and the course lesson that owns each in depth. First, the whole decision set as one table you can take into a review:

#	Decision	Northwind choice	Chief alternative	Why the choice wins
1	Hierarchy	CAF MG tree	Flat tenant	Inheritance to future subs
2	Sub split	Platform vs app	One giant sub	Blast radius + clean bill
3	Network	Hub-spoke	Flat / full-mesh	Central inspection, scales
4	Identity	Group RBAC + PIM	Per-user Owner	Least privilege, auditable
5	Governance	Policy at MG scope	Per-sub config	Enforced once, inherits
6	Monitoring	One central LAW	Per-team workspaces	Cross-estate hunting
7	Cost	Per-sub budgets + tags	Single bill	Per-team attribution

1. Management-group hierarchy

Decision: adopt the CAF reference hierarchy rather than a flat tenant. Management groups let you assign policy and RBAC once and inherit everywhere beneath them, including subscriptions that do not exist yet.

Tenant Root Group
└── northwind                    (top-level MG — company guardrails)
    ├── platform                 (shared services)
    │   ├── identity             (Entra Connect, domain services)
    │   ├── management           (Log Analytics, automation, backup)
    │   └── connectivity         (hub VNet, firewall, DNS)
    ├── landingzones             (application workloads)
    │   ├── corp                 (internal — no public ingress)
    │   └── online               (internet-facing)
    ├── sandbox                  (experiments — loose policy)
    └── decommissioned           (quarantine before deletion)

A policy assigned at landingzones (for example, “deny public IP on a NIC”) flows to corp, online, and every future subscription under them. New teams inherit guardrails automatically. Detail: Azure landing zone — resource organization.
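To make the inheritance rule concrete, here is a minimal Python sketch that models the tree above as a parent map and walks upward to collect every assignment a management group ends up with. It is an illustration only: the dictionary names and the assignment labels are invented for this example and are not Azure API objects.

```python
# Hypothetical in-memory model of the Northwind management-group tree.
# PARENT maps each management group to its parent (None for the top MG).
PARENT = {
    "northwind": None,
    "platform": "northwind",
    "identity": "platform",
    "management": "platform",
    "connectivity": "platform",
    "landingzones": "northwind",
    "corp": "landingzones",
    "online": "landingzones",
    "sandbox": "northwind",
    "decommissioned": "northwind",
}

# Illustrative assignments, keyed by the scope they are assigned at.
ASSIGNED = {
    "northwind": ["require-tags", "allowed-locations"],
    "landingzones": ["deny-public-ip-on-nic"],
    "corp": ["deny-all-public-ingress"],
}


def effective_policies(mg: str) -> list[str]:
    """Return every assignment that applies at `mg`, top-most scope first."""
    chain = []
    node = mg
    while node is not None:          # walk from the MG up to the top
        chain.append(node)
        node = PARENT[node]
    result = []
    for scope in reversed(chain):    # top-down, so inherited rules come first
        result.extend(ASSIGNED.get(scope, []))
    return result


print(effective_policies("corp"))
print(effective_policies("sandbox"))
```

A subscription placed under corp tomorrow gets the same four assignments without anyone touching it, which is the whole argument for the hierarchy.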

Each management group in the reference tree has a job. Here is what lives where, what is assigned at each node, and why it exists as its own scope:

Management group	Purpose	Typical policy assigned here	Typical RBAC here
`northwind` (top)	Company-wide guardrails	Require-tags; allowed-locations; audit baseline	Platform team Reader (broad visibility)
`platform`	Shared-service guardrails	Stricter diagnostic + security baseline	Platform team Contributor
`platform/connectivity`	Network services	Deny non-approved network resource types	Network admins Contributor
`platform/management`	Telemetry + automation	DINE Log Analytics onboarding	Ops team Contributor
`platform/identity`	Identity services	Identity-specific compliance	Identity admins Contributor
`landingzones`	App-workload guardrails	Deny-public-IP; enforce HTTPS; DINE Defender	(none broad — set per app sub)
`landingzones/corp`	Internal workloads	Deny all public ingress	App team Contributor on their sub
`landingzones/online`	Internet-facing	Require WAF / Front Door fronting	App team Contributor on their sub
`sandbox`	Experiments	Loose: audit-only, spend cap	Developers Contributor, time-boxed
`decommissioned`	Pre-deletion quarantine	Deny new resource creation	Platform team only

A subtle but exam-worthy point: the order and scope of assignment matter. The narrower the scope, the more specific the rule should be. Here is how to reason about where to place an assignment:

Place the assignment at…	When the rule is…	Example	Trade-off
Tenant Root	Truly universal, rarely	(usually left empty)	Hard to change; affects everything
Top MG (`northwind`)	Company-wide intent	Require-tags, allowed-locations	Broad blast radius if wrong
Mid MG (`landingzones`)	Applies to all workloads	Deny-public-IP	Inherits to corp + online
Leaf MG (`corp`)	Branch-specific	Deny all public ingress	Doesn’t affect `online`
Subscription	One team’s exception	A workload-specific waiver	Applies to that subscription only; per-sub toil

2. Platform vs application subscriptions

Decision: the subscription is the unit of scale and the blast-radius / billing boundary — so split by responsibility, not convenience. Platform subscriptions (connectivity, management, identity) are owned by the platform team and rarely change. Application (landing-zone) subscriptions are handed one-per-workload-or-environment to app teams.

Subscription	Lives under	Owned by	Purpose
`sub-connectivity`	platform/connectivity	Platform	Hub VNet, Firewall, DNS, gateways
`sub-management`	platform/management	Platform	Log Analytics, automation, backup vault
`sub-identity`	platform/identity	Platform	Entra Connect, domain services
`sub-corp-prod`	landingzones/corp	App team	Internal production workloads
`sub-online-prod`	landingzones/online	App team	Internet-facing production workloads

This gives Finance a clean per-team bill (subscription = cost boundary) and limits blast radius: a misconfiguration in one app subscription cannot touch another. Detail: Azure landing zone — resource organization.

Why not just use resource groups to separate teams inside one subscription? Because the subscription is the boundary for several things a resource group is not. The comparison that settles the argument:

Boundary property	Resource group	Subscription	Implication
Billing / cost rollup	Tag-based only	First-class boundary	Sub = clean per-team bill
RBAC inheritance root	Yes (narrow)	Yes (broad)	Sub-level Contributor scopes a team neatly
Policy assignment scope	Yes	Yes (+ inherits from MG)	Sub inherits MG guardrails automatically
Many Azure quotas/limits	Shared with sub	Per subscription	One team can’t exhaust another’s quota
Blast radius of Owner	RG only	Whole subscription	App team Owner can’t reach platform
Move between MGs	No	Yes	Reorganize governance without rebuild

And subscriptions are not free of limits — knowing the real ceilings keeps your design honest. Representative subscription-scope limits to design against (treat as “design well below,” not hard targets to chase):

Limit	Approximate ceiling	Why it shapes design
Resource groups per subscription	~980	Plenty; not a real constraint
Role assignments per subscription	~4,000	Favors group-based RBAC over per-user
VNets per subscription (default)	~1,000 (raisable)	Spoke-per-workload scales fine
Subscriptions per management group	Unlimited	Hierarchy depth (six levels below the root), not subscription count, is the real constraint
Public IPs per subscription (default)	~10 standard (raisable)	Deny-public-IP keeps this near zero in corp
Azure Policy assignments per scope	~200	Group definitions into initiatives

3. Hub-spoke networking

Decision: centralize shared network services in a hub VNet (firewall, DNS, Bastion, VPN/ExpressRoute gateway) and give each workload a spoke VNet peered to the hub. Force spoke egress through the hub firewall with a route table (UDR). This means one place to inspect and log traffic, one place to attach hybrid connectivity, and spokes that stay small and disposable.

The alternative (a flat VNet shared by everyone, or full-mesh peering between workloads) does not scale and erases the security boundary between teams. Detail: Azure landing zone — network topology. The four candidate topologies compared, including Virtual WAN as the managed option, so the choice is defensible:

Topology	How it connects	Pros	Cons	Verdict
Flat shared VNet	One VNet, all teams	Simplest	No isolation; noisy-neighbor; doesn’t scale	Avoid past a pilot
Full-mesh peering	Every VNet peers every VNet	Direct paths	O(n²) peerings; no central inspection	Unmanageable at scale
Hub-spoke	Spokes peer only the hub	Central inspection, hybrid, scales	One extra hop; hub is a focal point	The standard
Virtual WAN	Microsoft-managed hub	Managed routing, global	Cost; less control	Large/global estates
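The O(n²) claim in the full-mesh row is easy to check with arithmetic. The sketch below counts peering links for a given number of workload VNets under each topology; the function names are made up for this example. Each link is really two peering resources, one on each VNet, so the number of objects to manage is double the figures shown.

```python
def full_mesh_links(workloads: int) -> int:
    """Links needed so every workload VNet peers directly with every other."""
    return workloads * (workloads - 1) // 2


def hub_spoke_links(workloads: int) -> int:
    """Links needed when each workload VNet peers only with the hub."""
    return workloads


for n in (5, 10, 50):
    print(f"{n} workloads: full-mesh={full_mesh_links(n)}, hub-spoke={hub_spoke_links(n)}")
```

At ten workloads the mesh already needs 45 links against 10 for hub-spoke; at fifty it is 1,225 against 50.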

The hub carries a fixed set of shared services, each in a subnet with a mandatory or conventional name. Get these exact, because Azure validates several of them:

Hub component	Subnet name (exact)	Typical CIDR	Job
Azure Firewall	`AzureFirewallSubnet`	10.10.1.0/26	Inspect + log all egress
Firewall mgmt (forced tunnel)	`AzureFirewallManagementSubnet`	10.10.2.0/26	Firewall management plane
Bastion	`AzureBastionSubnet`	10.10.3.0/26	Browser RDP/SSH, no public NIC
VPN/ER gateway	`GatewaySubnet`	10.10.4.0/27	Hybrid connectivity
Private DNS resolver inbound	`<custom>` (delegated)	10.10.5.0/28	Hybrid DNS resolution
Shared workload	`snet-shared`	10.10.6.0/24	Jump hosts, shared tooling

The CIDR plan must not overlap, because peered VNets with overlapping ranges cannot route. A clean, non-overlapping allocation for Northwind:

Network	CIDR	Subnets	Notes
Hub	10.10.0.0/16	firewall, bastion, gateway, dns	Platform-owned
Corp spoke	10.20.0.0/16	snet-workload, snet-data, snet-pe	No public ingress
Online spoke	10.30.0.0/16	snet-web, snet-appgw, snet-pe	AppGW + WAF in front
Reserved (future)	10.40.0.0/16	—	Next workload
On-prem (hybrid)	172.16.0.0/16	—	Advertised via gateway
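Before deploying anything, the address plan can be checked offline. The following sketch uses Python's standard ipaddress module to confirm that the VNet ranges above do not overlap and that every hub subnet sits inside the hub range. The dictionary keys are labels chosen for this example (the DNS resolver subnet name in particular is a placeholder, since the table leaves it as custom).

```python
import ipaddress
from itertools import combinations

# Address plan copied from the tables above.
VNETS = {
    "hub": "10.10.0.0/16",
    "corp-spoke": "10.20.0.0/16",
    "online-spoke": "10.30.0.0/16",
    "reserved": "10.40.0.0/16",
    "on-prem": "172.16.0.0/16",
}

HUB_SUBNETS = {
    "AzureFirewallSubnet": "10.10.1.0/26",
    "AzureFirewallManagementSubnet": "10.10.2.0/26",
    "AzureBastionSubnet": "10.10.3.0/26",
    "GatewaySubnet": "10.10.4.0/27",
    "dns-resolver-inbound": "10.10.5.0/28",  # placeholder for the custom name
    "snet-shared": "10.10.6.0/24",
}


def find_overlaps(ranges: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


def outside_parent(parent_cidr: str, subnets: dict[str, str]) -> list[str]:
    """Return the names of subnets that do not fit inside the parent range."""
    parent = ipaddress.ip_network(parent_cidr)
    return [
        name
        for name, cidr in subnets.items()
        if not ipaddress.ip_network(cidr).subnet_of(parent)
    ]


print("VNet overlaps:", find_overlaps(VNETS))
print("Hub subnet overlaps:", find_overlaps(HUB_SUBNETS))
print("Subnets outside hub:", outside_parent(VNETS["hub"], HUB_SUBNETS))
```

All three lists come back empty for the Northwind plan. Adding a new spoke means adding one entry to VNETS and re-running the check before the peering is ever attempted.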

Peering has options that change cost and reachability — set them deliberately, not by accepting defaults:

Peering setting	Hub→spoke value	Spoke→hub value	Why
`allowVirtualNetworkAccess`	true	true	Permit traffic across the peering
`allowForwardedTraffic`	true	true	Let firewall-forwarded traffic transit
`allowGatewayTransit`	true	false	Hub shares its gateway
`useRemoteGateways`	false	true	Spoke uses the hub’s gateway
Result if mismatched or one-sided	—	—	Gateway routes never reach the spoke, or routing is asymmetric/blocked; a peering created on one side only stays in “Initiated”
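The table above can be turned into a small pre-flight check. This sketch compares the two sides of a peering against the expected values and reports each deviation. It is a simplified model for reasoning about the settings, not a call into Azure: the EXPECTED structure and check_pair function are hypothetical names, and the property names are the ones used in the table.

```python
from typing import Optional

# Expected values per direction, taken from the peering table above.
EXPECTED = {
    "hub->spoke": {
        "allowVirtualNetworkAccess": True,
        "allowForwardedTraffic": True,
        "allowGatewayTransit": True,
        "useRemoteGateways": False,
    },
    "spoke->hub": {
        "allowVirtualNetworkAccess": True,
        "allowForwardedTraffic": True,
        "allowGatewayTransit": False,
        "useRemoteGateways": True,
    },
}


def check_pair(hub_side: Optional[dict], spoke_side: Optional[dict]) -> list[str]:
    """Return a list of problems; an empty list means the pair matches the plan."""
    if hub_side is None or spoke_side is None:
        return ["one-sided peering: the link stays in Initiated until both sides exist"]
    problems = []
    for direction, actual in (("hub->spoke", hub_side), ("spoke->hub", spoke_side)):
        for setting, wanted in EXPECTED[direction].items():
            if actual.get(setting) != wanted:
                problems.append(f"{direction}: {setting} should be {str(wanted).lower()}")
    return problems


# A spoke side that forgot to use the hub's gateway.
bad_spoke = dict(EXPECTED["spoke->hub"], useRemoteGateways=False)
print(check_pair(EXPECTED["hub->spoke"], bad_spoke))
print(check_pair(EXPECTED["hub->spoke"], None))
```

In practice the two dictionaries would be filled from the JSON that the peering list commands return, one per VNet.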

4. Identity baseline

Decision: Microsoft Entra ID is the control plane. Grant Azure RBAC roles to groups, never individuals, and scope them at the management-group or subscription level rather than per-resource. Privileged roles (Owner, User Access Administrator) are made eligible, not active, through Privileged Identity Management (PIM) so engineers activate them just-in-time with MFA and approval.

Least privilege is the rule: app teams get Contributor on their own subscription and nothing above it; the platform pipeline identity gets Owner only at the management group it manages. This builds directly on the Entra ID fundamentals: tenants, users, groups, RBAC lesson and goes deeper in Azure landing zone — identity & access and Entra RBAC governance.

Who gets which role at which scope — the RBAC plan a reviewer will check line by line:

Principal (group)	Role	Scope	Standing or PIM
`grp-platform-admins`	Owner	`platform` MG	PIM-eligible
`grp-platform-engineers`	Contributor	`platform` MG	Standing
`grp-network-admins`	Network Contributor	`sub-connectivity`	Standing
`grp-ops`	Log Analytics Contributor	`sub-management`	Standing
`grp-corp-app-team`	Contributor	`sub-corp-prod`	Standing
`grp-online-app-team`	Contributor	`sub-online-prod`	Standing
`grp-security`	Security Reader	`northwind` MG	Standing
`grp-billing`	Cost Management Reader	`northwind` MG	Standing
Any human	Owner / UAA	any	PIM-only, JIT
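A reviewer checks this plan line by line, and the same check is easy to automate. The sketch below lints the plan for the two rules stated above: roles go to groups, and Owner or User Access Administrator is never standing. It assumes, as the table does, that group names start with grp-; that prefix test and the lint function are conventions of this example, not anything Azure enforces.

```python
PRIVILEGED = {"Owner", "User Access Administrator"}

# (principal, role, scope, mode) rows copied from the RBAC plan above.
PLAN = [
    ("grp-platform-admins", "Owner", "platform MG", "PIM-eligible"),
    ("grp-platform-engineers", "Contributor", "platform MG", "Standing"),
    ("grp-network-admins", "Network Contributor", "sub-connectivity", "Standing"),
    ("grp-ops", "Log Analytics Contributor", "sub-management", "Standing"),
    ("grp-corp-app-team", "Contributor", "sub-corp-prod", "Standing"),
    ("grp-online-app-team", "Contributor", "sub-online-prod", "Standing"),
    ("grp-security", "Security Reader", "northwind MG", "Standing"),
    ("grp-billing", "Cost Management Reader", "northwind MG", "Standing"),
]


def lint(plan: list[tuple[str, str, str, str]]) -> list[str]:
    """Flag individual principals and standing privileged roles."""
    findings = []
    for principal, role, scope, mode in plan:
        if not principal.startswith("grp-"):
            findings.append(f"{principal}: assign to a group, not an individual")
        if role in PRIVILEGED and mode == "Standing":
            findings.append(f"{principal}: {role} at {scope} must be PIM-eligible")
    return findings


print(lint(PLAN))
# A hypothetical violation: a named user holding standing Owner.
print(lint([("alice@northwind.example", "Owner", "sub-corp-prod", "Standing")]))
```

The Northwind plan produces no findings; the hypothetical second call trips both rules at once.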

The built-in roles you actually use here, what they grant, and the trap each one carries:

Role	Grants	Use for	Trap
Owner	Full access + manage access	Almost never standing	Can grant itself anything
Contributor	Full manage, not RBAC	App teams on their sub	Cannot assign roles (by design)
Reader	View only	Auditors, security	Read can still see secrets’ existence
User Access Administrator	Manage RBAC only	Break-glass via PIM	Privilege-escalation vector
Network Contributor	Manage network resources	Network admins	Scope tightly to connectivity
Log Analytics Contributor	Manage workspaces + data	Ops	Can read all ingested logs
Key Vault Secrets User	Read secret values	Workload identities	Grant per-vault, not broad

PIM turns standing privilege into just-in-time. The settings that make it real:

PIM control	Recommended setting	Why
Activation requires MFA	On	Proves the human, not a stolen token
Activation requires justification	On	Audit trail of why
Activation requires approval	On for Owner/UAA	Two-person control on top privilege
Maximum activation duration	1–4 hours	Privilege expires automatically
Eligible vs active	Eligible by default	No standing admin
Access reviews	Quarterly	Catch stale eligibility

Deeper still, privileged-role elevation for resources is its own discipline — see PIM for Azure resources: JIT elevation.

5. Policy guardrails

Decision: governance is preventive, not a quarterly audit. Assign Azure Policy initiatives at the management-group scope so they inherit. The three Northwind needs first:

Require tags (costCenter, owner, env) — deny resources without them, so Finance gets clean cost attribution.
Deny public IPs on NICs in the corp branch — internal workloads stay private by construction.
DeployIfNotExists (DINE) — auto-onboard new resources to Log Analytics and Microsoft Defender for Cloud, so monitoring is never forgotten.

Always dry-run a new initiative in DoNotEnforce mode first, read the compliance results, then flip enforcement on. Detail: Azure landing zone — governance and, for shipping policy through CI/CD, Azure Policy as code and Azure Policy & governance at scale.

The policy effects are the heart of governance. Each behaves differently at evaluation time — know exactly what each does and when to reach for it:

Effect	What it does	Needs identity?	Blocks deploy?	Use for
Deny	Rejects non-compliant create/update	No	Yes	Hard guardrails (no public IP)
Audit	Flags non-compliance, allows it	No	No	Visibility before enforcing
Append	Adds fields to a resource	No	No	Force a tag value, add a setting
Modify	Adds/updates/removes properties	Yes	No	Remediate tags at scale
DeployIfNotExists	Deploys a related resource if missing	Yes	No	Onboard LAW/Defender
AuditIfNotExists	Audits if a related resource is missing	No	No	“Is diagnostics configured?”
Disabled	Turns the policy off	No	No	Temporarily park a rule
DenyAction	Blocks a specific action (e.g. delete)	No	Yes (action)	Protect against deletion

The three Northwind guardrails in detail — definition, scope, parameters, and the failure each prevents:

Guardrail	Built-in definition (intent)	Scope	Key parameter	Prevents
Require costCenter tag	“Require a tag on resources”	`northwind` MG	`tagName=costCenter`	Untagged, unattributable spend
Require owner + env	Same definition, ×2 assignments	`northwind` MG	`tagName=owner` / `env`	Resources with no owner or environment recorded
Deny public IP on NIC	“Network interfaces should not have public IPs”	`landingzones/corp`	(none)	Internet-exposed internal VMs
Allowed locations	“Allowed locations”	`northwind` MG	region allow-list	Data landing in wrong geography
DINE Log Analytics	“Configure … to send logs to LAW”	`landingzones`	workspace ID	Monitoring drift
DINE Defender plans	“Configure Defender plan”	`northwind` MG	plan + tier	Security-coverage gaps

Enforcement mode is the safety valve. The two modes and how to use the rollout:

`enforcementMode`	Behavior	When to use
`DoNotEnforce` (Disabled)	Evaluates compliance but does not block requests or trigger DINE at deploy time (manually started DINE remediation tasks still run)	Always first: read what would be denied
`Default` (Enabled)	Fully enforces (deny blocks, DINE remediates)	After you’ve reviewed `DoNotEnforce` results
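The difference between the two modes is easiest to see on one resource. The sketch below is a toy model of a require-tags deny rule: it reports compliance in both modes but only blocks the request in Default. It imitates the behavior described in the table and is not how the Azure Policy engine is implemented; the evaluate function and its return shape are invented for this example.

```python
REQUIRED_TAGS = ("costCenter", "owner", "env")
MODES = ("Default", "DoNotEnforce")


def evaluate(tags: dict, enforcement_mode: str) -> dict:
    """Model a require-tags deny rule under the given enforcement mode."""
    if enforcement_mode not in MODES:
        raise ValueError(f"unknown enforcementMode: {enforcement_mode}")
    missing = [name for name in REQUIRED_TAGS if name not in tags]
    compliant = not missing
    # Compliance is always evaluated; only Default turns a failure into a block.
    blocked = (not compliant) and enforcement_mode == "Default"
    return {"compliant": compliant, "missing": missing, "blocked": blocked}


untagged_vm = {"owner": "app-team"}
print(evaluate(untagged_vm, "DoNotEnforce"))
print(evaluate(untagged_vm, "Default"))
```

Running the dry-run mode first over the existing estate shows how many deployments would have been blocked, which is the number to review before switching to Default.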

6. Monitoring baseline

Decision: one central Log Analytics workspace in the management subscription. Every subscription’s diagnostic settings, Defender for Cloud, and Activity Logs funnel into it. Centralizing means security can query across the whole estate, and DINE policy can enforce onboarding automatically. Detail: Azure landing zone — governance and the Azure Monitor deep dive.

What flows into the central workspace, from where, and the mechanism that puts it there:

Telemetry	Source	Mechanism	Why central
Resource diagnostics (metrics/logs)	Every resource	Diagnostic settings (DINE-enforced)	Query all resources together
Activity log	Each subscription	Diagnostic setting at sub scope	“Who changed what” across estate
Defender for Cloud alerts	All subscriptions	Defender export to LAW	Single security pane
Entra sign-in / audit logs	Tenant	Entra diagnostic settings	Correlate identity with resource events
VM guest logs/perf	VMs	Azure Monitor Agent + DCR	Host-level visibility
Network flow logs	NSGs / firewall	Flow logs → LAW	Traffic forensics

A workspace is not free or infinitely retained — the knobs that drive both behavior and bill:

Workspace setting	Default	Options	Drives
Pricing tier	Pay-as-you-go (PerGB2018)	Commitment tiers (100GB/day…)	Per-GB cost at volume
Retention	30 days	30–730 days (then Archive)	Storage cost + query window
Data collection rule (DCR)	none	Scope what each resource sends	Volume + noise
Table-level retention	inherits workspace	Per-table override	Keep security logs longer, cheaply
Daily cap	none	Cap GB/day	Runaway-ingestion insurance
Access mode	resource-context	workspace-context	Who can read which logs

7. Cost controls

Decision: a budget with alerts per subscription, plus mandatory tags so Cost Management can slice spend by costCenter and env. Alerts fire at 80% and 100% of budget to the subscription owner before the month closes. This answers Northwind’s “show me the bill, by team” directly. For the full discipline, see Azure FinOps & cost engineering and the reservations & savings-plan strategy.

The cost-control mechanisms, what each does, and when it fires:

Mechanism	What it does	Granularity	Action
Budget (Cost)	Tracks actual spend vs amount	Sub / RG / tag	Alert at thresholds
Budget (forecast)	Projects month-end spend	Sub / RG	Alert before overrun
Cost allocation tags	Slice spend by team/env	Resource	Reporting, chargeback
Cost Management views	Group/filter spend	Any dimension	Analysis, anomaly spotting
Action group on budget	Email/webhook/automation	Per budget	Notify owner; trigger runbook
Anomaly alerts	Detect unusual spend	Subscription	Catch surprises early
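The 80% and 100% thresholds are simple arithmetic, sketched below for a single budget. The function name and signature are made up for illustration; Decimal is used so that currency amounts compare exactly.

```python
from decimal import Decimal

THRESHOLDS = (80, 100)  # percent of budget, as in the Northwind design


def alerts_fired(budget, spend, thresholds=THRESHOLDS) -> list[int]:
    """Return the threshold percentages that current spend has reached."""
    budget = Decimal(str(budget))
    spend = Decimal(str(spend))
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare spend*100 against budget*pct to avoid any division rounding.
    return [pct for pct in thresholds if spend * 100 >= budget * pct]


print(alerts_fired(1000, 799.99))
print(alerts_fired(1000, 800))
print(alerts_fired(1000, 1200))
```

The same comparison applied to forecast spend instead of actual spend gives the early warning described in the forecast row.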

The tag taxonomy is load-bearing — these are the tags every resource must carry, why, and what enforces them:

Tag	Example value	Purpose	Enforced by
`costCenter`	`logistics`	Charge spend to a budget	Require-tags deny policy
`owner`	`app-team`	Who to call; cleanup target	Require-tags deny policy
`env`	`prod` / `dev`	Separate prod vs non-prod spend	Require-tags deny policy
`workload`	`checkout-api`	Per-app rollup	Convention (audit policy)
`dataClass`	`confidential`	Drive security/retention	Convention (audit policy)
`expiry`	`2026-12-31`	Auto-cleanup of sandbox	Sandbox automation
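To see why the tags are load-bearing, consider how spend gets sliced once they exist. The sketch below groups cost line items by a chosen tag and puts anything without that tag into an untagged bucket. The line items are invented sample data for this example only, and spend_by_tag is a hypothetical helper, not a Cost Management API.

```python
from collections import defaultdict
from decimal import Decimal

# Invented sample line items; real data would come from a cost export.
ITEMS = [
    {"resource": "vm-checkout-01", "cost": "120.50",
     "tags": {"costCenter": "logistics", "env": "prod"}},
    {"resource": "sql-checkout", "cost": "300.00",
     "tags": {"costCenter": "logistics", "env": "prod"}},
    {"resource": "vm-lab-07", "cost": "42.25",
     "tags": {"costCenter": "research", "env": "dev"}},
    {"resource": "pip-orphan", "cost": "3.65", "tags": {}},
]


def spend_by_tag(line_items: list[dict], tag: str) -> dict:
    """Sum cost per value of `tag`; items without the tag land in '(untagged)'."""
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for item in line_items:
        key = item["tags"].get(tag, "(untagged)")
        totals[key] += Decimal(item["cost"])
    return dict(totals)


print(spend_by_tag(ITEMS, "costCenter"))
print(spend_by_tag(ITEMS, "env"))
```

The untagged bucket is the mystery spend the require-tags policy exists to eliminate; with the deny policy enforced, it should stay empty for new resources.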

Architecture at a glance

The diagram traces the landing zone as governance and traffic actually flow through it, left to right. Start at the governance plane: the Entra tenant (with PIM-eligible privileged groups) and the management-group hierarchy carry a single policy initiative — require-tags, deny-public-IP, DINE — that inherits downward. That inheritance lands on the subscriptions zone, where the platform subscriptions (connectivity, management, identity) are separated from the application subscriptions (corp-prod, online-prod); the split is the blast-radius and billing boundary. The platform’s connectivity subscription owns the connectivity hub — Azure Firewall enforcing forced-tunnel egress via a 0.0.0.0/0 UDR, Private DNS with linked privatelink zones, and Bastion for public-NIC-free RDP/SSH. Each workload spoke (corp 10.20/16, online 10.30/16 behind App Gateway + WAF) peers to the hub, and the key arrow loops back: spoke egress returns through the firewall before leaving. Finally everything — every spoke’s diagnostics and every subscription’s Activity log — reports into the observe & cost zone: the central Log Analytics workspace, Defender plans onboarded by DINE, and per-subscription budgets keyed on the costCenter tag.

Read the five numbered badges as the control points where this most often goes wrong, and the legend narrates each as symptom · confirm · fix: a deny policy too broad (1) blocking legitimate deploys across every child sub; the wrong subscription split (2) collapsing the blast-radius boundary; egress not forced through the hub (3) when a spoke has no UDR; one-sided peering (4) that never reaches Connected; and monitoring drift (5) when DINE has no remediation. The whole method of operating a landing zone is in that left-to-right path plus those five checks — inheritance flows down, traffic is centralized and inspected, telemetry converges, and each numbered hop is a thing you can confirm with one az command.

Staged build plan

You do not build a landing zone in one giant deployment — you build it in stages, validating each before the next. Here is the plan; each stage names the deeper lesson to open if you need more than the snippet. The hands-on lab that follows builds a free-tier slice of stages 1, 3, 4, and 5 end to end.

Stage	What you build	Reuse lesson
0. Foundations	Account, Cloud Shell, CLI context	Earlier course lessons + CAF blueprint
1. Resource organization	Management groups + subscription layout	Resource organization
2. Identity	RBAC to groups, PIM for privileged roles	Identity & access
3. Networking	Hub VNet, firewall subnet, spoke peering, UDR	Network topology
4. Governance	Required-tags + deny-public-IP + DINE policy	Governance
5. Monitoring	Central Log Analytics + diagnostic settings	Governance
6. Security	Defender for Cloud plans + Zero Trust posture	Security baseline
7. Cost	Budgets + alerts per subscription	FinOps & cost engineering
8. Automation	Wrap it all in IaC + a pipeline	Policy as code

Each stage has a definition of done and the one command that proves it — a checklist you can literally tick:

Stage	Definition of done	Proof command (shape)
1	MG exists; child inherits a parent policy	`az account management-group show`
2	RBAC granted to a group at sub scope	`az role assignment list --assignee <group>`
3	Hub+spoke peered `Connected` both ways	`az network vnet peering list --query "[].peeringState"`
4	A deny policy actually blocks a bad resource	attempt create → expect `RequestDisallowedByPolicy`
5	Central LAW exists; a resource sends diagnostics	`az monitor diagnostic-settings list`
6	Defender plan enabled on the subscription	`az security pricing show -n VirtualMachines`
7	Budget with alert configured	`az consumption budget list`
8	The above deploys from a pipeline, PR-gated	pipeline run is green
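
The checklist can be turned into a small runner that prints one PASS or FAIL line per stage. This is a sketch only: check is a hypothetical helper, and it suits proofs whose exit status carries the answer (a show command on something that may not exist). Proofs that depend on a value, such as peeringState, still need their output inspected.

```shell
# check LABEL CMD... : run CMD quietly, print PASS or FAIL, count failures.
failures=0
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Possible usage with the lab's names:
# check "stage 1: MG exists"  az account management-group show --name northwind-demo
# check "stage 5: LAW exists" az monitor log-analytics workspace show \
#   -g rg-management -n law-northwind-demo
# echo "$failures check(s) failed"
```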

Representative IaC for the core pieces

You will use a mix in real life: Bicep for Azure-native resources and tenant-scoped objects (management groups, policy), Terraform when you want one tool across clouds, and az for glue and verification. The trade-off, so you pick deliberately:

Tool	Best at	Scope strength	Weak at	Use in landing zone for
Bicep	Azure-native, tenant/MG scope	First-class MG, sub, policy	Multi-cloud	MGs, policy, platform resources
Terraform	Multi-cloud, large modules	Mature state, modules	Newest Azure features lag	Cross-cloud orgs; via AVM
`az` CLI	Glue, one-offs, verification	Imperative, scriptable	Not declarative/idempotent	Bootstrap, validation, teardown
ARM JSON	Underlying engine	What Bicep compiles to	Verbose by hand	(rarely authored directly now)

Here are representative snippets for each core piece.

Management group (Bicep, tenant scope):

targetScope = 'tenant'

resource northwind 'Microsoft.Management/managementGroups@2023-04-01' = {
  name: 'northwind'
  properties: { displayName: 'Northwind Freight' }
}

resource landingzones 'Microsoft.Management/managementGroups@2023-04-01' = {
  name: 'landingzones'
  properties: {
    displayName: 'Landing Zones'
    details: { parent: { id: northwind.id } }
  }
}

Hub VNet + firewall subnet (Terraform):

resource "azurerm_virtual_network" "hub" {
  name                = "vnet-hub-eus"
  resource_group_name = azurerm_resource_group.connectivity.name
  location            = "eastus"
  address_space       = ["10.10.0.0/16"]
}

resource "azurerm_subnet" "firewall" {
  name                 = "AzureFirewallSubnet" # exact name is mandatory
  resource_group_name  = azurerm_resource_group.connectivity.name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.10.1.0/26"]
}
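
The two prefixes above have to line up: the firewall subnet must sit inside the hub's address space, and Azure requires AzureFirewallSubnet to be /26 or larger. A pre-flight sketch in plain shell arithmetic; the helper names are illustrative and are not part of Terraform or az:

```shell
# ip_to_int A.B.C.D : print the address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_contains OUTER INNER : succeed if INNER lies entirely inside OUTER.
cidr_contains() {
  outer_len=${1#*/}
  inner_len=${2#*/}
  [ "$inner_len" -ge "$outer_len" ] || return 1
  mask=$(( (4294967295 << (32 - outer_len)) & 4294967295 ))
  outer_net=$(( $(ip_to_int "${1%/*}") & mask ))
  inner_net=$(( $(ip_to_int "${2%/*}") & mask ))
  [ "$outer_net" -eq "$inner_net" ]
}

# firewall_subnet_ok VNET_CIDR SUBNET_CIDR : inside the VNet and /26 or larger.
firewall_subnet_ok() {
  cidr_contains "$1" "$2" && [ "${2#*/}" -le 26 ]
}

if firewall_subnet_ok 10.10.0.0/16 10.10.1.0/26; then
  echo "firewall subnet OK"
fi
```

The same cidr_contains check is handy in reverse: a spoke prefix that is contained in the hub prefix (or vice versa) overlaps, and overlapping VNets cannot be peered.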

Required-tags policy assignment (Bicep, management-group scope):

targetScope = 'managementGroup'

resource requireCostCenter 'Microsoft.Authorization/policyAssignments@2024-04-01' = {
  name: 'require-tag-costcenter'
  properties: {
    displayName: 'Require costCenter tag on resources'
    // built-in: "Require a tag on resources"
    policyDefinitionId: tenantResourceId(
      'Microsoft.Authorization/policyDefinitions',
      '871b6d14-10aa-478d-b590-94f262ecfa99')
    parameters: { tagName: { value: 'costCenter' } }
    enforcementMode: 'Default'
  }
}
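
A management-group-scoped template like this is deployed with az deployment mg rather than the resource-group commands. A hedged sketch of a preview-then-deploy wrapper; deploy_mg_policy and the file name policy.bicep are assumptions for illustration:

```shell
# deploy_mg_policy MG_ID TEMPLATE [LOCATION]
# Preview the change with what-if, then deploy only if the preview succeeded.
deploy_mg_policy() {
  mg_id=$1
  template=$2
  location=${3:-eastus}   # region where the deployment metadata is stored

  az deployment mg what-if \
    --management-group-id "$mg_id" --location "$location" \
    --template-file "$template" || return 1

  az deployment mg create \
    --management-group-id "$mg_id" --location "$location" \
    --template-file "$template"
}

# Possible usage:
# deploy_mg_policy northwind policy.bicep
```

In a pipeline you would normally split the two calls, so that a reviewer reads the what-if output on the pull request before the create step runs.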

Forced-tunnel route table (Bicep, push spoke egress to the firewall):

param location string = resourceGroup().location

resource udr 'Microsoft.Network/routeTables@2023-11-01' = {
  name: 'rt-spoke-forcedtunnel'
  location: location
  properties: {
    routes: [ {
      name: 'default-to-firewall'
      properties: {
        addressPrefix: '0.0.0.0/0'
        nextHopType: 'VirtualAppliance'
        nextHopIpAddress: '10.10.1.4' // Azure Firewall private IP
      }
    } ]
  }
}
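
A route table has no effect until it is associated with a subnet. One way to attach it from the CLI, sketched with the lab's spoke names as assumptions; attach_route_table is a hypothetical wrapper:

```shell
# attach_route_table RG VNET SUBNET ROUTE_TABLE
# ROUTE_TABLE is a name when the table lives in the same resource group
# as the VNet, otherwise pass its full resource ID.
attach_route_table() {
  az network vnet subnet update \
    --resource-group "$1" --vnet-name "$2" --name "$3" \
    --route-table "$4"
}

# Possible usage:
# attach_route_table rg-corp-prod vnet-corp-spoke snet-workload rt-spoke-forcedtunnel
```

Attach it only after the firewall has a rule that allows the egress you need; otherwise the subnet loses outbound connectivity the moment the association lands.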

Log Analytics workspace (az):

az monitor log-analytics workspace create \
  --resource-group rg-management \
  --workspace-name law-northwind-central \
  --location eastus \
  --sku PerGB2018 \
  --retention-time 30

Real-world scenario

Northwind Freight kicked off the landing-zone build with the four-engineer platform team and a hard deadline: the first application team — the online checkout workload — was promised an isolated, governed subscription in three weeks. The legacy estate was a single subscription, sub-legacy-allinone, holding 140 resources: untagged VMs, three orphaned public IPs, a SQL database reachable from the internet, and a monthly bill of about ₹9.2 lakh that Finance could not split by team. The CTO’s instruction was the brief verbatim: stop the wild west, let app teams move fast safely, and show the bill by team.

Week one was design and the governance plane. The team stood up the MG hierarchy (northwind → platform / landingzones / sandbox / decommissioned) and assigned three policies at northwind in DoNotEnforce first — require costCenter, require owner, allowed-locations (India regions only). The dry run immediately paid off: the compliance view showed 111 of 140 legacy resources non-compliant on tags. Had they enforced deny on day one, the legacy team’s own redeploys would have been blocked mid-flight. Instead they ran a Modify remediation to backfill costCenter from a spreadsheet, re-checked compliance, then flipped require-tags to Default. The deny-public-IP policy went onto the landingzones/corp branch only — deliberately not on online, which legitimately needed a WAF-fronted public entry point.

Week two was connectivity and the first real failure. The team built the hub (10.10.0.0/16) with AzureFirewallSubnet, Bastion, and a gateway subnet, then peered the new corp spoke (10.20.0.0/16). The corp test VM could not reach the internet at all: every outbound call timed out. The reflex was to edit firewall rules by guesswork, and an hour vanished there. The actual cause sat at the badge-3 control point on the diagram: they had attached the UDR forcing 0.0.0.0/0 to the firewall but had not yet added a firewall network rule allowing the traffic, so the firewall dropped it. A separate test against a second spoke revealed badge 4: the online spoke’s peering read Initiated, not Connected, because it had been created on only one side. Once they stopped guessing and ran the confirm commands, both were obvious in seconds: az network nic show-effective-route-table showed the next hop was the firewall, and az network vnet peering list --query "[].peeringState" showed the half-built peering.

Week three delivered the online team’s subscription. Because the guardrails lived at the MG, the new sub-online-prod inherited require-tags, allowed-locations, and DINE Log Analytics onboarding the moment it was created and moved under landingzones/online — zero extra configuration. The app team got Contributor on their subscription and nothing above it, deployed their spoke and App Gateway via a PR into their own repo, and were serving traffic in two days without filing a single networking ticket. DINE auto-onboarded every resource they created to the central law-northwind-central workspace, so security had estate-wide visibility from minute one. A per-subscription budget of ₹1.5 lakh with alerts at 80% and 100% gave Finance the per-team line they had asked for.

The outcome, after one quarter: 100% tag compliance on all new resources, zero public IPs in the corp branch (the policy blocked three attempts during the migration — each a VM someone tried to give a public IP “just to test”), and a Cost Management view that finally sliced the bill by costCenter. The legacy subscription was drained workload-by-workload into governed app subscriptions and moved to decommissioned. The lesson the team wrote on the wall: “Inheritance and dry-run are the whole game. Assign the guardrail once at the management group, prove it in DoNotEnforce, then enforce — and when the network breaks, run the confirm command before you touch a rule.”

The build as a timeline, because the order of moves is the lesson:

Week	Goal	Key action	Failure hit	Resolution
1	Governance plane	Policies at MG in `DoNotEnforce`	111/140 legacy untagged	Modify-remediate, then enforce
1	Tag attribution	Require costCenter/owner	Would have blocked legacy redeploys	Dry-run caught it first
2	Connectivity	Hub + corp spoke + UDR	Corp VM no internet (badge 3)	Add firewall rule for the route
2	Peering	Online spoke peering	State `Initiated` (badge 4)	Create peering both directions
3	App onboarding	New `sub-online-prod`	(none — inherited cleanly)	Guardrails applied automatically
3	Cost	Per-sub budget + alerts	—	Finance gets per-team bill
+1 qtr	Decommission legacy	Drain to governed subs	—	Move legacy to `decommissioned`
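
The week-1 move (assign in DoNotEnforce, read compliance, then enforce) can be sketched as three small wrappers. This assumes a recent az CLI in which az policy assignment update accepts --enforcement-mode; the function names and the MG_SCOPE variable are illustrative:

```shell
# Scope string for the scenario's root management group.
MG_SCOPE="/providers/Microsoft.Management/managementGroups/northwind"

# 1. assign_dry_run NAME DEFINITION PARAMS_JSON
#    Evaluated for compliance, but nothing is blocked yet.
assign_dry_run() {
  az policy assignment create \
    --name "$1" --scope "$MG_SCOPE" \
    --policy "$2" --params "$3" \
    --enforcement-mode DoNotEnforce
}

# 2. Read the compliance summary for the management group.
compliance_summary() {
  az policy state summarize --management-group northwind
}

# 3. enforce NAME : switch the same assignment to enforcing.
enforce() {
  az policy assignment update \
    --name "$1" --scope "$MG_SCOPE" \
    --enforcement-mode Default
}

# Possible usage:
# assign_dry_run require-tag-costcenter 871b6d14-10aa-478d-b590-94f262ecfa99 \
#   '{ "tagName": { "value": "costCenter" } }'
# compliance_summary
# enforce require-tag-costcenter
```

Compliance results lag the assignment, so expect to wait for an evaluation cycle between steps 1 and 2 before trusting the numbers.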

Advantages and disadvantages

The governed-landing-zone model both prevents an entire class of production problems and adds real upfront complexity. Weigh it honestly:

Advantages (why this model helps you)	Disadvantages (why it costs you)
Guardrails inherit — assign once at an MG, every current and future sub is governed	Upfront design + IaC effort before the first workload ships any value
Subscriptions are clean blast-radius and billing boundaries — a misconfig stays contained	More subscriptions = more boundaries to manage (mitigated by automation)
App teams get isolated, pre-governed environments and move fast without tickets	Platform team becomes a dependency; needs to scale with the org
Policy makes compliance a property of the platform, not a memory test	A too-broad deny can block legitimate deploys across many subs at once
Central monitoring lets security hunt across the whole estate from one workspace	Centralized telemetry can get expensive at volume without DCR scoping
Hub-spoke gives one place to inspect, log, and attach hybrid connectivity	The hub is a focal point — an extra hop and a thing that must stay up
Everything as code is reproducible, reviewable, and auditable	Steeper skill bar; the team must know Bicep/Terraform and pipelines

The model is right for any organization past a handful of subscriptions, anyone with compliance obligations, and any platform team that wants to onboard app teams repeatably. It is overkill for a single hobby subscription or a one-week pilot. The disadvantages are all manageable — they are the reason platform automation (subscription vending) and the Well-Architected Framework exist — but only if you know they exist, which is the point of building the design deliberately rather than letting it accrete.

Hands-on lab — build a free-tier landing-zone slice

You will build a real, working slice of the landing zone using the az CLI in Azure Cloud Shell — no installs. We keep it inside a single subscription so it stays free-tier-friendly (a management group, resource groups, two VNets with peering, a tagging policy, and a Log Analytics workspace cost nothing or pennies). Everything goes into resource groups you delete at the end.

Note on scope: creating real platform and application subscriptions needs an enrollment you may not have on a personal account, so the lab models the hierarchy with a management group and models platform-vs-app separation with resource groups + tags. The commands are identical in shape to the real thing.

1. Set context. Open Azure Cloud Shell, pick Bash, and confirm where you are:

az account show --output table
SUB_ID=$(az account show --query id -o tsv)
echo "Working in subscription: $SUB_ID"

2. Create a management group (stage 1). This needs no enrollment and is free:

az account management-group create \
  --name northwind-demo \
  --display-name "Northwind Freight (demo)"

Expected: JSON describing the new group. (It can take a minute to appear in the portal — that is normal.)

3. Create platform and application resource groups (modelling the subscription split), each tagged for cost attribution:

az group create -n rg-connectivity -l eastus \
  --tags costCenter=platform owner=platform-team env=shared

az group create -n rg-management -l eastus \
  --tags costCenter=platform owner=platform-team env=shared

az group create -n rg-corp-prod -l eastus \
  --tags costCenter=logistics owner=app-team env=prod

4. Build the hub VNet and a spoke, then peer them (stage 3):

# Hub network in the connectivity RG
az network vnet create -g rg-connectivity -n vnet-hub \
  --address-prefixes 10.10.0.0/16 \
  --subnet-name AzureFirewallSubnet --subnet-prefixes 10.10.1.0/26

# Spoke network in the corp app RG
az network vnet create -g rg-corp-prod -n vnet-corp-spoke \
  --address-prefixes 10.20.0.0/16 \
  --subnet-name snet-workload --subnet-prefixes 10.20.1.0/24

# Get resource IDs for peering
HUB_ID=$(az network vnet show -g rg-connectivity -n vnet-hub --query id -o tsv)
SPOKE_ID=$(az network vnet show -g rg-corp-prod -n vnet-corp-spoke --query id -o tsv)

# Peer both directions
az network vnet peering create -g rg-corp-prod -n spoke-to-hub \
  --vnet-name vnet-corp-spoke --remote-vnet "$HUB_ID" \
  --allow-vnet-access

az network vnet peering create -g rg-connectivity -n hub-to-spoke \
  --vnet-name vnet-hub --remote-vnet "$SPOKE_ID" \
  --allow-vnet-access

5. Assign a tagging guardrail (stage 4). Assign the built-in Require a tag on resources policy, scoped to the corp resource group, requiring costCenter:

az policy assignment create \
  --name require-costcenter \
  --display-name "Require costCenter tag" \
  --scope "/subscriptions/$SUB_ID/resourceGroups/rg-corp-prod" \
  --policy "871b6d14-10aa-478d-b590-94f262ecfa99" \
  --params '{ "tagName": { "value": "costCenter" } }'

6. Create the central Log Analytics workspace (stage 5):

az monitor log-analytics workspace create \
  --resource-group rg-management \
  --workspace-name law-northwind-demo \
  --location eastus \
  --sku PerGB2018 --retention-time 30
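
Optional extension: the stage-5 definition of done also asks that a resource actually sends diagnostics, which the workspace alone does not prove. A minimal sketch that points a resource's metrics at the workspace. The setting name diag-to-central-law and the function name are assumptions, and available categories differ per resource type (az monitor diagnostic-settings categories list --resource <id> shows them):

```shell
# send_metrics_to_workspace RESOURCE_ID WORKSPACE_ID
# Creates a diagnostic setting that forwards platform metrics.
send_metrics_to_workspace() {
  az monitor diagnostic-settings create \
    --name diag-to-central-law \
    --resource "$1" \
    --workspace "$2" \
    --metrics '[{"category":"AllMetrics","enabled":true}]'
}

# Possible usage with the lab's hub VNet ($HUB_ID from step 4):
# LAW_ID=$(az monitor log-analytics workspace show \
#   -g rg-management -n law-northwind-demo --query id -o tsv)
# send_metrics_to_workspace "$HUB_ID" "$LAW_ID"
# az monitor diagnostic-settings list --resource "$HUB_ID" -o table
```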

7. Validate. Prove the slice exists and is wired correctly:

# Management group present
az account management-group show --name northwind-demo -o table

# Peering shows "Connected" both ways
az network vnet peering list -g rg-corp-prod --vnet-name vnet-corp-spoke \
  --query "[].{name:name, state:peeringState}" -o table

# Policy assignment present at the corp RG
az policy assignment list \
  --scope "/subscriptions/$SUB_ID/resourceGroups/rg-corp-prod" \
  --query "[].displayName" -o table

# Workspace provisioned
az monitor log-analytics workspace show \
  -g rg-management -n law-northwind-demo \
  --query provisioningState -o tsv

Expected: the peering state reads Connected for spoke-to-hub, the policy displayName appears, and the workspace provisioningState is Succeeded. You now have, in miniature, every pillar of the landing zone: hierarchy, platform/app separation, hub-spoke, a guardrail, and central monitoring.
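
The validation above shows the assignment exists, not that it blocks anything. To exercise the stage-4 proof (attempt a create, expect RequestDisallowedByPolicy), a sketch like the following wraps any command and checks why it failed. expect_policy_deny and the NSG name are hypothetical, and a fresh assignment can take several minutes to start denying:

```shell
# expect_policy_deny CMD... : succeed only if CMD fails with RequestDisallowedByPolicy.
expect_policy_deny() {
  if output=$("$@" 2>&1); then
    echo "UNEXPECTED: command succeeded, policy did not block it"
    return 1
  fi
  case $output in
    *RequestDisallowedByPolicy*)
      echo "OK: blocked by policy"
      ;;
    *)
      echo "FAILED for another reason:"
      echo "$output"
      return 1
      ;;
  esac
}

# Possible usage: an NSG with no costCenter tag in the guarded resource group.
# expect_policy_deny az network nsg create -g rg-corp-prod -n nsg-untagged-test
```

If the command unexpectedly succeeds, delete the test NSG and retry after the assignment has had time to take effect.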

8. Cleanup. Delete the resource groups (this removes the VNets, peering, policy assignment scoped to the RG, and workspace), then the management group:

az group delete -n rg-corp-prod --yes --no-wait
az group delete -n rg-connectivity --yes --no-wait
az group delete -n rg-management --yes --no-wait
az account management-group delete --name northwind-demo

Cost note: Empty VNets, peering, a tagging policy, a management group, and a workspace with no ingested data are free or a few pennies; deleting the resource groups the same day keeps this comfortably in free-tier territory.

The lab steps mapped to what each proves and its real-world analogue:

Step	What you did	What it proves	Real-world analogue
2	Create a management group	Hierarchy is free and enrollment-light	The MG tree that inherits guardrails
3	Tagged platform/app RGs	Cost attribution starts with tags	Platform-vs-app subscription split
4	Hub + spoke + bidirectional peering	Peering needs both sides	Hub-spoke connectivity
5	Require-tags policy at RG scope	Governance is an assignment	Deny guardrails at MG scope
6	Central Log Analytics workspace	One place for telemetry	The estate-wide monitoring sink
7	Validate everything	Confirm commands beat guessing	The acceptance test
8	Delete the resource groups	Clean teardown is part of IaC	Decommission discipline

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. These are the failures that actually bite a platform team building a landing zone. First as a scannable table you can read mid-incident, then the expanded reasoning for the ones that hurt most.

#	Symptom	Root cause	Confirm (exact cmd / portal path)	Fix
1	New MG not visible after create	Propagation delay; or you lack Management Group Contributor at root	`az account management-group list -o table`; check role at root	Wait ~1 min; ensure tenant-level MG permission is enabled (first MG op)
2	A deny policy blocks a resource you expected to succeed	Deny broader than intended (e.g. deny-public-IP catching an LB)	Read `policyEvaluationDetails` in the 403; `az policy state list --filter "complianceState eq 'NonCompliant'"`	Narrow with `notIn`/excluded scopes; re-test in `DoNotEnforce`
3	VNet peering stuck `Initiated`	Peering created on only one side	`az network vnet peering list --query "[].peeringState"`	Create the peering in both directions
4	Spoke VM has no internet after adding the firewall	UDR sends `0.0.0.0/0` to firewall, but no firewall rule allows it	`az network nic show-effective-route-table`; firewall logs show deny	Add a firewall network/application rule (or remove UDR while testing)
5	DINE policy never onboards new resources	No remediation task; remediation identity lacks the role at scope	`az policy remediation list`; check the assignment’s MI role	`az policy remediation create`; grant the MI its role
6	`az policy assignment create` fails on `--params`	JSON quoting mangled in the shell	Re-run with the JSON in a file	Pass `--params @file.json`; or use Cloud Shell
7	New subscription not governed	Sub not moved under the governed MG	`az account management-group subscription show`	Move the sub under the correct MG; policy inherits
8	RBAC change “did nothing”	Granted at the wrong scope, or to a user instead of the group	`az role assignment list --assignee <id> --all`	Grant to the group at MG/sub scope
9	Resource resolves to public IP despite Private Endpoint	Private DNS zone not linked to the VNet	`nslookup <host>` returns public IP	Link `privatelink.*` zone; register the A record
10	Allowed-locations policy blocks a global resource	Global resources report `location: global`	Policy error names the location	Exclude global resource types / use the built-in’s location list
11	Budget alert never fires	Threshold/contact misconfigured, or spend is genuinely under the threshold	`az consumption budget show`; check action group	Fix threshold/contact; budgets evaluate on a delay
12	Can’t assign Owner to the pipeline	Contributor can’t grant RBAC (by design)	Pipeline identity is Contributor, not UAA/Owner	Use a PIM-eligible Owner or a scoped UAA for the bootstrap

The expanded form, with the full reasoning for the entries that bite hardest:

2. A deny policy blocks a resource you expected to succeed. Root cause: The deny is broader than intended — deny-public-IP catching a Load Balancer frontend, or allowed-locations catching a global resource — and because it’s assigned at an MG it blocks across every child subscription at once. Confirm: Read policyEvaluationDetails in the deployment error (it names the assignment and definition); or az policy state list --filter "complianceState eq 'NonCompliant'" to see what’s tripping. Fix: Decide whether the policy is correct (the resource genuinely violates intent) or too broad. If too broad, narrow it with parameters/exclusions (notIn, excluded scopes) and validate in DoNotEnforce before re-enabling. Avoid blanket exemptions — they erode the guardrail.

3. VNet peering stuck in Initiated. Root cause: Peering is directional — a link created on the spoke alone leaves the relationship half-built, so traffic never flows. Confirm: az network vnet peering list -g <rg> --vnet-name <vnet> --query "[].{name:name,state:peeringState}" shows Initiated, not Connected. Fix: Create the peering in both directions (hub→spoke and spoke→hub). Both links must exist for the state to become Connected.
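
The confirm step can be scripted so that a half-built peering fails loudly. A small sketch that reads name and state pairs and prints anything not Connected; unconnected_peerings is a hypothetical helper:

```shell
# Read "name<TAB>state" lines on stdin; print each peering that is not
# Connected and return non-zero if any were found.
unconnected_peerings() {
  tab=$(printf '\t')
  bad=0
  while IFS="$tab" read -r name state; do
    if [ "$state" != "Connected" ]; then
      echo "$name is $state"
      bad=1
    fi
  done
  return "$bad"
}

# Possible usage (run once per VNet, hub and spoke):
# az network vnet peering list -g rg-corp-prod --vnet-name vnet-corp-spoke \
#   --query "[].[name, peeringState]" -o tsv | unconnected_peerings
```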

4. Spoke VM cannot reach the internet after you add the firewall. Root cause: The UDR default route (0.0.0.0/0 → firewall private IP) is attached and forces all egress to the firewall, but the firewall has no rule allowing that traffic, so it’s dropped — exactly the badge-3 control point. Confirm: az network nic show-effective-route-table --name <nic> -g <rg> shows next hop = the firewall; the firewall’s logs (in the central LAW) show the connection denied. Fix: Add an Azure Firewall network or application rule permitting the required egress, or remove the UDR while you isolate connectivity. See Azure Firewall forced tunneling & hub-spoke routing.
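
Reading the effective route table by eye is error-prone when it has dozens of rows. A sketch that picks out where the active default route points, assuming the tsv column layout produced by the query in the usage comment; the effective routes call only works for a NIC attached to a running VM, and default_route_next_hop is an illustrative name:

```shell
# Read "prefix<TAB>nextHopType<TAB>nextHopIp<TAB>state" lines on stdin and
# print the next hop of the active 0.0.0.0/0 route.
default_route_next_hop() {
  tab=$(printf '\t')
  while IFS="$tab" read -r prefix hop_type hop_ip state; do
    if [ "$prefix" = "0.0.0.0/0" ] && [ "$state" = "Active" ]; then
      echo "$hop_type $hop_ip"
    fi
  done
  return 0
}

# Possible usage:
# az network nic show-effective-route-table -g <rg> -n <nic> \
#   --query "value[].[addressPrefix[0], nextHopType, nextHopIpAddress[0], state]" \
#   -o tsv | default_route_next_hop
```

A result of VirtualAppliance with the firewall's private IP means routing is doing its job and the next place to look is the firewall's rules and logs.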

5. DINE policy never onboards new resources. Root cause: A DeployIfNotExists policy acts through a managed identity that needs the right role at the right scope; without that role, even new resources are never onboarded. Resources that existed before the assignment also need a remediation task. Miss either and coverage silently stalls. Confirm: az policy remediation list is empty; the assignment’s managed identity has no role assignment at the target scope. Fix: Grant the assignment’s managed identity the role it needs (e.g. Log Analytics Contributor), then trigger remediation (az policy remediation create). Once the identity has its role, new resources are remediated automatically; existing ones need the explicit task.
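
The two-part fix can be sketched as one function: look up the assignment's managed identity, grant it the role, then start a remediation task. The function name and the remediate- prefix are assumptions; extra arguments are passed through so you can add --management-group or --resource-group as your scope requires:

```shell
# remediate_dine ASSIGNMENT SCOPE ROLE [extra remediation args...]
remediate_dine() {
  assignment=$1
  scope=$2
  role=$3
  shift 3

  # Object ID of the managed identity attached to the assignment.
  principal_id=$(az policy assignment show --name "$assignment" --scope "$scope" \
    --query identity.principalId -o tsv) || return 1

  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "$role" --scope "$scope" || return 1

  # Remediation task for resources that already existed.
  az policy remediation create \
    --name "remediate-$assignment" \
    --policy-assignment "$scope/providers/Microsoft.Authorization/policyAssignments/$assignment" \
    "$@"
}

# Possible usage at subscription scope ("dine-law" is a placeholder assignment name):
# remediate_dine dine-law "/subscriptions/$SUB_ID" "Log Analytics Contributor"
```

For an initiative (policy set) assignment, az policy remediation create also needs --definition-reference-id to select which member policy to remediate.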

7. A new subscription isn’t governed. Root cause: Creating a subscription does not place it under a governed MG — until you move it, it inherits nothing. Confirm: az account management-group subscription show --name <mg> --subscription <sub> doesn’t list it; the sub shows no inherited policy. Fix: Move the subscription under the correct MG (az account management-group subscription add); inheritance applies immediately. This is what subscription vending automates.

Best practices

Decide before you deploy. Write the design decisions down (the seven above) and review them with stakeholders. IaC is cheap to change; an undocumented hierarchy is not.
Inherit, don’t repeat. Assign policy and RBAC at the highest sensible scope (management group) so new subscriptions are governed automatically.
Subscriptions are cheap; use them as boundaries. One per workload-or-environment beats cramming everything into one giant subscription.
Dry-run governance. New initiatives go in DoNotEnforce, you read compliance, then you enforce. Never flip a deny on blind.
Confirm before you fix. When the network breaks, run the confirm command (peering list, show-effective-route-table) before touching a rule — most landing-zone “firewall bugs” are peering or routing.
Everything is code. The platform team owns the hierarchy/policy/hub in one repo; app teams own their spokes via PR. Deploy through a pipeline, never the portal, for anything that must be reproducible. See Azure Policy as code.
Grant to groups, scope high, activate JIT. Group-based RBAC at MG/sub scope plus PIM for privileged roles is auditable and leaves no standing admin.
Tag from day one. costCenter, owner, env are not optional — they are what makes cost, ownership, and cleanup possible.
Centralize telemetry, scope ingestion. One workspace for cross-estate hunting, but use DCRs to control what each resource sends so the bill stays sane.
Force egress through the hub. A UDR sending 0.0.0.0/0 to the firewall plus a matching firewall rule means all traffic is inspected and logged in one place.
Quarantine before you delete. Move retired subscriptions to a decommissioned MG with deny-create before deletion, so nothing new lands in them.
Reuse the deep-dive lessons. Each pillar has a production-depth lesson; this capstone is the map, those are the territory.

Security notes

The landing zone is your security baseline, so treat it that way. Grant RBAC to groups, scoped high, least-privilege; make privileged roles eligible via PIM, not standing. Keep corp workloads private by policy (deny public IPs; reach them through Bastion or the firewall, never a public NIC). Force all egress through the hub firewall with a UDR so traffic is inspected and logged in one place. Turn on Microsoft Defender for Cloud and enforce its onboarding with DINE policy so coverage cannot drift. Funnel every diagnostic and Activity Log into the central Log Analytics workspace so security can hunt across the whole estate. And never embed secrets in IaC — use a pipeline identity with workload identity federation rather than a long-lived service-principal secret. This is the Zero Trust multilayer model applied to the platform itself; deepen it with Azure landing zone — security.

The security controls the landing zone bakes in, what each defends against, and the policy/mechanism that enforces it:

Control	Mechanism	Defends against	Enforced by
No standing admin	PIM-eligible Owner/UAA	Stolen-token lateral movement	PIM + access reviews
Group, least-privilege RBAC	Contributor at sub scope	Privilege sprawl/escalation	RBAC plan + audit
Private corp workloads	Deny-public-IP policy	Internet-exposed internal VMs	Azure Policy (deny)
Inspected, logged egress	Firewall + forced-tunnel UDR	Exfiltration, blind traffic	UDR + firewall rules
Secretless config	Managed identity + KV references	Secrets in plaintext/IaC	Key Vault + MI
No-drift monitoring	DINE LAW + Defender onboarding	Coverage gaps	Azure Policy (DINE)
Data in approved geos	Allowed-locations policy	Sovereignty/compliance breach	Azure Policy (deny)
Reproducible, reviewed changes	IaC + PR-gated pipeline	Unaudited portal drift	CI/CD + branch policy

Cost & sizing

What drives the landing-zone bill is not the governance scaffolding: management groups, policy, and budgets are free, and peering is billed only per GB transferred. The cost is the shared platform services you run continuously (firewall, gateways, Bastion) plus telemetry ingestion. Right-sizing is mostly about whether you actually need each shared service yet, and scoping what you log. Rough INR figures (production-grade, vary by region and usage):

Component	What you pay for	Rough INR / month	Free-tier note
Management groups + policy	Nothing	₹0	Always free
VNet peering	Per-GB transferred (intra-region small)	₹0–low	Empty peering is free
Azure Firewall (Standard)	Hourly + per-GB processed	~₹35,000–45,000	No free tier; the big platform cost
Azure Bastion (Basic)	Hourly	~₹12,000–14,000	No free tier; delete and redeploy to stop billing
VPN Gateway (VpnGw1)	Hourly	~₹12,000–15,000	Only if hybrid is needed
Log Analytics ingestion	Per-GB ingested + retention	~₹20,000/100GB-mo	First 5 GB/mo free; scope with DCRs
Defender for Cloud	Per-resource per plan	varies by estate	Free CSPM tier; paid plans per resource
Budgets / Cost Management	Nothing	₹0	Always free
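
Telemetry is the one line in this table that scales with how much you log, so it is worth estimating before you turn on every diagnostic category. A back-of-envelope sketch using the table's own rough figures as assumptions (about ₹200 per GB, derived from ₹20,000 per 100 GB, and 5 GB free); law_monthly_inr is an illustrative helper, not a pricing API:

```shell
# law_monthly_inr GB_PER_MONTH [RATE_INR_PER_GB] [FREE_GB]
# Prints an estimated monthly ingestion cost in whole rupees.
law_monthly_inr() {
  gb=$1
  rate=${2:-200}     # assumed rate, from the table's rough figure
  free_gb=${3:-5}    # assumed free allowance
  if [ "$gb" -gt "$free_gb" ]; then
    billable=$((gb - free_gb))
  else
    billable=0
  fi
  echo $((billable * rate))
}

law_monthly_inr 100   # prints 19000
law_monthly_inr 3     # prints 0
```

Retention beyond the included period and commitment-tier discounts are not modelled here; treat the result as an order-of-magnitude check, not a quote.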

The right-sizing decisions that actually move the bill:

Decision	Cheaper choice	When it’s safe	Trade-off
Firewall vs NSG-only egress	NSG + NAT Gateway	Small estate, no L7 inspection need	Lose central app-layer inspection
Bastion always-on vs on-demand	Delete when idle, redeploy on demand	Dev/test, rare access	Redeploy delay before each session
Per-GB LAW vs commitment tier	Commitment tier	Sustained > 100 GB/day	Pay for committed volume
Log everything vs DCR-scoped	DCR-scoped	Always	Less data if you under-scope
Standard vs Premium Firewall	Standard	No IDPS/TLS-inspection need	Lose IDPS/TLS inspection
Run all platform services now	Add as needed	Always — start minimal	Retro-fit effort later

For the full discipline — reservations, savings plans, hybrid benefit, anomaly alerts — see Azure FinOps & cost engineering and reservations & savings-plan strategy.

Interview & exam questions

1. Walk me through how you would design an Azure landing zone for a company moving off a single subscription. Start from the business drivers, then the eight CAF design areas. Adopt a management-group hierarchy (platform / landing zones / sandbox / decommissioned) for inherited policy and RBAC; split subscriptions by responsibility (platform vs application) so each workload is its own blast-radius and billing boundary; centralize networking in a hub with peered spokes and forced-tunnel egress; grant least-privilege RBAC to groups with PIM for privileged roles; enforce tagging and deny risky resources via policy at MG scope; funnel all telemetry into one Log Analytics workspace; and add per-subscription budgets. Deliver it all as IaC through a PR-gated pipeline.

2. Why management groups instead of just applying policy per subscription? Inheritance and scale. One assignment at an MG flows to every current and future subscription beneath it, so governance is automatic for new teams and you have a single place to change a guardrail — versus drift and toil when each subscription is configured by hand.

3. A deny policy is blocking a legitimate deployment. How do you debug it? Read policyEvaluationDetails in the error to find which assignment and definition denied it. Decide whether the policy is correct (the resource genuinely violates intent) or too broad. If too broad, narrow it with parameters/exclusions (notIn, excluded scopes) and validate in DoNotEnforce before re-enabling. Avoid blanket exemptions — they erode the guardrail.

4. Platform team owns the hierarchy and hub; app teams own workloads. How do you structure that so app teams move fast without breaking governance? Two ownership layers in IaC: the platform repo owns MGs, policy, and the hub; app teams own their spokes and workloads and ship via PR into their own subscription, where guardrails already inherit. App teams get Contributor on their subscription and no rights above it, so they cannot weaken platform policy. This is “subscription democratization.”

5. How do you keep monitoring from drifting as teams add resources? Enforce it with DeployIfNotExists policy that auto-onboards new resources to the central Log Analytics workspace and Defender for Cloud, with remediation tasks for existing ones. Coverage becomes a property of the platform, not something a team must remember.

6. How does this design answer “show me the bill by team”? Subscriptions are the billing boundary (one per workload/env), and a require-tags policy guarantees every resource carries costCenter/owner/env. Cost Management then slices spend by subscription and tag, and per-subscription budgets with alerts warn owners before they overrun.

7. What forces a spoke’s outbound traffic through the hub firewall, and what’s the classic failure? A route table (UDR) whose default route (0.0.0.0/0) has next hop = the firewall’s private IP, attached to the spoke subnet (“forced tunneling”). The classic failure is attaching the UDR but forgetting a firewall rule to allow the traffic — egress is then dropped, looking like a connectivity bug. Confirm with show-effective-route-table and the firewall logs.

8. Difference between Deny, Audit, and DeployIfNotExists policy effects? Deny rejects a non-compliant create/update at evaluation time (a hard guardrail). Audit flags non-compliance but allows it (visibility without blocking). DeployIfNotExists deploys a related resource when it’s missing (e.g. Log Analytics onboarding) and needs a managed identity plus a remediation task to act on existing resources.

9. Why grant RBAC to groups and scope it high rather than to individuals per-resource? Group-based, high-scope, least-privilege RBAC is auditable and manageable: you add/remove a person from a group instead of hunting per-resource assignments, you avoid privilege creep, and combined with PIM you avoid standing admin rights.

10. What is PIM and which roles do you put behind it? Privileged Identity Management makes roles eligible rather than active, so a human activates them just-in-time with MFA, justification, and (for the top roles) approval, for a bounded duration. Put Owner and User Access Administrator behind it always; consider it for any role that can change access or delete platform resources.
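The group-based pattern from the last two answers can be sketched with two helpers: one grants a group Contributor on a subscription, the other lists users who hold Owner as a standing (active) assignment at a scope. The function names are hypothetical, and the group object ID and subscription ID are placeholders you supply. The sketch assumes that roles held only as PIM-eligible do not appear in az role assignment list until activated; verify that in your tenant.

```shell
#!/usr/bin/env bash
# Sketch only: object IDs, subscription IDs and scopes are placeholders.

# Grant an Entra ID group Contributor on exactly one subscription.
grant_group_contributor() {
  local group_object_id="$1" subscription_id="$2"
  az role assignment create \
    --assignee-object-id "$group_object_id" \
    --assignee-principal-type Group \
    --role Contributor \
    --scope "/subscriptions/${subscription_id}"
}

# List individual users with an active Owner assignment at a scope.
# An empty result is what the acceptance criteria ask for.
list_standing_user_owners() {
  local scope="$1"
  az role assignment list --role Owner --scope "$scope" \
    --query "[?principalType=='User'].principalName" -o tsv
}
```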

11. Your VNet peering shows Initiated, not Connected. What’s wrong? Peering is directional and was created on only one side. Create the peering in both directions (hub→spoke and spoke→hub); both links must exist before the state becomes Connected.
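Creating both halves of the peering in one step avoids the Initiated state. A minimal sketch, assuming hub and spoke sit in the same resource group and subscription as they do in a small lab (the function names and peering names are hypothetical). In production the two VNets usually live in different subscriptions, in which case --remote-vnet takes the full resource ID instead of a name.

```shell
#!/usr/bin/env bash
# Sketch only: assumes both VNets are in the same resource group.

# Create the hub-to-spoke and spoke-to-hub links together.
peer_both_ways() {
  local rg="$1" hub="$2" spoke="$3"

  az network vnet peering create \
    --resource-group "$rg" --vnet-name "$hub" \
    --name "${hub}-to-${spoke}" \
    --remote-vnet "$spoke" \
    --allow-vnet-access || return 1

  # The spoke side also accepts traffic forwarded by the hub firewall.
  az network vnet peering create \
    --resource-group "$rg" --vnet-name "$spoke" \
    --name "${spoke}-to-${hub}" \
    --remote-vnet "$hub" \
    --allow-vnet-access --allow-forwarded-traffic
}

# Show the peering state as seen from one VNet; run it for both sides.
peering_states() {
  local rg="$1" vnet="$2"
  az network vnet peering list \
    --resource-group "$rg" --vnet-name "$vnet" \
    --query "[].{name:name, state:peeringState}" -o table
}
```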

12. How do you onboard a new application team without a networking ticket? Vend them a subscription already under the governed landingzones MG (so it inherits guardrails), grant their group Contributor on just that subscription, and let them deploy their spoke + workload via a PR into their own repo. Because policy, DINE monitoring, and the peering pattern are platform-owned and inherited, the team is productive in days with no manual networking step.

These map most directly to AZ-305: Designing Microsoft Azure Infrastructure Solutions, with reinforcement of AZ-104 (Administrator). The cert mapping for revision:

Question theme	Primary cert	Exam objective area
MG hierarchy, governance strategy	AZ-305	Design governance & identity
Policy effects, require-tags, deny	AZ-305 / AZ-104	Design/implement governance
Hub-spoke, peering, forced tunnel	AZ-305 / AZ-104	Design/implement networking
RBAC strategy, PIM	AZ-305 / AZ-500	Design identity & access
Central monitoring, DINE onboarding	AZ-305 / AZ-104	Design/implement monitoring
Budgets, tags, cost attribution	AZ-305	Recommend cost solutions
IaC, pipelines, reproducibility	AZ-305 / AZ-400	Design platform automation

Quick check

1. Why assign policy and RBAC at a management group rather than on each subscription?
2. What is the difference between a platform subscription and an application (landing-zone) subscription?
3. In hub-spoke, what forces a spoke’s outbound traffic through the hub firewall, and what’s the one thing people forget to add alongside it?
4. Your new spoke peering reads Initiated. What did you miss, and what command confirms it?
5. Why dry-run a new policy initiative in DoNotEnforce mode before enforcing it?

Answers

1. Because management groups inherit: a single assignment flows to every current and future subscription beneath them, so new teams are governed automatically instead of someone remembering to re-apply guardrails each time.
2. Platform subscriptions hold shared services (connectivity, management, identity) owned by the platform team and changing rarely; application subscriptions are handed one-per-workload to app teams and are the blast-radius/billing boundary for that workload.
3. A route table (UDR) whose default route (0.0.0.0/0) has next hop = the firewall’s private IP, attached to the spoke subnet (forced tunneling). People forget the matching firewall rule to allow that egress, so traffic is dropped and it looks like a connectivity bug. Confirm with az network nic show-effective-route-table and the firewall logs.
4. You created the peering on only one side; peering is directional. Create it in both directions, then run az network vnet peering list --resource-group <rg> --vnet-name <vnet> --query "[].peeringState" once against the hub and once against the spoke. Each VNet lists only its own side of the link, and both must read Connected.
5. DoNotEnforce evaluates compliance without blocking anything, so you can read what would be denied and fix scope or exclusions before a too-broad deny breaks legitimate deployments (including your own platform bootstrap).

Exercise

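Extend the lab into the cost pillar. Using the rg-corp-prod resource group from the lab (or recreate it), create a budget and then attach an alert so an owner is notified before spend exceeds a threshold. The command below creates the budget itself; it takes no notification settings, so add the alert threshold and recipient afterwards (for example on the budget in the portal under Cost Management):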

az consumption budget create \
  --budget-name corp-prod-monthly \
  --amount 10 \
  --category Cost \
  --time-grain Monthly \
  --start-date 2026-06-01 --end-date 2026-12-31 \
  --resource-group rg-corp-prod

Then answer in two or three sentences: how does requiring the costCenter tag (from the lab) combine with this budget to satisfy Northwind’s “show me the bill, by team” requirement? Clean up afterward.

Capstone deliverables & self-assessment rubric

To call the capstone “done,” produce these deliverables:

A design document stating the seven decisions and their justification.
The architecture diagram of your target state (you have a template above).
IaC (Bicep and/or Terraform) for management group(s), resource groups, hub + spoke networking, at least one policy assignment, and a Log Analytics workspace.
A short acceptance test (the validation commands) that proves the build works, followed by a clean teardown.

Acceptance criteria — the build passes if all are true:

Management-group hierarchy deployed; a policy assigned at a parent scope is visible at a child scope.
Platform vs application separation exists (subscriptions in production; resource groups + tags in the lab).
Hub and spoke VNets are peered, with state Connected in both directions.
At least one deny/require policy is assigned and a non-compliant resource is actually blocked or flagged.
A central Log Analytics workspace exists and at least one resource sends diagnostics to it.
Every resource carries costCenter, owner, and env tags; a budget alert is configured.
All RBAC is granted to groups at subscription/MG scope; no standing Owner on individuals.
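Each criterion above should reduce to a command whose output is pass or fail. As one example, the tagging criterion can be checked with a small helper. This is a sketch, not a complete acceptance suite: the function names are hypothetical, and the JMESPath match on tag names is case-sensitive, so it assumes tags are written exactly as costCenter, owner, and env.

```shell
#!/usr/bin/env bash
# Sketch only: checks one resource group; tag names are matched case-sensitively.

# Print the IDs of resources in a resource group that lack a given tag.
untagged_resources() {
  local rg="$1" tag="$2"
  az resource list --resource-group "$rg" \
    --query "[?!(tags.${tag})].id" -o tsv
}

# Return 0 only if every resource carries costCenter, owner, and env.
check_required_tags() {
  local rg="$1" tag missing status=0
  for tag in costCenter owner env; do
    missing="$(untagged_resources "$rg" "$tag")"
    if [ -n "$missing" ]; then
      echo "FAIL: missing tag '${tag}' on:"
      echo "$missing"
      status=1
    else
      echo "PASS: every resource has '${tag}'"
    fi
  done
  return "$status"
}
```

Calling check_required_tags rg-corp-prod at the end of the build gives a non-zero exit code on any untagged resource, which makes it usable as a pipeline gate as well as a manual check.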

Self-assessment rubric — grade each area 0–3 and aim for 2+ everywhere before you consider yourself “hero” level:

Area	0 — Not done	1 — Started	2 — Solid	3 — Production-grade
Resource organization	Flat / ad-hoc	MGs exist, no plan	CAF hierarchy, platform/app split	Subscription vending automated
Networking	Single flat VNet	Hub + spoke exist	Peered + UDR egress	Firewall rules, DNS, Bastion, hybrid
Identity	Per-user Owner	Some groups used	Group RBAC, scoped high	PIM JIT, least privilege everywhere
Governance	No policy	A few audits	Required-tags + deny assigned	DINE auto-remediation, shipped as code
Monitoring	Nothing central	Workspace exists	Diagnostics flow in	Defender + alerts + workbooks
Security	Defaults	Some hardening	Private corp + Defender on	Zero Trust, secretless, reviewed
Cost	No tags/budgets	Tags inconsistent	Tags + budgets + alerts	Per-team chargeback, anomaly alerts
Automation	Portal-built	Some scripts	Bicep/Terraform for all	Pipeline-deployed, PR-gated

Glossary

Landing zone — a pre-provisioned, governed Azure environment (networking, identity, policy, monitoring) that workloads “land” in.
Cloud Adoption Framework (CAF) — Microsoft’s guidance whose eight design areas structure a landing zone.
Management group — a container above subscriptions for applying policy and RBAC that inherit downward.
Platform subscription — a subscription for shared services (connectivity, management, identity) owned by the platform team.
Application (landing-zone) subscription — a subscription handed to an app team for one workload/environment; the blast-radius and billing boundary.
Hub-spoke — a network topology with shared services in a central hub VNet and workloads in peered spoke VNets.
UDR (route table) — a user-defined route that overrides default routing, used to force spoke egress through the hub firewall.
Forced tunneling — sending a subnet’s 0.0.0.0/0 default route to the firewall so all egress is inspected.
VNet peering — a directional link connecting two VNets; both directions must exist for state Connected.
Azure Policy initiative — a grouped set of policy definitions assigned together at a scope to enforce or audit rules.
Policy effect — what a policy does at evaluation (Deny, Audit, Append, Modify, DeployIfNotExists, AuditIfNotExists, Disabled, DenyAction).
DeployIfNotExists (DINE) — a policy effect that auto-deploys a required configuration (e.g. Log Analytics onboarding) when missing.
enforcementMode — whether a policy assignment actually blocks/remediates (Default) or only evaluates (DoNotEnforce).
RBAC role assignment — the binding of a principal to a role at a scope (MG, subscription, RG, or resource).
PIM — Privileged Identity Management; makes privileged roles eligible/just-in-time rather than always-on.
Log Analytics workspace — the central store telemetry is funnelled into so security can query the whole estate.
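Budget — a Cost Management spending threshold with alerts, scoped to a management group, subscription, or resource group and optionally filtered by tag; it notifies owners but does not stop or cap spend.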
Acceptance criteria — the explicit, testable conditions a build must satisfy to be considered done.

Next steps

Congratulations — that is the Azure Zero-to-Hero capstone. The natural next lesson is the course finale on getting hired and certified: Azure Interview & Certification Prep: Scenarios + AZ-104/AZ-305 Roadmap.

To take any single pillar from this capstone to full production depth, build on the KloudVin landing-zone series:

Designing an Azure Landing Zone with the Cloud Adoption Framework — the end-to-end blueprint and the eight design areas.
Azure landing zone — resource organization — management groups and subscription strategy in depth.
Azure landing zone — network topology — hub-spoke, firewall, DNS, and connectivity.
Azure landing zone — identity & access — RBAC, PIM, and the identity baseline.
Azure landing zone — governance — policy, monitoring, and compliance.
Azure landing zone — security — Defender for Cloud and the Zero Trust posture.
Azure Policy as code — shipping all of the above through a CI/CD pipeline.
Subscription vending & platform automation — automate handing governed subscriptions to app teams.