Defender for Cloud Attack Path Analysis: Custom Recommendations and Governance Rules

Secure Score tells you that 412 resources are non-compliant. It does not tell you that exactly three of them form a chain from a public load balancer, to a VM with a stale Owner role assignment, to a storage account flagged as holding sensitive data. That chain is the thing an attacker actually walks. This article operationalizes the cloud security graph in Defender for Cloud: reading attack paths, encoding your own risk logic as custom recommendations, then forcing remediation through governance rules with named owners and real SLAs. The goal is a posture program where the number that goes down is reachable, exploitable exposure, not a generic control count.

Everything that follows requires the Defender CSPM plan, not foundational CSPM. The graph, attack paths, the cloud security explorer, and KQL-based custom recommendations are all gated behind it.

# Defender CSPM is the plan named "CloudPosture" in the pricing API
az security pricing create --name CloudPosture --tier Standard

# Confirm the tier. With Defender CSPM on, agentless scanning feeds the graph with vuln/secret/sensitive-data context
az security pricing list --query "value[?name=='CloudPosture'].{plan:name,tier:pricingTier}" -o table

1. How the cloud security graph models exposure

The graph is a context engine. It ingests asset inventory, network reachability, identity and permission relationships, vulnerability findings (from agentless scanning), and data sensitivity (from sensitive-data discovery in Defender CSPM), then connects them as a typed graph of nodes and edges. A node is an entity: a VM, a managed identity, a storage account, an IP address. An edge is a relationship an attacker could traverse: “is exposed to the internet”, “can authenticate as”, “can read data from”, “has permission to”.

This is the leap past Secure Score. A misconfiguration in isolation is a finding. The same misconfiguration on a node one hop from an internet-facing entity and two hops from a sensitive data store is a path. Defender for Cloud computes those paths and surfaces them as attack paths, scored by how much they actually matter.

The mental model: Secure Score answers “what is wrong?” The graph answers “what is reachable, and what does it reach?” You operationalize the second question, because that is where breach risk lives.
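The difference between a finding and a path is easy to see on a toy graph. The sketch below is a plain-Python illustration, not how Defender for Cloud stores or traverses its graph; every node name and edge label in it is hypothetical.

```python
from collections import deque

# Toy graph: node -> list of (edge_label, neighbor). All names are hypothetical.
EDGES = {
    "internet": [("routes traffic to", "public-lb")],
    "public-lb": [("routes traffic to", "vm-web-01")],
    "vm-web-01": [("can authenticate as", "mi-web")],
    "mi-web": [("can read data from", "st-customer-pii")],
    "vm-batch-07": [("can authenticate as", "mi-batch")],
    "mi-batch": [("can read data from", "st-logs")],
}
SENSITIVE = {"st-customer-pii"}


def shortest_path(start, targets):
    """Breadth-first search: first path from start to any target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in targets:
            return path
        for _label, neighbor in EDGES.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path_to_pii = shortest_path("internet", SENSITIVE)
print(" -> ".join(path_to_pii))

# vm-batch-07 could carry the same misconfiguration as vm-web-01, but nothing
# internet-facing reaches it, so it stays a finding rather than a path.
print(shortest_path("internet", {"st-logs"}))  # None
```

Both VMs would show up identically in a flat list. Only the reachability question separates them.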

Two surfaces sit on top of the same graph:

Cloud security explorer (in the Defender for Cloud portal) is a visual, dropdown-driven query builder over a daily snapshot. Good for ad-hoc “show me every internet-exposed VM with a high-severity CVE” hunts and for sharing a query link with a teammate.
Advanced hunting in the Microsoft Defender portal exposes the same exposure graph as the ExposureGraphNodes and ExposureGraphEdges tables, queryable with full KQL including the make-graph and graph-match graph operators. This is where you write reproducible, parameterized path queries and wire them into automation.

2. Reading an attack path: internet to identity to data

Start in the portal under Attack path analysis. Each path reads left to right from an entry point (typically internet exposure) through intermediate hops to a target (sensitive data, a privileged identity, a critical workload). Defender for Cloud groups paths by pattern, for example “Internet exposed VM has high severity vulnerabilities and read permission to a data store.”

To reproduce and extend that logic yourself, drop into advanced hunting. The exposure graph schema is the foundation. ExposureGraphNodes carries NodeId, NodeLabel, NodeName, Categories, EntityIds, and a dynamic NodeProperties; ExposureGraphEdges carries SourceNodeId, TargetNodeId, EdgeLabel, and the source/target node labels. Before writing a path query, learn the vocabulary of your own tenant:

// What entity types exist?
ExposureGraphNodes
| summarize Count = count() by NodeLabel
| order by Count desc

// What relationships connect them?
ExposureGraphEdges
| summarize Count = count() by EdgeLabel
| order by Count desc

The single most useful starting query is the canonical internet-to-RCE check. Note that exposure flags live under NodeProperties.rawData and category membership is tested with set_has_element against the Categories array:

// Internet-exposed VMs that are also vulnerable to remote code execution.
// Works across Azure, AWS, and GCP once those connectors feed the graph.
ExposureGraphNodes
| where isnotnull(NodeProperties.rawData.exposedToInternet)
| where isnotnull(NodeProperties.rawData.vulnerableToRCE)
| where set_has_element(Categories, "virtual_machine")

The real value is multi-hop traversal. Load the edges into a graph with make-graph, then match a pattern with graph-match. This query finds any path of up to three hops from an IP address to a virtual machine, which is the skeleton of an exposure-to-asset path:

let IPsAndVMs = ExposureGraphNodes
| where set_has_element(Categories, "ip_address") or set_has_element(Categories, "virtual_machine");
ExposureGraphEdges
| make-graph SourceNodeId --> TargetNodeId with IPsAndVMs on NodeId
| graph-match (IP)-[anyEdge*1..3]->(VM)
    where set_has_element(IP.Categories, "ip_address")
      and set_has_element(VM.Categories, "virtual_machine")
    project IpIds = IP.EntityIds, VmIds = VM.EntityIds, VmProps = VM.NodeProperties.rawData

To express the “lands on an identity that reaches data” half, pivot on identity edges. The EdgeLabel values are explicit and matchable, for example Can Authenticate As and CanRemoteInteractiveLogonTo. Filtering edges before make-graph keeps the graph small and the query fast:

let Nodes = ExposureGraphNodes
| where set_has_element(Categories, "identity")
   or (set_has_element(Categories, "device") and isnotnull(NodeProperties.rawData.criticalityLevel));
ExposureGraphEdges
| where EdgeLabel == "Can Authenticate As"
| make-graph SourceNodeId --> TargetNodeId with Nodes on NodeId
| graph-match (Device)-[canAuthAs]->(Identity)
    where set_has_element(Identity.Categories, "identity")
      and set_has_element(Device.Categories, "device")
    project IdentityIds = Identity.EntityIds, DeviceIds = Device.EntityIds

The discipline that matters at principal level: pre-filter nodes and edges to the minimum relevant set before make-graph. The NodeProperties column is large; graphing the whole tenant unfiltered hits memory limits and timeouts. Narrow Categories and EdgeLabel up front, project only the columns you need, then traverse.
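The same filter-then-build order can be shown outside KQL. This is a minimal sketch on made-up edge rows shaped like (SourceNodeId, EdgeLabel, TargetNodeId); apart from Can Authenticate As, the labels are invented for illustration and may not exist in your tenant.

```python
from collections import defaultdict

# Hypothetical edge rows: (SourceNodeId, EdgeLabel, TargetNodeId).
EDGE_ROWS = [
    ("dev-1", "Can Authenticate As", "id-alice"),
    ("dev-1", "has permissions to", "kv-1"),
    ("dev-2", "Can Authenticate As", "id-bob"),
    ("ip-1", "routes traffic to", "dev-1"),
    ("id-alice", "member of", "grp-admins"),
]


def build_adjacency(rows, wanted_labels):
    """Drop edges with unwanted labels first, then build source -> targets."""
    adjacency = defaultdict(list)
    kept = 0
    for source, label, target in rows:
        if label in wanted_labels:
            adjacency[source].append(target)
            kept += 1
    return dict(adjacency), kept


graph, kept = build_adjacency(EDGE_ROWS, {"Can Authenticate As"})
print(f"kept {kept} of {len(EDGE_ROWS)} edges")
print(graph)
```

The traversal only ever sees the two edges it can use. At tenant scale that ratio is what decides whether a query finishes.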

3. Writing custom recommendations with KQL

Built-in recommendations cover the Microsoft Cloud Security Benchmark. Your organization always has rules the benchmark does not, for example “every internet-facing data store must have a private endpoint” or “no VM may be missing the Owner tag.” Defender CSPM lets you encode these as custom recommendations in KQL, evaluated across Azure, AWS, and GCP.

The data source is the RawEntityMetadata table, not Azure Resource Graph. Your query filters to a resource type, evaluates a condition, and stamps each in-scope resource with a HealthStatus. The contract is strict: return exactly seven columns and every resource in scope, marking compliant ones HEALTHY and non-compliant ones UNHEALTHY. Omitting a resource reads as “no data,” not as healthy.

// Custom recommendation: Key Vaults must have purge protection enabled.
RawEntityMetadata
| where Environment == 'Azure' and Identifiers.Type =~ 'Microsoft.KeyVault/vaults'
| extend condition = (Record.properties.enablePurgeProtection != true
                      or isnull(Record.properties.enablePurgeProtection))
| extend HealthStatus = iff(condition, 'UNHEALTHY', 'HEALTHY')
| project Id, Name, Environment, Identifiers, AdditionalData, Record, HealthStatus

The pattern is mechanical and worth internalizing, because everything else is a variation on it:

Column	Type	Role
`Id`	string	Resource identifier Defender uses to map the finding
`Name`	string	Display name in the recommendation UI
`Environment`	string	`Azure`, `AWS`, or `GCP`
`Identifiers`	dynamic	Resource type + identifiers, passed through from the record
`AdditionalData`	dynamic	Supplementary metadata, passed through
`Record`	dynamic	Full resource record; properties live under `Record.properties.*`
`HealthStatus`	string	`UNHEALTHY` or `HEALTHY` only (case-sensitive)

Two rules that will save you a debugging session: properties are reached via Record.properties.* (never bare properties.*), and resource-type comparisons use the case-insensitive =~ operator. To port a check to AWS or GCP, change Environment and swap Identifiers.Type for the corresponding type string.
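If you export query results for review before saving a recommendation, the contract can be linted offline. The sketch below assumes rows arrive as dictionaries keyed by column name and that you hold a separate list of in-scope resource IDs; both are assumptions of this example, and the resource IDs are placeholders.

```python
REQUIRED_COLUMNS = ["Id", "Name", "Environment", "Identifiers",
                    "AdditionalData", "Record", "HealthStatus"]
VALID_STATUS = {"HEALTHY", "UNHEALTHY"}


def contract_errors(rows, in_scope_ids):
    """Return human-readable violations of the seven-column contract."""
    errors = []
    for row in rows:
        if sorted(row) != sorted(REQUIRED_COLUMNS):
            errors.append(f"{row.get('Id', '?')}: columns are {sorted(row)}")
        elif row["HealthStatus"] not in VALID_STATUS:
            errors.append(f"{row['Id']}: bad HealthStatus {row['HealthStatus']!r}")
    returned = {row.get("Id") for row in rows}
    for missing in sorted(set(in_scope_ids) - returned):
        errors.append(f"{missing}: in scope but not returned (reads as no data)")
    return errors


def make_row(resource_id, status):
    """Build a placeholder row with all seven columns."""
    return {"Id": resource_id, "Name": resource_id.rsplit("/", 1)[-1],
            "Environment": "Azure", "Identifiers": {}, "AdditionalData": {},
            "Record": {}, "HealthStatus": status}


rows = [make_row("/vaults/kv-a", "HEALTHY"), make_row("/vaults/kv-b", "Unhealthy")]
errors = contract_errors(rows, ["/vaults/kv-a", "/vaults/kv-b", "/vaults/kv-c"])
for error in errors:
    print(error)
```

It catches the two mistakes that are easiest to miss by eye: a mis-cased status and a resource silently filtered out of the result.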

Author and test in the portal first. Go to Environment settings > the subscription > Security policies > Create custom recommendation, set name/scope/severity/security issue, then Open query editor, paste, and Run query to validate the schema before saving. You need Security Admin to create the recommendation and Owner on the subscription to create the custom standard it gets assigned to. Once validated, you can deploy the same recommendation at scale through the Defender for Cloud API rather than clicking through every subscription.

4. Prioritizing with attack-path and exposure-based scoring

A flat list of 400 unhealthy resources is noise. The graph lets you rank by exposure so remediation effort lands where it removes the most risk. Two levers:

Resource criticality. Tag your crown-jewel assets so the graph weights paths that reach them. Critical assets carry a criticalityLevel under NodeProperties.rawData, and attack paths terminating on them are scored higher. Set this deliberately for production data stores, domain controllers, and identity providers; do not let everything be critical, or nothing is.

Choke-point analysis. A single node that appears in many paths is a choke point, the highest-leverage fix in the estate. Remediating one over-permissioned managed identity can collapse dozens of paths at once. Approximate choke-point ranking in KQL by counting how often a node is an intermediate hop:

// Rank intermediate nodes by how many internet-to-VM paths traverse them.
let Scope = ExposureGraphNodes
| where set_has_element(Categories, "ip_address")
   or set_has_element(Categories, "identity")
   or set_has_element(Categories, "virtual_machine");
ExposureGraphEdges
| make-graph SourceNodeId --> TargetNodeId with Scope on NodeId
| graph-match (IP)-[e1]->(Mid)-[e2]->(VM)
    where set_has_element(IP.Categories, "ip_address")
      and set_has_element(Mid.Categories, "identity")
      and set_has_element(VM.Categories, "virtual_machine")
    project MidName = Mid.NodeName, MidId = Mid.NodeId
| summarize PathsThrough = count() by MidName, MidId
| order by PathsThrough desc
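To see why one fix can collapse many paths, here is the same counting idea on a toy graph in plain Python. The node names are hypothetical and the enumeration is a naive depth-first walk, which is fine for a sketch but would not scale to a real estate.

```python
from collections import Counter

# Hypothetical graph: two entry IPs, two identities, three VMs.
EDGES = {
    "ip-1": ["id-shared", "id-solo"],
    "ip-2": ["id-shared"],
    "id-shared": ["vm-a", "vm-b", "vm-c"],
    "id-solo": ["vm-a"],
}
ENTRY = {"ip-1", "ip-2"}
TARGETS = {"vm-a", "vm-b", "vm-c"}


def all_paths(node, path=()):
    """Yield every simple path from node that ends on a target."""
    path = path + (node,)
    if node in TARGETS:
        yield path
        return
    for nxt in EDGES.get(node, []):
        if nxt not in path:
            yield from all_paths(nxt, path)


paths = [p for entry in sorted(ENTRY) for p in all_paths(entry)]
# Count only intermediate hops, not the entry point or the target.
through = Counter(mid for p in paths for mid in p[1:-1])
print(through.most_common())  # [('id-shared', 6), ('id-solo', 1)]

remaining = [p for p in paths if "id-shared" not in p]
print(f"{len(paths)} paths before, {len(remaining)} after fixing id-shared")
```

Fixing id-shared removes six of the seven paths, which is the leverage a choke point gives you.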

The prioritization rule I give teams: fix choke points first (one change, many paths removed), then paths that reach critical assets, then everything else on its normal SLA. Secure Score is a lagging scoreboard; attack-path reduction is the leading metric.
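That ordering can be written down as a sort key. The sketch below is one way to encode it; the Finding fields and the choke_threshold cutoff of five paths are assumptions made for this example, not values Defender for Cloud exposes.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    name: str
    paths_through: int      # attack paths that traverse this resource
    reaches_critical: bool  # whether any of those paths ends on a critical asset


def triage_key(finding, choke_threshold=5):
    """Lower tuples sort first: choke points, then critical reach, then the rest."""
    if finding.paths_through >= choke_threshold:
        tier = 0
    elif finding.reaches_critical:
        tier = 1
    else:
        tier = 2
    return (tier, -finding.paths_through, finding.name)


findings = [
    Finding("vm-legacy", 1, False),
    Finding("mi-aks", 11, True),
    Finding("st-pii-gw", 2, True),
]
ordered = [f.name for f in sorted(findings, key=triage_key)]
print(ordered)  # ['mi-aks', 'st-pii-gw', 'vm-legacy']
```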

5. Governance rules: owners and remediation SLAs at scale

Finding risk is half the job. Governance rules assign each recommendation an owner and a remediation timeframe (the SLA), with weekly email nudges to owners and their managers, plus an optional grace period so resources do not tank Secure Score until they are actually overdue. This turns a dashboard into an accountable program.

Create a rule with the Defender for Cloud REST API (resource type Microsoft.Security/governanceRules). It targets recommendations by their assessment keys, sets the owner, and sets the SLA. The remediationTimeframe uses a d.hh:mm:ss duration string; valid windows are 7, 14, 30, or 90 days:

az rest --method put \
  --url "https://management.azure.com/subscriptions/<SUB_ID>/providers/Microsoft.Security/governanceRules/critical-internet-exposure?api-version=2022-01-01-preview" \
  --body '{
    "properties": {
      "displayName": "Critical internet-exposure recommendations",
      "description": "Owner-assigned SLA for high-severity exposure findings",
      "rulePriority": 200,
      "isDisabled": false,
      "ruleType": "Integrated",
      "sourceResourceType": "Assessments",
      "isGracePeriod": true,
      "remediationTimeframe": "7.00:00:00",
      "ownerSource": { "type": "Manually", "value": "cloudsec-team@contoso.com" },
      "governanceEmailNotification": {
        "disableManagerEmailNotification": false,
        "disableOwnerEmailNotification": false
      },
      "conditionSets": [
        {
          "conditions": [
            {
              "operator": "In",
              "property": "$.AssessmentKey",
              "value": "[\"b1cd27e0-4ecc-4246-939f-49c426d9d72f\", \"fe83f80b-073d-4ccf-93d9-6797eb870201\"]"
            }
          ]
        }
      ]
    }
  }'
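Hand-writing that body invites two mistakes: a malformed duration string, and forgetting that the condition value is itself a JSON-encoded string inside the JSON document. A small builder avoids both. This is a sketch that mirrors the fields in the request above; the function names are mine, not part of any SDK.

```python
import json

ALLOWED_DAYS = (7, 14, 30, 90)


def timeframe(days):
    """Format an SLA window as the d.hh:mm:ss string used above."""
    if days not in ALLOWED_DAYS:
        raise ValueError(f"SLA must be one of {ALLOWED_DAYS}, got {days}")
    return f"{days}.00:00:00"


def governance_rule_body(display_name, days, owner_email, assessment_keys,
                         priority=200):
    """Build the PUT body for a manually-owned governance rule."""
    return {"properties": {
        "displayName": display_name,
        "rulePriority": priority,
        "isDisabled": False,
        "ruleType": "Integrated",
        "sourceResourceType": "Assessments",
        "isGracePeriod": True,
        "remediationTimeframe": timeframe(days),
        "ownerSource": {"type": "Manually", "value": owner_email},
        "governanceEmailNotification": {
            "disableManagerEmailNotification": False,
            "disableOwnerEmailNotification": False,
        },
        "conditionSets": [{"conditions": [{
            "operator": "In",
            "property": "$.AssessmentKey",
            # The API expects the key list as a JSON string, not a JSON array.
            "value": json.dumps(list(assessment_keys)),
        }]}],
    }}


body = governance_rule_body(
    "Critical internet-exposure recommendations", 7, "cloudsec-team@contoso.com",
    ["b1cd27e0-4ecc-4246-939f-49c426d9d72f", "fe83f80b-073d-4ccf-93d9-6797eb870201"],
)
print(json.dumps(body, indent=2))
```

Write the output to a file and pass it to az rest with --body @file.json to keep shell quoting out of the picture.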

Two design choices make this scale instead of becoming a maintenance burden:

Derive the owner from a tag, not a hard-coded address. Set ownerSource.type to ByTag and ownerSource.value to the tag key (for example Owner). The rule then routes each finding to whoever owns that resource, so you do not re-edit the rule every reorg.
Apply rules at the management-group scope. Swap the subscriptions/<SUB_ID> segment for providers/Microsoft.Management/managementGroups/<MG>. New subscriptions inherit the governance regime automatically instead of joining the estate ungoverned. Use excludedScopes to carve out sandbox subscriptions.

// ownerSource by tag, so ownership tracks the resource, not the rule
"ownerSource": { "type": "ByTag", "value": "Owner" }

Tier your SLAs to severity: a 7-day window on the critical-exposure rule, 30 days on medium-severity hygiene, 90 days on low. Set isGracePeriod: true on the longer windows so hygiene work does not distort Secure Score before it is genuinely late.
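The tiering is simple enough to keep as data. The sketch below maps the three tiers named above to their windows and derives a due date; the date arithmetic, including treating the due date itself as still on time, is my assumption about how you would track this locally, not a statement of how Defender for Cloud computes lateness.

```python
from datetime import date, timedelta

# Severity tier -> remediation window in days, as described above.
SLA_DAYS = {"critical": 7, "medium": 30, "low": 90}


def due_date(severity, assigned_on):
    """Date by which a finding of this severity should be closed."""
    return assigned_on + timedelta(days=SLA_DAYS[severity])


def is_overdue(severity, assigned_on, today):
    """True once today is past the due date (the due date itself is on time)."""
    return today > due_date(severity, assigned_on)


assigned = date(2024, 1, 1)
print(due_date("critical", assigned))                  # 2024-01-08
print(is_overdue("medium", assigned, date(2024, 1, 31)))  # False
print(is_overdue("medium", assigned, date(2024, 2, 1)))   # True
```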

6. Multicloud: AWS and GCP, normalized

The same machinery spans clouds. Connect AWS accounts and GCP projects as security connectors (Microsoft.Security/securityConnectors); once Defender CSPM is enabled on the connector, AWS and GCP assets flow into the same graph and attack paths cross cloud boundaries. Custom recommendations target them by switching Environment to 'AWS' or 'GCP' and matching the native resource type:

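// Same pattern, different control: GCP buckets must enforce uniform bucket-level access
RawEntityMetadata
| where Environment == 'GCP' and Identifiers.Type =~ 'storage.googleapis.com/Bucket'
| extend condition = (Record.iamConfiguration.uniformBucketLevelAccess.enabled != true
                      or isnull(Record.iamConfiguration.uniformBucketLevelAccess.enabled))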
| extend HealthStatus = iff(condition, 'UNHEALTHY', 'HEALTHY')
| project Id, Name, Environment, Identifiers, AdditionalData, Record, HealthStatus
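Nested provider records are where missing fields bite. The Key Vault query earlier treats an absent property as non-compliant, and the same rule is worth applying to any cloud. Here is that decision in plain Python against sample records; the record shapes are simplified stand-ins for what RawEntityMetadata returns, not real payloads.

```python
def dig(record, *keys):
    """Walk nested dicts; return None as soon as a key is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def health_status(record):
    """HEALTHY only when the flag is explicitly true; missing counts as failing."""
    enabled = dig(record, "iamConfiguration", "uniformBucketLevelAccess", "enabled")
    return "HEALTHY" if enabled is True else "UNHEALTHY"


samples = {
    "bucket-on": {"iamConfiguration": {"uniformBucketLevelAccess": {"enabled": True}}},
    "bucket-off": {"iamConfiguration": {"uniformBucketLevelAccess": {"enabled": False}}},
    "bucket-unset": {"iamConfiguration": {}},
}
statuses = {name: health_status(record) for name, record in samples.items()}
print(statuses)
```

The bucket-unset case is the one to watch: a check that only tests for false would pass it.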

Governance rules attach to connector scopes too, so an AWS account or GCP project gets the same owner-and-SLA treatment as an Azure subscription:

az rest --method put \
  --url "https://management.azure.com/subscriptions/<SUB_ID>/resourceGroups/<RG>/providers/Microsoft.Security/securityConnectors/<GCP_CONNECTOR>/providers/Microsoft.Security/governanceRules/gcp-critical?api-version=2022-01-01-preview" \
  --body '{
    "properties": {
      "displayName": "GCP critical recommendations",
      "rulePriority": 210,
      "ruleType": "Integrated",
      "sourceResourceType": "Assessments",
      "isGracePeriod": true,
      "remediationTimeframe": "14.00:00:00",
      "ownerSource": { "type": "ByTag", "value": "owner" },
      "governanceEmailNotification": {
        "disableManagerEmailNotification": false,
        "disableOwnerEmailNotification": false
      },
      "conditionSets": [
        { "conditions": [ { "operator": "In", "property": "$.AssessmentKey",
          "value": "[\"b1cd27e0-4ecc-4246-939f-49c426d9d72f\"]" } ] }
      ]
    }
  }'
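The three scopes used so far (subscription, management group, security connector) differ only in the URL prefix, so it pays to build the URL rather than paste it. This is a small sketch using the paths shown in this article; the helper names and the IDs in the usage lines are placeholders.

```python
BASE = "https://management.azure.com"
API_VERSION = "2022-01-01-preview"


def subscription_scope(subscription_id):
    return f"/subscriptions/{subscription_id}"


def management_group_scope(group_name):
    return f"/providers/Microsoft.Management/managementGroups/{group_name}"


def connector_scope(subscription_id, resource_group, connector_name):
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Security/securityConnectors/{connector_name}")


def governance_rule_url(scope, rule_id):
    """Append the governanceRules resource path and API version to a scope."""
    return (f"{BASE}{scope}/providers/Microsoft.Security/governanceRules/"
            f"{rule_id}?api-version={API_VERSION}")


# Placeholder identifiers, for illustration only.
print(governance_rule_url(management_group_scope("mg-example"), "critical-internet-exposure"))
print(governance_rule_url(connector_scope("SUB_ID", "RG", "GCP_CONNECTOR"), "gcp-critical"))
```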

7. Wiring findings into ticketing and exposure-reduction workflows

Email nudges work for engaged owners; they do not scale to a backlog. Push attack-path findings into the system of record where work actually gets tracked.

ServiceNow. Governance rules support a ruleType of ServiceNow so overdue findings open tickets in your ITSM instance rather than only emailing owners. This is the cleanest path if ServiceNow is your remediation system of record.
Workflow automation. Defender for Cloud’s workflow automation triggers a Logic App on new recommendations or alerts. From there, create Jira issues, post to Teams, or call an internal API. The Logic App receives the recommendation payload including resource ID and severity.
Continuous export. Stream recommendations and secure-score changes to a Log Analytics workspace or Event Hub for a custom exposure pipeline:

az security automation create \
  --name export-recommendations \
  --resource-group security-rg \
  --location eastus \
  --scopes '[{"description":"sub","scopePath":"/subscriptions/<SUB_ID>"}]' \
  --sources '[{"eventSource":"Assessments","ruleSets":[]}]' \
  --actions '[{"actionType":"Workspace","workspaceResourceId":"/subscriptions/<SUB_ID>/resourceGroups/security-rg/providers/Microsoft.OperationalInsights/workspaces/<WS>"}]'

The architecture I recommend: attack paths and high-severity recommendations route to tickets automatically with the governance SLA as the ticket due date, while everything else stays in the Defender backlog under its grace period. You want engineers working tickets in their normal queue, not logging into a security portal they will forget exists.
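As a sketch, that routing decision fits in one function you could host in whatever glue sits between Defender and your ticketing system. The finding fields (severity, on_attack_path, sla_days) are names I chose for this example; the real payload you receive will be shaped differently and needs mapping first.

```python
from datetime import date, timedelta


def route(finding, today):
    """Send attack-path and high-severity findings to tickets; keep the rest in Defender."""
    if finding["on_attack_path"] or finding["severity"] == "High":
        return {
            "destination": "ticket",
            # The governance SLA becomes the ticket due date.
            "due": today + timedelta(days=finding["sla_days"]),
        }
    return {"destination": "defender-backlog", "due": None}


today = date(2024, 3, 1)
exposed = route({"severity": "Medium", "on_attack_path": True, "sla_days": 7}, today)
hygiene = route({"severity": "Low", "on_attack_path": False, "sla_days": 90}, today)
print(exposed)
print(hygiene)
```

Note that a medium-severity finding still becomes a ticket when it sits on an attack path. Reachability, not the severity label, drives the routing.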

8. Tracking remediation velocity and exposure reduction

Measure two things: are findings closing inside SLA (velocity), and is total reachable exposure shrinking (outcome). The governance report in the portal breaks progress down by rule and owner, including overdue counts. Those numbers only mean something once the rule has been applied to findings that already existed when it was created, so re-run it after any change:

# Apply a rule to existing findings so each one carries the current owner and due date
az rest --method post \
  --url "https://management.azure.com/subscriptions/<SUB_ID>/providers/Microsoft.Security/governanceRules/critical-internet-exposure/execute?api-version=2022-01-01-preview"

# Trend Secure Score over time as the lagging outcome metric
az security secure-scores list \
  --query "value[].{name:displayName, current:score.current, max:score.max, pct:score.percentage}" -o table

For the leading metric, track attack-path count by pattern week over week from the exposure graph, segmented by severity and by whether the path reaches a critical asset. A program that is working shows a falling path count and a rising on-time remediation rate at the same time. If Secure Score climbs but path count is flat, you are remediating cheap findings and ignoring the reachable ones, the exact failure mode the graph exists to prevent.
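Both metrics and the failure-mode check reduce to a few lines once you have the weekly numbers. This sketch assumes you already export closed findings as (days to close, SLA days) pairs and hold two consecutive snapshots of Secure Score and path count; those inputs and the wording of the verdicts are mine.

```python
def on_time_rate(closed):
    """closed: list of (days_to_close, sla_days). Returns a fraction or None."""
    if not closed:
        return None
    on_time = sum(1 for days, sla in closed if days <= sla)
    return on_time / len(closed)


def diagnose(score_delta, path_delta):
    """Compare week-over-week change in Secure Score and attack-path count."""
    if path_delta < 0:
        return "exposure shrinking"
    if score_delta > 0:
        return "score up, paths flat: remediating cheap findings"
    return "no progress"


closed_this_week = [(3, 7), (10, 7), (20, 30), (31, 30)]
print(on_time_rate(closed_this_week))      # 0.5
print(diagnose(score_delta=4, path_delta=0))
print(diagnose(score_delta=1, path_delta=-3))
```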

Enterprise scenario

A retail platform team ran Defender for Cloud across ~140 Azure subscriptions plus two AWS accounts behind a single management group. Secure Score sat around 68% and had been “improving” for a quarter, yet a red-team engagement walked a path nobody had triaged: a public Application Gateway, to an AKS-hosted VM with a high-severity CVE, to a kubelet-assigned managed identity that held Storage Blob Data Reader on a storage account the sensitive-data scanner had flagged as holding PII. Three findings, each individually low-priority in the flat list, formed one critical exposure path.

The constraint: 140 subscriptions, dozens of owning teams, and no central authority to manually chase remediation. Telling teams to “go fix things” had already failed for a quarter.

The fix had three moves. First, they made the path queryable and continuous, lifting the red team’s chain into an advanced-hunting graph query so it would resurface the instant a similar shape reappeared:

let Scope = ExposureGraphNodes
| where set_has_element(Categories, "ip_address")
   or set_has_element(Categories, "identity")
   or set_has_element(Categories, "virtual_machine")
   or NodeProperties has "containsSensitiveData";
ExposureGraphEdges
| make-graph SourceNodeId --> TargetNodeId with Scope on NodeId
| graph-match (IP)-[e1*1..2]->(VM)-[e2]->(Identity)-[e3]->(Data)
    where set_has_element(IP.Categories, "ip_address")
      and set_has_element(VM.Categories, "virtual_machine")
        and isnotnull(VM.NodeProperties.rawData.vulnerableToRCE)
      and set_has_element(Identity.Categories, "identity")
      and Data.NodeProperties has "containsSensitiveData"
    project VMName = VM.NodeName, IdentityName = Identity.NodeName, DataName = Data.NodeName

Second, they encoded the policy gap, “no internet-exposed workload may hold a direct data-plane role to a sensitive store,” as a custom recommendation on the RawEntityMetadata pattern and assigned it to a custom standard inherited by the whole management group. Third, they attached a governance rule at the management-group scope with ownerSource.type: ByTag on the existing Owner tag and a 7-day remediationTimeframe, so every finding auto-routed to the owning team with a hard due date and weekly manager escalation, no central chasing required. Critical exposure paths additionally opened ServiceNow tickets via ruleType: ServiceNow.

The choke point turned out to be the AKS identity assignment: removing the over-broad role collapsed not one path but eleven, across different clusters that shared the pattern. Within two months, critical attack paths reaching sensitive data dropped from 11 to 1 (a legacy system with a documented exception and compensating controls), even though Secure Score moved only four points. The point that landed with leadership: the four-point move would have been invisible, but going from 11 reachable PII paths to 1 was the number that actually described risk.

Verify

The Defender CSPM plan is Standard: az security pricing show --name CloudPosture --query pricingTier -o tsv returns Standard.
Attack paths render under Attack path analysis in the portal, and ExposureGraphNodes | summarize by NodeLabel returns rows in advanced hunting.
A custom recommendation runs clean in the query editor and returns all seven required columns with HealthStatus populated for every in-scope resource.
The governance rule exists: az rest --method get --url ".../governanceRules/critical-internet-exposure?api-version=2022-01-01-preview" returns the rule with your remediationTimeframe and ownerSource.
Owners receive the weekly governance email, and overdue findings appear in the governance status report (or open ServiceNow tickets when ruleType is ServiceNow).
Continuous export lands recommendation records in the target workspace or Event Hub.

Written by Vinod
