Defender for Cloud Attack Path Analysis: Custom Recommendations and Governance Rules

Secure Score tells you that 412 resources are non-compliant. It does not tell you that exactly three of them form a chain from a public load balancer, to a VM with a stale Owner role assignment, to a storage account flagged as holding sensitive data. That chain is the thing an attacker actually walks. This article operationalizes the cloud security graph in Defender for Cloud: reading attack paths, encoding your own risk logic as custom recommendations, then forcing remediation through governance rules with named owners and real SLAs. The goal is a posture program where the number that goes down is reachable, exploitable exposure, not a generic control count.

Everything that follows requires the Defender CSPM plan, not foundational CSPM. The graph, attack paths, the cloud security explorer, and KQL-based custom recommendations are all gated behind it.

# Defender CSPM is the plan named "CloudPosture" in the pricing API
az security pricing create --name CloudPosture --tier Standard

# Confirm the plan is on the Standard tier; agentless scanning then feeds the graph with vuln/secret/sensitive-data context
az security pricing list --query "value[?name=='CloudPosture'].{plan:name,tier:pricingTier}" -o table

1. How the cloud security graph models exposure

The graph is a context engine. It ingests asset inventory, network reachability, identity and permission relationships, vulnerability findings (from agentless scanning), and data sensitivity (from sensitive-data discovery in Defender CSPM), then connects them as a typed graph of nodes and edges. A node is an entity: a VM, a managed identity, a storage account, an IP address. An edge is a relationship an attacker could traverse: “is exposed to the internet”, “can authenticate as”, “can read data from”, “has permission to”.

This is the leap past Secure Score. A misconfiguration in isolation is a finding. The same misconfiguration on a node one hop from an internet-facing entity and two hops from a sensitive data store is a path. Defender for Cloud computes those paths and surfaces them as attack paths, scored by how much they actually matter.

The mental model: Secure Score answers “what is wrong?” The graph answers “what is reachable, and what does it reach?” You operationalize the second question, because that is where breach risk lives.
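To make the node-and-edge model concrete, here is a minimal Python sketch. It is not how Defender for Cloud computes paths; the node names, the edge set, and the `hops_to_sensitive` helper are all hypothetical. It only illustrates why the same finding matters more on a node that can reach sensitive data.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (edge_label, target_node).
EDGES = {
    "internet": [("is exposed to the internet", "vm-web")],
    "vm-web": [("can authenticate as", "mi-app")],
    "mi-app": [("can read data from", "st-customers")],
    "vm-batch": [("can authenticate as", "mi-batch")],
}
SENSITIVE = {"st-customers"}  # assumed result of sensitive-data discovery


def hops_to_sensitive(start):
    """Breadth-first search: hop count to the nearest sensitive node, or None."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in SENSITIVE:
            return depth
        for _label, target in EDGES.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return None
```

In this toy data `vm-web` and `vm-batch` could carry an identical misconfiguration, yet only `vm-web` sits on a path that ends at a sensitive store.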

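Two surfaces sit on top of the same graph: attack path analysis, which surfaces the paths Defender for Cloud has already computed, and the cloud security explorer, which lets you query the graph yourself.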

2. Reading an attack path: internet to identity to data

Start in the portal under Attack path analysis. Each path reads left to right from an entry point (typically internet exposure) through intermediate hops to a target (sensitive data, a privileged identity, a critical workload). Defender for Cloud groups paths by pattern, for example “Internet exposed VM has high severity vulnerabilities and read permission to a data store.”

To reproduce and extend that logic yourself, drop into advanced hunting. The exposure graph schema is the foundation. ExposureGraphNodes carries NodeId, NodeLabel, NodeName, Categories, EntityIds, and a dynamic NodeProperties; ExposureGraphEdges carries SourceNodeId, TargetNodeId, EdgeLabel, and the source/target node labels. Before writing a path query, learn the vocabulary of your own tenant:

// What entity types exist?
ExposureGraphNodes
| summarize Count = count() by NodeLabel
| order by Count desc

// What relationships connect them?
ExposureGraphEdges
| summarize Count = count() by EdgeLabel
| order by Count desc

The single most useful starting query is the canonical internet-to-RCE check. Note that exposure flags live under NodeProperties.rawData and category membership is tested with set_has_element against the Categories array:

// Internet-exposed VMs that are also vulnerable to remote code execution.
// Works across Azure, AWS, and GCP once those connectors feed the graph.
ExposureGraphNodes
| where isnotnull(NodeProperties.rawData.exposedToInternet)
| where isnotnull(NodeProperties.rawData.vulnerableToRCE)
| where set_has_element(Categories, "virtual_machine")

The real value is multi-hop traversal. Load the edges into a graph with make-graph, then match a pattern with graph-match. This query finds any path of up to three hops from an IP address to a virtual machine, which is the skeleton of an exposure-to-asset path:

let IPsAndVMs = ExposureGraphNodes
| where set_has_element(Categories, "ip_address") or set_has_element(Categories, "virtual_machine");
ExposureGraphEdges
| make-graph SourceNodeId --> TargetNodeId with IPsAndVMs on NodeId
| graph-match (IP)-[anyEdge*1..3]->(VM)
    where set_has_element(IP.Categories, "ip_address")
      and set_has_element(VM.Categories, "virtual_machine")
    project IpIds = IP.EntityIds, VmIds = VM.EntityIds, VmProps = VM.NodeProperties.rawData
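If the `*1..3` bound is unfamiliar, the following Python sketch shows the same idea outside KQL: enumerate every path from a source category to a target category within a hop range. The function name `match_paths`, the sample node IDs, and the choice to forbid revisiting a node are assumptions made for illustration; KQL's own cycle handling may differ.

```python
def match_paths(edges, categories, src_cat, dst_cat, min_hops=1, max_hops=3):
    """Return every path from a src_cat node to a dst_cat node within the hop bounds."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    results = []

    def walk(path):
        hops = len(path) - 1
        if hops >= min_hops and dst_cat in categories.get(path[-1], ()):
            results.append(list(path))
        if hops == max_hops:
            return  # do not extend past the upper bound
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:  # assumption: no node is visited twice
                walk(path + [nxt])

    for node, cats in categories.items():
        if src_cat in cats:
            walk([node])
    return results


# Hypothetical sample data
CATEGORIES = {
    "ip-1": {"ip_address"},
    "lb-1": {"load_balancer"},
    "nic-1": {"network_interface"},
    "vm-1": {"virtual_machine"},
    "vm-2": {"virtual_machine"},
    "vm-far": {"virtual_machine"},
}
EDGES = [("ip-1", "lb-1"), ("lb-1", "nic-1"), ("nic-1", "vm-1"),
         ("ip-1", "vm-2"), ("vm-1", "vm-far")]
```

With these inputs, `vm-far` is four hops from `ip-1` and is therefore excluded, which is exactly the effect of the upper bound in the query.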

To express the “lands on an identity that reaches data” half, pivot on identity edges. The EdgeLabel values are explicit and matchable, for example Can Authenticate As and CanRemoteInteractiveLogonTo. Filtering edges before make-graph keeps the graph small and the query fast:

let Nodes = ExposureGraphNodes
| where set_has_element(Categories, "identity")
   or (set_has_element(Categories, "device") and isnotnull(NodeProperties.rawData.criticalityLevel));
ExposureGraphEdges
| where EdgeLabel == "Can Authenticate As"
| make-graph SourceNodeId --> TargetNodeId with Nodes on NodeId
| graph-match (Device)-[canAuthAs]->(Identity)
    where set_has_element(Identity.Categories, "identity")
      and set_has_element(Device.Categories, "device")
    project IdentityIds = Identity.EntityIds, DeviceIds = Device.EntityIds

The discipline that matters at principal level: pre-filter nodes and edges to the minimum relevant set before make-graph. The NodeProperties column is large; graphing the whole tenant unfiltered hits memory limits and timeouts. Narrow Categories and EdgeLabel up front, project only the columns you need, then traverse.

3. Writing custom recommendations with KQL

Built-in recommendations cover the Microsoft Cloud Security Benchmark. Your organization always has rules the benchmark does not, for example “every internet-facing data store must have a private endpoint” or “no VM may be missing the Owner tag.” Defender CSPM lets you encode these as custom recommendations in KQL, evaluated across Azure, AWS, and GCP.

The data source is the RawEntityMetadata table, not Azure Resource Graph. Your query filters to a resource type, evaluates a condition, and stamps each in-scope resource with a HealthStatus. The contract is strict: return exactly seven columns and every resource in scope, marking compliant ones HEALTHY and non-compliant ones UNHEALTHY. Omitting a resource reads as “no data,” not as healthy.

// Custom recommendation: Key Vaults must have purge protection enabled.
RawEntityMetadata
| where Environment == 'Azure' and Identifiers.Type =~ 'Microsoft.KeyVault/vaults'
| extend condition = (Record.properties.enablePurgeProtection != true
                      or isnull(Record.properties.enablePurgeProtection))
| extend HealthStatus = iff(condition, 'UNHEALTHY', 'HEALTHY')
| project Id, Name, Environment, Identifiers, AdditionalData, Record, HealthStatus

The pattern is mechanical and worth internalizing, because everything else is a variation on it:

Column          Type     Role
Id              string   Resource identifier Defender uses to map the finding
Name            string   Display name in the recommendation UI
Environment     string   Azure, AWS, or GCP
Identifiers     dynamic  Resource type + identifiers, passed through from the record
AdditionalData  dynamic  Supplementary metadata, passed through
Record          dynamic  Full resource record; properties live under Record.properties.*
HealthStatus    string   UNHEALTHY or HEALTHY only (case-sensitive)

Two rules that will save you a debugging session: Azure resource properties are reached via Record.properties.* (never bare properties.*), and resource-type comparisons use the case-insensitive =~ operator. To port a check to AWS or GCP, change Environment, swap Identifiers.Type for the corresponding type string, and follow that provider's native record shape for the property path.
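The contract is easy to check offline before you paste a query into the portal. The sketch below is a hypothetical pre-flight check in Python, not part of any Defender for Cloud tooling: `check_contract` and `make_row` are invented names, and the rows stand in for exported query results.

```python
REQUIRED_COLUMNS = ["Id", "Name", "Environment", "Identifiers",
                    "AdditionalData", "Record", "HealthStatus"]
VALID_STATUSES = {"HEALTHY", "UNHEALTHY"}  # case-sensitive


def check_contract(rows, in_scope_ids):
    """Return a list of contract violations for a custom-recommendation result set."""
    problems = []
    for row in rows:
        if set(row) != set(REQUIRED_COLUMNS):
            problems.append(f"{row.get('Id')}: expected exactly the seven contract columns")
        elif row["HealthStatus"] not in VALID_STATUSES:
            problems.append(f"{row['Id']}: invalid HealthStatus {row['HealthStatus']!r}")
    # Every in-scope resource must be returned, healthy or not.
    missing = set(in_scope_ids) - {row.get("Id") for row in rows}
    for resource_id in sorted(missing):
        problems.append(f"{resource_id}: in scope but not returned")
    return problems


def make_row(resource_id, status):
    """Hypothetical helper that builds a minimal result row for testing."""
    return {"Id": resource_id, "Name": resource_id, "Environment": "Azure",
            "Identifiers": {}, "AdditionalData": {}, "Record": {},
            "HealthStatus": status}
```

A mixed-case status such as `Healthy` and a silently dropped resource are the two mistakes this catches, and both are otherwise hard to notice in the recommendation UI.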

Author and test in the portal first. Go to Environment settings > the subscription > Security policies > Create custom recommendation, set name/scope/severity/security issue, then Open query editor, paste, and Run query to validate the schema before saving. You need Security Admin to create the recommendation and Owner on the subscription to create the custom standard it gets assigned to. Once validated, you can deploy the same recommendation at scale through the Defender for Cloud API rather than clicking through every subscription.

4. Prioritizing with attack-path and exposure-based scoring

A flat list of 400 unhealthy resources is noise. The graph lets you rank by exposure so remediation effort lands where it removes the most risk. Two levers:

Resource criticality. Tag your crown-jewel assets so the graph weights paths that reach them. Critical assets show up as elevated criticalityLevel in NodeProperties, and attack paths terminating on them are scored higher. Set this deliberately for production data stores, domain controllers, and identity providers; do not let everything be critical, or nothing is.

Choke-point analysis. A single node that appears in many paths is a choke point, the highest-leverage fix in the estate. Remediating one over-permissioned managed identity can collapse dozens of paths at once. Approximate choke-point ranking in KQL by counting how often a node is an intermediate hop:

// Rank intermediate nodes by how many internet-to-VM paths traverse them.
let Scope = ExposureGraphNodes
| where set_has_element(Categories, "ip_address")
   or set_has_element(Categories, "identity")
   or set_has_element(Categories, "virtual_machine");
ExposureGraphEdges
| make-graph SourceNodeId --> TargetNodeId with Scope on NodeId
| graph-match (IP)-[e1]->(Mid)-[e2]->(VM)
    where set_has_element(IP.Categories, "ip_address")
      and set_has_element(Mid.Categories, "identity")
      and set_has_element(VM.Categories, "virtual_machine")
    project MidName = Mid.NodeName, MidId = Mid.NodeId
| summarize PathsThrough = count() by MidName, MidId
| order by PathsThrough desc

The prioritization rule I give teams: fix choke points first (one change, many paths removed), then paths that reach critical assets, then everything else on its normal SLA. Secure Score is a lagging scoreboard; attack-path reduction is the leading metric.
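The counting step behind choke-point ranking is simple enough to show in a few lines of Python. This is a sketch over hypothetical path data (for example, paths exported from a hunting query); `rank_choke_points` and the node names are illustrative only.

```python
from collections import Counter


def rank_choke_points(paths):
    """Count how many paths traverse each intermediate node (endpoints excluded)."""
    counts = Counter()
    for path in paths:
        for node in set(path[1:-1]):  # count a node at most once per path
            counts[node] += 1
    return counts.most_common()


# Hypothetical exported paths: entry point, intermediate hop(s), target
PATHS = [
    ["ip-1", "mi-shared", "vm-a"],
    ["ip-2", "mi-shared", "vm-b"],
    ["ip-3", "mi-other", "vm-c"],
]
```

Here `mi-shared` ranks first because fixing it removes two paths, while `mi-other` removes one.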

5. Governance rules: owners and remediation SLAs at scale

Finding risk is half the job. Governance rules assign each recommendation an owner and a remediation timeframe (the SLA), with weekly email nudges to owners and their managers, plus an optional grace period so resources do not tank Secure Score until they are actually overdue. This turns a dashboard into an accountable program.

Create a rule with the Defender for Cloud REST API (resource type Microsoft.Security/governanceRules). It targets recommendations by their assessment keys, sets the owner, and sets the SLA. The remediationTimeframe uses a d.hh:mm:ss duration string; valid windows are 7, 14, 30, or 90 days:

az rest --method put \
  --url "https://management.azure.com/subscriptions/<SUB_ID>/providers/Microsoft.Security/governanceRules/critical-internet-exposure?api-version=2022-01-01-preview" \
  --body '{
    "properties": {
      "displayName": "Critical internet-exposure recommendations",
      "description": "Owner-assigned SLA for high-severity exposure findings",
      "rulePriority": 200,
      "isDisabled": false,
      "ruleType": "Integrated",
      "sourceResourceType": "Assessments",
      "isGracePeriod": true,
      "remediationTimeframe": "7.00:00:00",
      "ownerSource": { "type": "Manually", "value": "cloudsec-team@contoso.com" },
      "governanceEmailNotification": {
        "disableManagerEmailNotification": false,
        "disableOwnerEmailNotification": false
      },
      "conditionSets": [
        {
          "conditions": [
            {
              "operator": "In",
              "property": "$.AssessmentKey",
              "value": "[\"b1cd27e0-4ecc-4246-939f-49c426d9d72f\", \"fe83f80b-073d-4ccf-93d9-6797eb870201\"]"
            }
          ]
        }
      ]
    }
  }'
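Hand-editing that JSON across many rules invites mistakes, especially the string-encoded key list and the duration format. A small generator helps; the Python sketch below mirrors the body shown above. The function name and its arguments are hypothetical, and the allowed windows are taken from the text rather than from a schema.

```python
import json

ALLOWED_DAYS = {7, 14, 30, 90}  # remediation windows named in the text


def build_governance_rule(display_name, assessment_keys, owner_email, days, priority=200):
    """Build a governanceRules request body; argument names are illustrative."""
    if days not in ALLOWED_DAYS:
        raise ValueError(f"days must be one of {sorted(ALLOWED_DAYS)}, got {days}")
    return {
        "properties": {
            "displayName": display_name,
            "rulePriority": priority,
            "isDisabled": False,
            "ruleType": "Integrated",
            "sourceResourceType": "Assessments",
            "isGracePeriod": True,
            "remediationTimeframe": f"{days}.00:00:00",  # d.hh:mm:ss
            "ownerSource": {"type": "Manually", "value": owner_email},
            "governanceEmailNotification": {
                "disableManagerEmailNotification": False,
                "disableOwnerEmailNotification": False,
            },
            "conditionSets": [{"conditions": [{
                "operator": "In",
                "property": "$.AssessmentKey",
                # The condition value is a JSON array serialized into a string.
                "value": json.dumps(list(assessment_keys)),
            }]}],
        }
    }
```

Serializing the result with `json.dumps` gives a body you could pass to `az rest --body`, and the early `ValueError` stops an unsupported window before it reaches the API.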

Two design choices make this scale instead of becoming a maintenance burden. First, resolve the owner from a resource tag rather than hard-coding a name in the rule:

// ownerSource by tag, so ownership tracks the resource, not the rule
"ownerSource": { "type": "ByTag", "value": "Owner" }

Second, tier your SLAs to severity: a 7-day window on the critical-exposure rule, 30 days on medium-severity hygiene, 90 days on low. Set isGracePeriod: true on the longer windows so hygiene work does not distort Secure Score before it is genuinely late.
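The interaction between tiered windows and the grace period can be modeled in a few lines. This Python sketch is a simplified reading of the behavior described above, not the service's scoring logic; the tier names and both helper functions are assumptions.

```python
from datetime import date, timedelta

# Tiers from the text; adjust to your own policy.
SLA_DAYS = {"critical": 7, "medium": 30, "low": 90}


def due_date(severity, assigned_on):
    """Remediation due date for a finding assigned on a given day."""
    return assigned_on + timedelta(days=SLA_DAYS[severity])


def counts_against_score(severity, assigned_on, today, grace_period=True):
    """With a grace period, a finding only counts once it is past its due date."""
    if not grace_period:
        return True
    return today > due_date(severity, assigned_on)
```

A medium-severity finding assigned on 1 January stays out of the score through 31 January under this model and starts counting on 1 February.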

6. Multicloud: AWS and GCP, normalized

The same machinery spans clouds. Connect AWS accounts and GCP projects as security connectors (Microsoft.Security/securityConnectors); once Defender CSPM is enabled on the connector, AWS and GCP assets flow into the same graph and attack paths cross cloud boundaries. Custom recommendations target them by switching Environment to 'AWS' or 'GCP' and matching the native resource type:

// Same pattern, different control: GCP storage buckets must enforce uniform bucket-level access
RawEntityMetadata
| where Environment == 'GCP' and Identifiers.Type =~ 'storage.googleapis.com/Bucket'
| extend condition = (Record.iamConfiguration.uniformBucketLevelAccess.enabled != true)
| extend HealthStatus = iff(condition, 'UNHEALTHY', 'HEALTHY')
| project Id, Name, Environment, Identifiers, AdditionalData, Record, HealthStatus

Governance rules attach to connector scopes too, so an AWS account or GCP project gets the same owner-and-SLA treatment as an Azure subscription:

az rest --method put \
  --url "https://management.azure.com/subscriptions/<SUB_ID>/resourceGroups/<RG>/providers/Microsoft.Security/securityConnectors/<GCP_CONNECTOR>/providers/Microsoft.Security/governanceRules/gcp-critical?api-version=2022-01-01-preview" \
  --body '{
    "properties": {
      "displayName": "GCP critical recommendations",
      "rulePriority": 210,
      "ruleType": "Integrated",
      "sourceResourceType": "Assessments",
      "isGracePeriod": true,
      "remediationTimeframe": "14.00:00:00",
      "ownerSource": { "type": "ByTag", "value": "owner" },
      "governanceEmailNotification": {
        "disableManagerEmailNotification": false,
        "disableOwnerEmailNotification": false
      },
      "conditionSets": [
        { "conditions": [ { "operator": "In", "property": "$.AssessmentKey",
          "value": "[\"b1cd27e0-4ecc-4246-939f-49c426d9d72f\"]" } ] }
      ]
    }
  }'

7. Wiring findings into ticketing and exposure-reduction workflows

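Email nudges work for engaged owners; they do not scale to a backlog. Push attack-path findings into the system of record where work actually gets tracked. Continuous export is the first hop: the automation below streams recommendation assessments to a Log Analytics workspace, where a ticketing integration can pick them up.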

az security automation create \
  --name export-recommendations \
  --resource-group security-rg \
  --location eastus \
  --scopes '[{"description":"sub","scopePath":"/subscriptions/<SUB_ID>"}]' \
  --sources '[{"eventSource":"Assessments","ruleSets":[]}]' \
  --actions '[{"actionType":"Workspace","workspaceResourceId":"/subscriptions/<SUB_ID>/resourceGroups/security-rg/providers/Microsoft.OperationalInsights/workspaces/<WS>"}]'

The architecture I recommend: attack paths and high-severity recommendations route to tickets automatically with the governance SLA as the ticket due date, while everything else stays in the Defender backlog under its grace period. You want engineers working tickets in their normal queue, not logging into a security portal they will forget exists.
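That routing rule fits in one function. The sketch below is a hypothetical Python model of the decision, assuming each finding already carries its severity, an attack-path flag, and the SLA from its governance rule; none of the field names come from a Defender for Cloud schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Finding:
    resource_id: str
    severity: str          # "high", "medium", or "low"
    on_attack_path: bool
    sla_days: int          # taken from the governance rule
    assigned_on: date


def route(finding):
    """Return ("ticket", due_date) for attack-path or high-severity findings,
    otherwise ("backlog", None)."""
    if finding.on_attack_path or finding.severity == "high":
        return ("ticket", finding.assigned_on + timedelta(days=finding.sla_days))
    return ("backlog", None)
```

Carrying the SLA into the ticket's due date keeps the two systems from disagreeing about when something is late.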

8. Tracking remediation velocity and exposure reduction

Measure two things: are findings closing inside SLA (velocity), and is total reachable exposure shrinking (outcome). Governance assignments carry the owner and remediation due date for each recommendation, so listing them shows what is open and what is overdue:

# Governance assignments for one recommendation (owner, due date, grace period)
az rest --method get \
  --url "https://management.azure.com/subscriptions/<SUB_ID>/providers/Microsoft.Security/assessments/<ASSESSMENT_KEY>/governanceAssignments?api-version=2021-06-01"

# Trend Secure Score over time as the lagging outcome metric
az security secure-scores list \
  --query "value[].{name:displayName, current:score.current, max:score.max, pct:score.percentage}" -o table

For the leading metric, track attack-path count by pattern week over week from the exposure graph, segmented by severity and by whether the path reaches a critical asset. A program that is working shows a falling path count and a rising on-time remediation rate at the same time. If Secure Score climbs but path count is flat, you are remediating cheap findings and ignoring the reachable ones, the exact failure mode the graph exists to prevent.
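Both metrics are simple arithmetic once the data is exported. The following Python sketch assumes you already have closed findings as (days to close, SLA days) pairs and week-over-week deltas for Secure Score and path count; the function names and the diagnostic labels are invented for illustration.

```python
def on_time_rate(closed):
    """closed: list of (days_to_close, sla_days). Share closed within SLA, or None if empty."""
    if not closed:
        return None
    on_time = sum(1 for days, sla in closed if days <= sla)
    return on_time / len(closed)


def diagnose(score_delta, path_count_delta):
    """Classify a reporting period from the two deltas."""
    if path_count_delta < 0:
        return "exposure shrinking"
    if score_delta > 0:
        return "score up, paths flat: cheap findings only"
    return "no progress"
```

The second branch of `diagnose` is the failure mode described above: the scoreboard improves while reachable exposure does not.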

Enterprise scenario

A retail platform team ran Defender for Cloud across ~140 Azure subscriptions plus two AWS accounts behind a single management group. Secure Score sat around 68% and had been “improving” for a quarter, yet a red-team engagement walked a path nobody had triaged: a public Application Gateway, to an AKS-hosted VM with a high-severity CVE, to a kubelet-assigned managed identity that held Storage Blob Data Reader on a storage account the sensitive-data scanner had flagged as holding PII. Three findings, each individually low-priority in the flat list, formed one critical exposure path.

The constraint: 140 subscriptions, dozens of owning teams, and no central authority to manually chase remediation. Telling teams to “go fix things” had already failed for a quarter.

The fix had three moves. First, they made the path queryable and continuous, lifting the red-team’s chain into an advanced-hunting graph query so it would resurface the instant a similar shape reappeared.

let Scope = ExposureGraphNodes
| where set_has_element(Categories, "ip_address")
   or set_has_element(Categories, "identity")
   or set_has_element(Categories, "virtual_machine")
   or NodeProperties has "containsSensitiveData";
ExposureGraphEdges
| make-graph SourceNodeId --> TargetNodeId with Scope on NodeId
| graph-match (IP)-[e1*1..2]->(VM)-[e2]->(Identity)-[e3]->(Data)
    where set_has_element(IP.Categories, "ip_address")
      and set_has_element(VM.Categories, "virtual_machine")
        and isnotnull(VM.NodeProperties.rawData.vulnerableToRCE)
      and set_has_element(Identity.Categories, "identity")
      and Data.NodeProperties has "containsSensitiveData"
    project VMName = VM.NodeName, IdentityName = Identity.NodeName, DataName = Data.NodeName

Second, they encoded the policy gap, “no internet-exposed workload may hold a direct data-plane role to a sensitive store,” as a custom recommendation on the RawEntityMetadata pattern and assigned it to a custom standard inherited by the whole management group. Third, they attached a governance rule at the management-group scope with ownerSource.type: ByTag on the existing Owner tag and a 7-day remediationTimeframe, so every finding auto-routed to the owning team with a hard due date and weekly manager escalation, no central chasing required. Critical exposure paths additionally opened ServiceNow tickets via ruleType: ServiceNow.

The choke point turned out to be the AKS identity assignment: removing the over-broad role collapsed not one path but eleven, across different clusters that shared the pattern. Within two months, critical attack paths reaching sensitive data dropped from 11 to 1 (a legacy system with a documented exception and compensating controls), even though Secure Score moved only four points. The point that landed with leadership: the four-point move would have been invisible, but going from 11 reachable PII paths to 1 was the number that actually described risk.
