Cloud Workload Protection in Practice: Defender for Servers, Containers, and Databases

Posture management tells you a VM is missing a patch. Workload protection tells you that the same VM is currently running a reverse shell that beaconed out four minutes ago. Those are different jobs, different plans, and different on-call rotations. This guide is about the second one: the runtime workload-protection plans in Microsoft Defender for Cloud, and how to deploy and tune them so the alerts that reach your SOC are real, attributable, and actionable.

I assume you already have foundational CSPM on and Secure Score wired into a review cadence. If not, treat that as a prerequisite. Here we stay in the runtime lane: server EDR, container threat detection, and database protection.

1. CWPP vs CSPM: protecting running workloads, not posture

Cloud Security Posture Management (CSPM) is a configuration discipline. It scans declared state and tells you what is misconfigured, reachable, or non-compliant. It is largely agentless and its findings are latent risk.

Cloud Workload Protection Platform (CWPP) is a runtime discipline. It watches processes, syscalls, network connections, control-plane audit logs, and query patterns on live workloads and tells you what is happening right now. Its findings are alerts with a kill chain, not recommendations with a remediation button.

In Defender for Cloud the split is concrete:

| Concern | Plane | Plan family | Output |
|---|---|---|---|
| Posture / latent risk | Control + config | Defender CSPM (CloudPosture) | Recommendations, attack paths, Secure Score |
| Runtime / active threat | Data + process | Defender for Servers/Containers/Databases | Security alerts with MITRE mapping |

The practical consequence: CSPM findings go to a platform/owner backlog with an SLA measured in days; CWPP alerts go to a SOC queue with a response time measured in minutes. Do not route them to the same place. If you have not separated those two streams, that is the first thing to fix.

2. Enable Defender for Servers with integrated EDR and agentless scanning

Defender for Servers ships two sub-plans, and the choice drives both cost and capability.

| Capability | Plan 1 | Plan 2 |
|---|---|---|
| Integrated Defender for Endpoint (EDR) | Yes | Yes |
| Vulnerability assessment via the MDE sensor (MDVM) | Yes | Yes |
| Agentless vulnerability scanning | No | Yes |
| Agentless malware scanning | No | Yes |
| File Integrity Monitoring | No | Yes |
| Just-in-time VM access | No | Yes |
| Free data ingestion to Log Analytics (500 MB/day) | No | Yes |
| Pricing model | Per server/hour | Per server/hour (higher) |

Plan 1 is essentially EDR plus sensor-based vulnerability assessment. Plan 2 adds the controls most enterprises actually want: agentless scanning, FIM, JIT, and the data allowance. Enable the sub-plan explicitly; the default if you omit --subplan is P2.

# Register the provider once per subscription (idempotent)
az provider register --namespace Microsoft.Security

# Enable Defender for Servers Plan 2 at subscription scope
az security pricing create \
  --name VirtualMachines \
  --tier Standard \
  --subplan P2

# Confirm what is actually enabled
az security pricing list \
  --query "value[?name=='VirtualMachines'].{plan:name,tier:pricingTier,sub:subPlan}" \
  -o table

The single most common deployment defect here is leaving the MDE auto-provisioning integration off, which means the EDR sensor never lands and you are paying for Plan 2 while getting agentless-only coverage. Verify the integration setting:

# WDATP = the MDE sensor integration; should be On
az security setting show --name WDATP --query "enabled"

# Enable it if it is false
az security setting update --name WDATP --enabled true

For Azure VMs and Arc-enabled servers, the unified MDE sensor is provisioned automatically once this is on; no MMA/AMA agent is required for EDR. Agentless scanning, by contrast, requires no on-VM component at all — it snapshots the disk out-of-band, which is why it cannot detect runtime behavior. The two are complementary: agentless gives you breadth and zero footprint; the EDR sensor gives you real-time process and network detection. You want both.
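To make the "billed but not covered" defect easy to spot in a script, the plan tier, sub-plan, and MDE integration setting can be reduced to a single verdict. The following is a minimal sketch: the function name servers_coverage and the labels it prints are hypothetical, and the commented wiring assumes the az output exposes pricingTier and subPlan as top-level fields.

```shell
# Hypothetical helper: classify Defender for Servers coverage from three values.
# The labels it prints are illustrative, not Defender terminology.
servers_coverage() {
  # $1 = pricing tier, $2 = sub-plan, $3 = WDATP "enabled" value
  local tier subplan mde
  tier="$1"
  subplan="$2"
  # Normalize case so "True" and "true" are treated alike.
  mde=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')

  if [ "$tier" != "Standard" ]; then
    echo "none: plan not enabled"
  elif [ "$mde" = "true" ]; then
    echo "edr: $subplan with MDE integration on"
  else
    echo "no-edr: $subplan billed, MDE integration off"
  fi
}

# Example wiring (assumes an authenticated az session; field names are assumptions):
# tier=$(az security pricing show --name VirtualMachines --query pricingTier -o tsv)
# sub=$(az security pricing show --name VirtualMachines --query subPlan -o tsv)
# mde=$(az security setting show --name WDATP --query enabled -o tsv)
# servers_coverage "$tier" "$sub" "$mde"
```

A "no-edr" verdict is the case described above: the plan is paid for, but the sensor integration that delivers runtime detection is off.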

Migration note: the legacy Log Analytics agent (MMA) has reached end of support. FIM and recommendation collection moved first to the Azure Monitor Agent (AMA) and, in the current generation, to Defender’s own data collection. Do not build new FIM on MMA.

3. Vulnerability assessment, file integrity monitoring, and adaptive controls

Vulnerability assessment. Defender for Servers uses the Microsoft Defender Vulnerability Management (MDVM) engine, integrated with the EDR sensor. There is nothing to deploy per VM beyond the sensor; findings surface as the recommendation Machines should have vulnerability findings resolved and feed the cloud security graph so a critical CVE on an internet-facing box becomes an attack path, not a row in a list. Pull current findings programmatically for ticketing:

az security assessment list \
  --query "[?contains(displayName, 'vulnerability findings')].{name:displayName,status:status.code}" \
  -o table

File Integrity Monitoring (FIM). FIM is a Plan 2 feature that watches a defined set of files, directories, and registry keys for change and raises an alert on unexpected modification — the classic detection for a tampered /etc/sudoers, a planted web shell in wwwroot, or a modified Run key. The current implementation collects via the Defender for Endpoint sensor and writes events to your Log Analytics workspace. Configure it from Defender for Cloud -> Environment settings -> [subscription] -> Defender for Servers settings -> File Integrity Monitoring, selecting the workspace and the rule set. Start from the recommended Linux/Windows baselines, then add the application-specific paths that matter to you:

Linux:   /bin, /sbin, /usr/bin, /usr/sbin, /etc/passwd, /etc/sudoers,
         /etc/ssh/sshd_config, /var/www/html
Windows: HKLM\...\Run, HKLM\...\RunOnce, C:\Windows\System32\drivers,
         <app>\wwwroot

Scope FIM narrowly. Monitoring /var/log or a directory that legitimately churns produces a wall of noise that trains the SOC to ignore the alert class entirely.
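One way to keep the rule set narrow is to lint candidate paths before they reach the FIM configuration. This is a sketch only: NOISY_PREFIXES is an illustrative list chosen for this example, not a Defender default, and both function names are hypothetical.

```shell
# Hypothetical pre-check: flag candidate FIM paths that sit under directories
# known to churn. NOISY_PREFIXES is an illustrative list, not a Defender default.
NOISY_PREFIXES="/var/log /tmp /var/tmp /var/cache /proc"

fim_path_is_noisy() {
  # Succeeds (exit 0) if $1 equals or sits beneath a noisy prefix.
  local path prefix
  path="${1%/}"   # drop one trailing slash so /var/log/ matches /var/log
  for prefix in $NOISY_PREFIXES; do
    case "$path" in
      "$prefix"|"$prefix"/*) return 0 ;;
    esac
  done
  return 1
}

lint_fim_paths() {
  # Prints one line per argument: "noisy <path>" or "ok <path>".
  local p
  for p in "$@"; do
    if fim_path_is_noisy "$p"; then
      echo "noisy $p"
    else
      echo "ok $p"
    fi
  done
}
```

Calling lint_fim_paths with /etc/sudoers and /var/log/nginx marks the first as ok and the second as noisy; anything marked noisy deserves a justification before it goes into the rule set.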

Adaptive controls. Two runtime hardening features earn their keep:

Request JIT access from the CLI so it can live in a runbook instead of a portal click:

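# az security jit-policy documents only list/show, so this calls the ARM initiate
# action directly. Check the body shape against the JIT initiate REST reference.
az rest --method post \
  --url "https://management.azure.com/subscriptions/<sub-id>/resourceGroups/prod-rg/providers/Microsoft.Security/locations/eastus/jitNetworkAccessPolicies/default/initiate?api-version=2020-01-01" \
  --body '{"virtualMachines":[{"id":"<vm-resource-id>","ports":[{"number":22,"allowedSourceAddressPrefix":"203.0.113.10","duration":"PT1H"}]}]}'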

4. Deploy Defender for Containers: runtime threat detection and Kubernetes hardening

Defender for Containers is a single plan that covers AKS, Arc-enabled Kubernetes, and EKS/GKE through the multicloud connectors. It has three pillars: runtime threat detection on the cluster, vulnerability assessment of images, and Kubernetes posture hardening.

# Enable the Containers plan
az security pricing create --name Containers --tier Standard

Runtime detection needs the Defender sensor (a DaemonSet built on eBPF) and the cluster needs Azure Policy for Kubernetes (a Gatekeeper/OPA admission webhook) for hardening recommendations and admission control. On AKS these can be auto-provisioned; verify they are actually running rather than trusting the toggle:

# Defender sensor DaemonSet
kubectl get ds -n kube-system microsoft-defender-collector-ds -o wide

# Azure Policy / Gatekeeper add-on
kubectl get pods -n gatekeeper-system
az aks show -g prod-rg -n prod-aks \
  --query "addonProfiles.azurepolicy.enabled"

Enable the AKS add-ons explicitly if you manage clusters as code:

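az aks enable-addons \
  --addons azure-policy \
  --resource-group prod-rg \
  --name prod-aks
# The Defender sensor is a security profile rather than an add-on
az aks update --enable-defender --resource-group prod-rg --name prod-aks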

Runtime detections you should expect to see fire in a real environment: a shell spawned inside a container, a crypto-mining process, a connection to a known C2 address, mounting of the host filesystem, and exposure of the Kubernetes dashboard. These arrive as alerts with the affected pod, image, and node attached.

5. Admission control and image scanning, in the registry and at deploy time

There are two scan points, and you want both because they catch different things.

Registry scan. Every image pushed to Azure Container Registry is scanned by the MDVM engine, and images already running in a cluster are re-scanned as new CVEs are published — so an image that was clean at push time still raises a finding when a new vulnerability drops. This is the difference between point-in-time and continuous assessment.

Deploy-time admission control. Azure Policy for Kubernetes enforces guardrails through the Gatekeeper webhook before a pod is admitted. Built-in Defender/Azure Policy initiatives cover the high-value controls: block privileged containers, disallow host namespace sharing, require read-only root filesystems, restrict allowed image registries. Run them in audit first, find what breaks, then move the chosen constraints to deny.

A registry allow-list is the highest-leverage admission rule — it stops anyone deploying an unscanned image from Docker Hub straight into prod. As a ConstraintTemplate-backed Gatekeeper constraint it looks like this:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: prod-allowed-registries
spec:
  enforcementAction: deny   # start with: dryrun
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["prod", "payments"]
  parameters:
    repos:
      - "prodacr.azurecr.io/"

Sequence matters: ship every constraint as dryrun, watch the Gatekeeper audit results and Defender recommendations for a full deployment cycle, then promote to deny. Flipping straight to deny on a busy cluster is how you cause the outage the security control was supposed to prevent.
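The promotion step can be gated on audit results instead of judgment. The sketch below assumes Gatekeeper's audit has populated .status.totalViolations on the constraint; the function name promotion_decision and its messages are hypothetical, and the kubectl wiring is left commented because it needs cluster access.

```shell
# Hypothetical gate: decide whether a constraint can move from dryrun to deny.
promotion_decision() {
  # $1 = current enforcementAction, $2 = totalViolations reported by audit
  local action violations
  action="$1"
  violations="$2"

  if [ -z "$violations" ]; then
    echo "wait: audit has not reported yet"
  elif [ "$violations" -gt 0 ]; then
    echo "hold: $violations violations still open"
  elif [ "$action" = "dryrun" ]; then
    echo "promote: safe to set enforcementAction to deny"
  else
    echo "noop: already $action"
  fi
}

# Example wiring (requires cluster access; uses the constraint name from the example):
# a=$(kubectl get k8sallowedrepos prod-allowed-registries -o jsonpath='{.spec.enforcementAction}')
# v=$(kubectl get k8sallowedrepos prod-allowed-registries -o jsonpath='{.status.totalViolations}')
# promotion_decision "$a" "$v"
```

Treating an empty violation count as "wait" rather than zero matters: a constraint that audit has not yet evaluated looks clean but proves nothing.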

6. Defender for Databases: SQL, Cosmos DB, and open-source engines

Database protection is split across distinct plans by engine. Enable the ones that map to real data stores:

# Azure SQL Database / Managed Instance / Synapse
az security pricing create --name SqlServers --tier Standard

# SQL Server running on VMs (IaaS)
az security pricing create --name SqlServerVirtualMachines --tier Standard

# Azure Database for PostgreSQL / MySQL / MariaDB
az security pricing create --name OpenSourceRelationalDatabases --tier Standard

# Azure Cosmos DB
az security pricing create --name CosmosDbs --tier Standard

These plans provide Advanced Threat Protection for the data plane. Representative alerts:

| Engine | Representative runtime alerts |
|---|---|
| Azure SQL / SQL on VM | SQL injection (active and vulnerability-probing), brute-force login, access from unusual location/principal, anomalous data extraction |
| PostgreSQL / MySQL / MariaDB | Brute force, access from a suspicious IP / unfamiliar principal, login from a new region |
| Cosmos DB | Access from a Tor exit node or suspicious IP, unusual extraction of large data volumes, key-based access anomalies |

For SQL on a VM, do not assume that EDR onboarding covers the database. Defender for SQL on machines relies on its own on-host component (delivered through the Azure Monitor Agent or the SQL IaaS Agent extension, depending on deployment generation), so check that it is provisioned alongside the MDE sensor. For PaaS SQL and Cosmos DB the detection is service-side and requires nothing deployed, only the plan enabled. Confirm the SQL alert pipeline is live at the resource level:

# Server-level Advanced Threat Protection state for Azure SQL; expect "Enabled"
az sql server advanced-threat-protection-setting show \
  --resource-group prod-rg \
  --name <sql-server-name> \
  --query "state"
# This confirms configuration only; validate end-to-end with the
# alert-sample test in the Verify section.

7. Tune runtime alerts, suppression rules, and SOC handoff

Out of the box these plans are tuned for broad detection, which means noise. Tuning is not optional; an untuned CWPP deployment trains responders to close alerts without reading them.

Suppression rules. When a benign pattern fires repeatedly — a vulnerability scanner that looks like brute force, an automation principal that trips “unusual access” — create a scoped suppression rule rather than disabling the detection globally. Suppress on the narrowest dimensions that kill the noise (specific alert type, specific resource or IP), never the whole alert name across the tenant.

# Suppression rules are the alertsSuppressionRules ARM type; in the portal they
# live under Security -> Alerts -> Suppression rules.
# Scope a rule to one alert type and one resource, with an expiry:
{
  "properties": {
    "reason": "Authorized internal vulnerability scanner",
    "alertType": "SQL.DB_BruteForce",
    "state": "Enabled",
    "expirationDateUtc": "2026-12-31T00:00:00Z",
    "comment": "Scanner 10.20.0.5, ticket SEC-4821",
    "suppressionAlertsScope": {
      "allOf": [
        { "field": "entities.ip.address", "in": ["10.20.0.5"] }
      ]
    }
  }
}

Always set expirationDateUtc. A suppression rule with no expiry is a permanent blind spot that outlives the person who created it.
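If suppression rules are kept as JSON files in a repository, the expiry requirement can be enforced before merge. This is a rough sketch that assumes one rule per file, formatted like the example above; it uses grep rather than a JSON parser, and the function names are hypothetical.

```shell
# Hypothetical pre-merge lint: reject suppression rule files without an expiry.
rule_has_expiry() {
  # Succeeds only if the file sets a non-empty expirationDateUtc string.
  grep -Eq '"expirationDateUtc"[[:space:]]*:[[:space:]]*"[^"]+"' "$1"
}

lint_suppression_rules() {
  # Prints each offending file; returns non-zero if any rule lacks an expiry.
  local f status
  status=0
  for f in "$@"; do
    if ! rule_has_expiry "$f"; then
      echo "missing expiry: $f"
      status=1
    fi
  done
  return "$status"
}
```

The check only proves that an expiry is present; whether the date is reasonable is still a review decision.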

SOC handoff. CWPP alerts must leave Defender for Cloud and land where responders work. Two patterns, used together:

# Stream alerts to a Log Analytics workspace via continuous export
az security automation create \
  --resource-group security-rg \
  --name export-high-alerts \
  --location eastus \
  --scopes "[{\"description\":\"Subscription scope\",\"scopePath\":\"/subscriptions/<sub-id>\"}]" \
  --sources "[{\"eventSource\":\"Alerts\",\"ruleSets\":[{\"rules\":[{\"propertyJPath\":\"Severity\",\"propertyType\":\"String\",\"expectedValue\":\"High\",\"operator\":\"Equals\"}]}]}]" \
  --actions "[{\"actionType\":\"Workspace\",\"workspaceResourceId\":\"<workspace-id>\"}]"

Filter at export time on severity. Forwarding every Low/Informational alert into the SOC queue is the fastest way to bury the one that matters.

Verify

Validation has two halves: confirm the plans and sensors are present, and confirm they actually fire.

# 1. All runtime plans are Standard
az security pricing list \
  --query "value[?pricingTier=='Standard'].{plan:name,sub:subPlan}" -o table

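# 2. EDR integration on, sensor extension present on a sample VM
az security setting show --name WDATP --query "enabled"
az vm extension list -g prod-rg --vm-name <vm-name> \
  --query "[?starts_with(name, 'MDE.')].{name:name,state:provisioningState}" -o table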

# 3. Container sensor + admission webhook running
kubectl get ds -n kube-system microsoft-defender-collector-ds
kubectl get pods -n gatekeeper-system

# 4. Recent alerts exist (proves the pipeline end-to-end)
az security alert list \
  --query "[].{name:alertDisplayName,sev:severity,resource:compromisedEntity}" -o table

Then prove detection with safe, intended-for-testing signals rather than waiting for a real attack:

If a test signal does not produce an alert, the plan is enabled but the data path is broken — almost always a missing sensor or a workspace misconfiguration, not a licensing gap.

Enterprise scenario

A payments platform team ran a 40-cluster AKS estate plus a fleet of SQL-on-VM hosts. They had enabled Defender for Containers and Defender for Servers months earlier, the toggles were green, and leadership considered the workloads “protected.” During an incident review the SOC noted they had never received a single container runtime alert from one business unit’s clusters — despite a red-team exercise that had successfully dropped a crypto-miner pod and run for two days.

The constraint: those clusters were provisioned by an older Terraform module that disabled the AKS Azure Policy and monitoring add-ons for performance reasons, and the Defender sensor DaemonSet had never been scheduled because a restrictive PodSecurity admission setting in kube-system blocked the privileged collector pods. The plan was billed and enabled at the subscription, so every dashboard showed coverage — but the data plane was dark. Posture said “protected,” runtime said nothing, and the gap was invisible until someone asked why a known-compromised cluster had been silent.

The fix was a coverage-validation gate in CI, not a portal change. They added a post-deploy check that fails the pipeline if the sensor DaemonSet is not Running on every node, turning “the plan is enabled” into “the sensor is actually collecting”:

# Fails the deploy if any node lacks a Running Defender collector pod
desired=$(kubectl get ds -n kube-system microsoft-defender-collector-ds \
  -o jsonpath='{.status.desiredNumberScheduled}')
ready=$(kubectl get ds -n kube-system microsoft-defender-collector-ds \
  -o jsonpath='{.status.numberReady}')
if [ -z "$ready" ] || [ "${desired:-0}" -eq 0 ] || [ "$desired" != "$ready" ]; then
  echo "Defender sensor not fully scheduled: ${ready:-0}/${desired:-0}" >&2
  exit 1
fi

The lesson generalizes to every plan in this article: an enabled plan is a billing fact, not a coverage fact. Coverage is the sensor running, the integration toggled on, and a test signal producing an alert. Validate the second thing, in CI, on every workload.

Rollout checklist
