Operationalizing Entra ID Protection: Risk-Based Conditional Access, Detection Tuning, and Risk Investigation

Turning on the two default risk policies in Entra ID Protection takes about ninety seconds. Running the program so it actually reduces account-takeover risk without burying the helpdesk in password-reset tickets is the part nobody documents. This is the build I use: the risk model you have to internalize first, risk-based Conditional Access with safe remediation, the detection tuning that kills false positives, and a repeatable investigation loop wired into Microsoft Graph and Sentinel.

Identity Protection requires Entra ID P2 for every user in scope of a risk-based policy. P1 gets you Conditional Access but not risk-based conditions, the risky-users report, or the detection feed. Assume P2 throughout. You will also need Conditional Access Administrator (or Security Administrator) and, for the investigation work, Security Operator or Security Reader.

1. The risk model: real-time vs offline detections and risk levels

Two scores drive everything, and they are not the same thing.

Sign-in risk is the probability that a specific authentication request did not come from the legitimate user. It is computed per sign-in. Examples: anonymous IP address, impossible travel, a token with anomalous properties, a password spray pattern.
User risk is the probability that the identity itself is compromised, accumulated across sign-ins and other signals. The flagship example is leaked credentials — Microsoft matched the user’s plaintext credential pair against a breach corpus.

The second axis that matters operationally is when the detection fires:

Type	Latency	Can it block the sign-in?	Examples
Real-time	Evaluated during authentication	Yes	Anonymous IP address, unfamiliar sign-in properties, verified threat actor IP, anomalous token (some), password spray (some)
Offline	Minutes to hours after sign-in	No (raises risk for the next sign-in / user-risk policy)	Leaked credentials, impossible travel, atypical travel, malicious IP address, threat intelligence (some), suspicious inbox manipulation rules, password spray (some)

This distinction is the source of most confusion in incident reviews. A leaked-credentials detection is offline and user-level: it is not tied to any sign-in in progress, because Microsoft only learns about the leak after the fact, so there is nothing for it to block when it fires. It can only raise the user’s risk so the user-risk policy forces remediation on the next authentication. If your design assumes leaked credentials blocks in real time, you have a gap.

Each detection maps to a risk level — Low, Medium, or High — and the aggregate sign-in and user scores roll up to the same three levels (plus “No risk”). You set policy thresholds against these levels. Microsoft does not publish the exact weighting, and you should not try to reverse-engineer it; tune against observed outcomes, not assumed math.
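A minimal sketch of how the two axes combine, under a deliberately simplified model. The names RISK_ORDER, meets_threshold, and enforcement_point are illustrative, not part of any Microsoft API, and this is not how the engine computes scores; it only shows why an offline detection can never gate the sign-in it relates to.

```python
# Simplified model of the two axes above; illustrative only, not Microsoft's scoring logic.
RISK_ORDER = {"none": 0, "low": 1, "medium": 2, "high": 3}


def meets_threshold(level: str, minimum: str) -> bool:
    """True when an observed risk level is at or above the policy threshold."""
    return RISK_ORDER[level] >= RISK_ORDER[minimum]


def can_gate_current_sign_in(timing: str) -> bool:
    """Only real-time detections exist while the sign-in is still being evaluated."""
    if timing not in ("realtime", "offline"):
        raise ValueError(f"unknown detection timing: {timing}")
    return timing == "realtime"


def enforcement_point(timing: str, level: str, minimum: str) -> str:
    """Where policy can first act on a detection, if anywhere."""
    if not meets_threshold(level, minimum):
        return "below threshold"
    return "this sign-in" if can_gate_current_sign_in(timing) else "next sign-in"
```

For example, enforcement_point("offline", "high", "high") returns "next sign-in", which is the leaked-credentials case described above.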

2. Designing the risk-based Conditional Access policies

Identity Protection historically shipped its own built-in user-risk and sign-in-risk policies. Build the equivalents as Conditional Access policies instead. CA gives you scoping, report-only mode, exclusions, and session controls the legacy toggles never had, and it is where Microsoft has consolidated risk enforcement.

Two policies, two jobs:

Policy A — Sign-in risk (real-time gate). When sign-in risk is Medium or High, require MFA. This challenges the current authentication.

Policy B — User risk (remediation gate). When user risk is High, require a secure password change. This forces the user to prove control of the account and resets accumulated risk.

Always exclude break-glass first

Before either policy, confirm your emergency-access accounts are excluded from every CA policy in the tenant. A misfiring risk policy that locks out your only path back in is the classic self-inflicted outage.

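# Emergency-access accounts must be excluded from ALL CA policies.
# Show user AND group exclusions: the policies below exclude break-glass by group.
Connect-MgGraph -Scopes "Policy.Read.All"
Get-MgIdentityConditionalAccessPolicy -All |
  Where-Object { $_.State -ne 'disabled' } |   # include report-only policies too
  Select-Object DisplayName, State,
    @{n='ExcludedUsers';e={$_.Conditions.Users.ExcludeUsers -join ','}},
    @{n='ExcludedGroups';e={$_.Conditions.Users.ExcludeGroups -join ','}} |
  Format-Table -AutoSize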

Policy A: sign-in-risk MFA

{
  "displayName": "CA300-Require MFA for medium+ sign-in risk",
  "state": "enabledForReportingButNotEnforced",
  "conditions": {
    "users": {
      "includeUsers": ["All"],
      "excludeGroups": ["<break-glass-group-id>"]
    },
    "applications": { "includeApplications": ["All"] },
    "clientAppTypes": ["all"],
    "signInRiskLevels": ["high", "medium"]
  },
  "grantControls": {
    "operator": "OR",
    "builtInControls": ["mfa"]
  },
  "sessionControls": {
    "signInFrequency": {
      "isEnabled": true,
      "value": 1,
      "type": "hours",
      "frequencyInterval": "timeBased",
      "authenticationType": "primaryAndSecondaryAuthentication"
    }
  }
}

The signInFrequency of one hour is deliberate. On a risky sign-in you want the satisfied MFA claim to be short-lived, not cached for the default rolling window, so a subsequent risky sign-in is re-challenged rather than riding an old token.

Policy B: user-risk secure password change

{
  "displayName": "CA301-Require secure password change for high user risk",
  "state": "enabledForReportingButNotEnforced",
  "conditions": {
    "users": {
      "includeUsers": ["All"],
      "excludeGroups": ["<break-glass-group-id>"]
    },
    "applications": { "includeApplications": ["All"] },
    "clientAppTypes": ["all"],
    "userRiskLevels": ["high"]
  },
  "grantControls": {
    "operator": "AND",
    "builtInControls": ["mfa", "passwordChange"]
  }
}

Two non-negotiables here:

passwordChange requires mfa in the same grant control (operator AND). Entra will not let a user reset a password to clear high risk without first proving identity with strong auth — otherwise the attacker who knows the leaked password would just reset it themselves.
Scope user risk to High only. Medium user-risk thresholds generate a steady stream of forced password changes for benign anomalies. High user-risk is dominated by leaked credentials and confirmed-compromise signals, which genuinely warrant a reset.

Deploy both in report-only (enabledForReportingButNotEnforced) for at least two weeks. Watch the sign-in logs’ report-only tab to see exactly who would have been challenged. Only flip to enabled once the would-be-impact volume matches what your helpdesk can absorb.
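Before a policy body goes anywhere near the tenant, the non-negotiables above can be checked mechanically. This is a minimal sketch of a pre-deployment lint, assuming the policy is available as a parsed JSON dict shaped like the bodies above; lint_risk_policy and the "bg-group" id are hypothetical names for illustration.

```python
def lint_risk_policy(policy: dict, break_glass_group_id: str) -> list[str]:
    """Return human-readable problems found in a risk-based CA policy body."""
    problems = []
    conditions = policy.get("conditions", {})

    # Break-glass accounts must be excluded from every policy.
    users = conditions.get("users", {})
    if break_glass_group_id not in users.get("excludeGroups", []):
        problems.append("break-glass group is not excluded")

    # passwordChange is only valid together with mfa under operator AND.
    grant = policy.get("grantControls", {})
    controls = grant.get("builtInControls", [])
    if "passwordChange" in controls and (
        "mfa" not in controls or grant.get("operator") != "AND"
    ):
        problems.append("passwordChange must be paired with mfa using operator AND")

    # The guidance above scopes user risk to High only.
    below_high = set(conditions.get("userRiskLevels", [])) - {"high"}
    if below_high:
        problems.append("user risk scoped below high: " + ", ".join(sorted(below_high)))

    return problems
```

An empty list means the body passes these three checks. It says nothing about whether the policy is otherwise correct, so the report-only soak still applies.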

3. Self-remediation: secure password change and MFA without flooding the helpdesk

The entire point of risk-based policy is that the user remediates themselves. For that to work:

Self-service password reset (SSPR) must be enabled and registered, with writeback if you are hybrid (otherwise a cloud reset never reaches on-prem AD and the user is locked out of domain-joined resources). Confirm with a combined-registration campaign before enforcing Policy B.
The user must have a strong MFA method registered. A user with no MFA method who trips Policy A cannot satisfy it and will call the helpdesk. Drive registration to near-100% first.

When a user trips the user-risk policy, the flow is: sign in -> blocked on the password-change control -> MFA challenge -> set a new password -> risk is automatically remediated and the user-risk state drops to none. No admin touch. This self-remediation is what keeps the helpdesk out of the loop, and it is why getting SSPR and MFA registration to saturation before enforcement is the highest-leverage prep work in the whole program.

A common miss: hybrid tenants with password writeback disabled. The user “successfully” changes their cloud password, risk clears, then they cannot log into their domain-joined laptop because on-prem AD never got the new hash. Verify writeback end to end before you enforce.
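The readiness conditions in this section reduce to a per-user check you can run before enforcement. The sketch below assumes you have already exported registration state into simple records; the UserReadiness shape and function names are hypothetical, and in practice the inputs would come from your registration reporting rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class UserReadiness:
    upn: str
    mfa_registered: bool
    sspr_registered: bool
    synced_from_on_prem: bool


def remediation_blockers(user: UserReadiness, writeback_enabled: bool) -> list[str]:
    """Reasons this user could not self-remediate if a risk policy fired today."""
    blockers = []
    if not user.mfa_registered:
        blockers.append("no MFA method")
    if not user.sspr_registered:
        blockers.append("not registered for SSPR")
    if user.synced_from_on_prem and not writeback_enabled:
        blockers.append("hybrid user without password writeback")
    return blockers


def readiness_rate(users: list[UserReadiness], writeback_enabled: bool) -> float:
    """Fraction of users with no blockers; 0.0 for an empty population."""
    if not users:
        return 0.0
    ready = sum(1 for u in users if not remediation_blockers(u, writeback_enabled))
    return ready / len(users)
```

Running readiness_rate twice, once with writeback_enabled=False and once with True, shows how much of the gap is registration and how much is writeback.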

4. Reducing false positives: trusted locations, VPN exclusions, and dismissal hygiene

Out of the box, the noisiest detections in most tenants are unfamiliar sign-in properties and impossible travel — and the usual culprit is corporate egress. A user behind a split-tunnel VPN, or a fleet that NATs through a cloud egress in another region, looks like impossible travel to the risk engine.

Define trusted named locations

Tag your corporate egress IP ranges as trusted named locations. Sign-ins from trusted locations carry lower risk and can be excluded from risk policies.

Connect-MgGraph -Scopes "Policy.Read.All", "Policy.ReadWrite.ConditionalAccess"

$params = @{
  "@odata.type" = "#microsoft.graph.ipNamedLocation"
  displayName   = "Corp egress - trusted"
  isTrusted     = $true
  ipRanges      = @(
    @{ "@odata.type" = "#microsoft.graph.iPv4CidrRange"; cidrAddress = "203.0.113.0/24" },
    @{ "@odata.type" = "#microsoft.graph.iPv4CidrRange"; cidrAddress = "198.51.100.0/24" }
  )
}
New-MgIdentityConditionalAccessNamedLocation -BodyParameter $params

isTrusted = $true is the field that actually lowers risk scoring — a plain named location without it only helps for location-condition policies, not risk reduction.

Configure the VPN connectivity / known-network signal

If you run a VPN that egresses unpredictably, feed Entra the VPN’s IP metadata so impossible-travel logic treats those hops as known. At minimum, get every VPN and cloud-egress range into the trusted named-location set above. Reserve hard policy exclusions for genuinely device-bound, compliant paths (for example, sign-ins from compliant/Hybrid-joined devices), because a blanket location exclusion is exactly the hole an attacker who lands inside that range will use.
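When triaging a batch of noisy detections, it helps to separate the ones that originated from your own egress from the ones that did not. A minimal sketch using Python's standard ipaddress module follows; the CIDR ranges are the documentation ranges from the named-location example, and the function names are illustrative. It assumes each detection is a dict with an ipAddress key.

```python
import ipaddress

# Same illustrative ranges as the trusted named location above.
TRUSTED_RANGES = [
    ipaddress.ip_network(cidr) for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def is_trusted_egress(ip: str) -> bool:
    """True when the address falls inside any trusted corporate egress range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_RANGES)


def detections_outside_egress(detections: list[dict]) -> list[dict]:
    """Keep only detections whose source IP is NOT corporate egress."""
    return [d for d in detections if not is_trusted_egress(d["ipAddress"])]
```

If most impossible-travel hits disappear after this filter, the problem is egress tuning rather than attacker activity.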

Dismissal hygiene

When you dismiss a risk detection, you are telling the system “this was a false positive.” That is a signal, not just a cleanup action, and sloppy dismissal poisons future scoring.

Dismiss only when you have confirmed the activity was legitimate. It clears the user’s risk to none.
Confirm safe on a specific risky sign-in is the targeted version: “this one sign-in was fine.”
Confirm compromised is the opposite — it raises the user to High risk immediately and, critically, feeds the model so similar patterns score higher tenant-wide.

Never bulk-dismiss to clear a dashboard. Each unjustified dismissal trains the engine to under-weight a real pattern.

5. Investigating risky users and risky sign-ins

When a user lights up, work a consistent loop. The three reports — Risky users, Risky sign-ins, and Risk detections — are your evidence chain.

Open the risky user. Look at the risk level, the recent risky sign-ins, and the detection types that drove the score. Leaked credentials + an anonymous-IP sign-in is a very different story from a single impossible-travel hit.
Pivot to the risky sign-ins. For each, inspect IP, location, device, application, and whether MFA was satisfied. A High-risk sign-in that passed MFA from a known device is usually a false positive worth confirm safe. A High-risk sign-in from an anonymous IP that the user denies is a confirm compromised.
Decide and set state:
- Legitimate -> Dismiss user risk (or confirm-safe the specific sign-in).
- Compromised -> Confirm compromised, then run containment: revoke sessions, force password reset, and review what the session touched.
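The decision step above can be written down as a small rule set so two analysts looking at the same sign-in reach the same suggestion. This is a sketch of the heuristics stated in this section only; triage_sign_in and its return labels are hypothetical, and the output is a suggestion for a human, not a verdict.

```python
def triage_sign_in(
    risk_level: str,
    mfa_satisfied: bool,
    known_device: bool,
    anonymous_ip: bool,
    user_denies_activity: bool,
) -> str:
    """Suggest a next action for one risky sign-in; anything ambiguous stays 'investigate'."""
    if risk_level != "high":
        return "investigate"
    # High risk from an anonymous IP that the user denies: treat as compromise.
    if anonymous_ip and user_denies_activity:
        return "confirm_compromised"
    # High risk, but MFA passed on a known device and the user recognizes it.
    if mfa_satisfied and known_device and not user_denies_activity:
        return "confirm_safe"
    return "investigate"
```

Everything that does not match one of the two clear patterns falls through to "investigate" on purpose, so the rule set never closes a case that the text would leave open.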

Containment via Graph, scripted so it is the same every time:

Connect-MgGraph -Scopes "User.ReadWrite.All",
  "IdentityRiskyUser.ReadWrite.All", "Directory.AccessAsUser.All"

$userId = "<object-id-of-compromised-user>"

# 1. Mark the user compromised (raises to High, feeds the model).
Confirm-MgRiskyUserCompromised -UserIds @($userId)

# 2. Revoke all refresh/session tokens immediately.
Revoke-MgUserSignInSession -UserId $userId

# 3. Force credential change on next sign-in.
Update-MgUser -UserId $userId `
  -PasswordProfile @{ forceChangePasswordNextSignIn = $true }

After the user has remediated and you have validated the account is clean, reset the risk state so they are not perpetually flagged:

# Clear residual risk once remediation is verified.
Invoke-MgDismissRiskyUser -UserIds @($userId)

Confirm compromised followed by a verified Dismiss is the correct close-out: you fed the model the true label and returned the user to a clean state. If the user’s risk state has not already been cleared by the password change, skipping the dismiss leaves the user at High risk, and Policy B re-trips on every sign-in until it is cleared.

6. Exporting risk detections via Microsoft Graph for automation and SIEM correlation

The portal is for humans; automation reads Graph. The three resources you will poll or query are riskDetections, riskyUsers, and the per-user history.

# All risk detections in the last 24h, high + medium only.
SINCE=$(date -u -v-1d '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -d '1 day ago' '+%Y-%m-%dT%H:%M:%SZ')

az rest --method get \
  --url "https://graph.microsoft.com/v1.0/identityProtection/riskDetections?\$filter=detectedDateTime ge ${SINCE} and (riskLevel eq 'high' or riskLevel eq 'medium')&\$orderby=detectedDateTime desc" \
  --query "value[].{user:userPrincipalName, type:riskEventType, level:riskLevel, ip:ipAddress, when:detectedDateTime}" \
  -o table

For a SIEM that does not natively connect to Identity Protection, a scheduled pull keyed on a watermark is the reliable pattern — store the last detectedDateTime you ingested and filter forward from it so you never double-count or drop events:

# Watermark-driven incremental pull (loop body).
WM_FILE=/var/lib/idp/last_watermark
LAST="$(cat "$WM_FILE" 2>/dev/null)"
LAST="${LAST:-2026-06-01T00:00:00Z}"   # fall back when the file is missing or empty

az rest --method get \
  --url "https://graph.microsoft.com/v1.0/identityProtection/riskDetections?\$filter=detectedDateTime gt ${LAST}&\$orderby=detectedDateTime asc&\$top=200" \
  > /tmp/idp_batch.json

# After successful ingest, advance the watermark to the newest event.
# An empty batch must leave the watermark untouched, not truncate the file.
NEWEST="$(jq -r '.value[-1].detectedDateTime // empty' /tmp/idp_batch.json)"
if [ -n "$NEWEST" ]; then printf '%s\n' "$NEWEST" > "$WM_FILE"; fi

Required application permission for an unattended app: IdentityRiskEvent.Read.All (and IdentityRiskyUser.Read.All if you also pull riskyUsers). Grant admin consent and authenticate the daemon with a federated credential, not a client secret.
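The watermark logic is easy to get subtly wrong, so here is a sketch of just that part in Python, with no network calls. It assumes timestamps arrive as UTC strings ending in Z with an optional fractional part; parse_graph_time, next_watermark, and build_query_url are illustrative helpers, not an SDK. Two details matter: an empty batch must not move the watermark, and timestamps should be compared as times rather than as strings, because fractional seconds break string ordering.

```python
from datetime import datetime, timezone
from urllib.parse import quote

GRAPH_URL = "https://graph.microsoft.com/v1.0/identityProtection/riskDetections"


def parse_graph_time(stamp: str) -> datetime:
    """Parse a UTC timestamp like 2026-06-02T10:00:00.1234567Z (fraction optional)."""
    base, _, frac = stamp.rstrip("Z").partition(".")
    micros = int((frac + "000000")[:6]) if frac else 0  # pad or truncate to microseconds
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def next_watermark(current: str, batch: dict) -> str:
    """Newest detectedDateTime seen so far; unchanged when the batch is empty."""
    stamps = [
        e["detectedDateTime"] for e in batch.get("value", []) if e.get("detectedDateTime")
    ]
    return max([current, *stamps], key=parse_graph_time)


def build_query_url(watermark: str, top: int = 200) -> str:
    """Incremental query URL with the filter and ordering percent-encoded."""
    flt = quote(f"detectedDateTime gt {watermark}")
    order = quote("detectedDateTime asc")
    return f"{GRAPH_URL}?$filter={flt}&$orderby={order}&$top={top}"
```

Persist the value returned by next_watermark only after the batch has been ingested successfully, so a failed ingest is retried from the same point.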

7. Streaming Identity Protection into Microsoft Sentinel and building playbooks

For anything beyond a daily pull, connect Identity Protection to Sentinel through the dedicated data connector. The connector lands Identity Protection alerts in SecurityAlert. The underlying detections and risk state arrive in AADUserRiskEvents, AADRiskyUsers, and AADRiskyServicePrincipals, alongside SigninLogs, if you stream the corresponding Entra diagnostic categories.

A starter hunting query — risky sign-ins that nonetheless succeeded, which is exactly the gap a real-time policy is supposed to close:

SigninLogs
| where TimeGenerated > ago(24h)
| where RiskLevelDuringSignIn in ("high", "medium")
| where ResultType == 0            // 0 == success
| project TimeGenerated, UserPrincipalName, AppDisplayName,
          IPAddress, Location, RiskLevelDuringSignIn, RiskState,
          ConditionalAccessStatus
| sort by TimeGenerated desc

If ConditionalAccessStatus shows notApplied on a high-risk success, your sign-in-risk policy did not catch that path — a tuning finding, not just an alert.

Correlate leaked credentials against subsequent successful sign-ins to surface accounts where the offline detection landed before you forced a reset:

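let leaked =
    SecurityAlert
    | where TimeGenerated > ago(7d)
    // Match on a substring: the product name has changed with the Entra rebrand.
    | where ProductName has "Identity Protection" or ProductName has "ID Protection"
    | where AlertName has "leaked"
    // Entities is a JSON array; take the account entity rather than assuming index 0.
    | mv-expand entity = todynamic(Entities)
    | where tostring(entity.Type) == "account"
    | extend upn = tolower(strcat(tostring(entity.Name), "@", tostring(entity.UPNSuffix)))
    | summarize leakTime = min(TimeGenerated) by upn;
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| extend upn = tolower(UserPrincipalName)
| join kind=inner leaked on upn
| where TimeGenerated > leakTime
| project TimeGenerated, UserPrincipalName, IPAddress, leakTime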

Wire an automation rule so that any Identity Protection alert at High severity triggers a Logic App playbook that calls the Graph containment steps from Section 5 (revoke sessions, force reset, post to the SecOps channel). Keep the decision — confirm compromised vs dismiss — human-in-the-loop unless the signal is unambiguous (leaked credentials with a same-IP foreign sign-in is a reasonable auto-contain candidate; impossible travel alone is not).

Verify

Prove the program works before you trust it.

# 1. Both risk CA policies exist and are in the expected state.
Connect-MgGraph -Scopes "Policy.Read.All"
Get-MgIdentityConditionalAccessPolicy |
  Where-Object { $_.DisplayName -like 'CA30*' } |
  Select-Object DisplayName, State,
    @{n='SignInRisk';e={$_.Conditions.SignInRiskLevels -join ','}},
    @{n='UserRisk';e={$_.Conditions.UserRiskLevels -join ','}}

# 2. The detection feed is reachable and returning data.
az rest --method get \
  --url "https://graph.microsoft.com/v1.0/identityProtection/riskDetections?\$top=1" \
  --query "value[0].{type:riskEventType, level:riskLevel}" -o table

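// 3. Report-only impact: who WOULD have been challenged in the last 14 days?
SigninLogs
| where TimeGenerated > ago(14d)
| mv-expand ca = todynamic(ConditionalAccessPolicies)
| extend policyName = tostring(ca.displayName), policyResult = tostring(ca.result)
| where policyName startswith "CA30"
// Failure = control not satisfied; Interrupted = user would have been prompted.
| where policyResult in ("reportOnlyFailure", "reportOnlyInterrupted")
| summarize wouldHaveChallenged = dcount(UserPrincipalName), hits = count() by policyName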

Then run a controlled live test: use a test account with a registered MFA method, generate a Medium+ sign-in (a Tor browser session is the simplest reproducible anonymous-IP trigger in a lab), and confirm Policy A challenges it and the event surfaces in riskDetections and Sentinel within expected latency.
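If you export sign-in records instead of querying them in place, the same report-only impact count can be reproduced offline. This sketch assumes a hypothetical export shape: a list of dicts with userPrincipalName and a conditionalAccessPolicies list of displayName/result pairs. It counts both reportOnlyFailure and reportOnlyInterrupted as "would have been challenged", which is my choice for sizing helpdesk impact; narrow WOULD_CHALLENGE if you only care about hard failures.

```python
from collections import defaultdict

# Report-only results treated as "this user would have been challenged or blocked".
WOULD_CHALLENGE = {"reportOnlyFailure", "reportOnlyInterrupted"}


def report_only_impact(sign_ins: list[dict], prefix: str = "CA30") -> dict:
    """Per-policy distinct users and hit counts for report-only policies."""
    users = defaultdict(set)
    hits = defaultdict(int)
    for record in sign_ins:
        upn = record["userPrincipalName"].lower()
        for policy in record.get("conditionalAccessPolicies", []):
            name = policy.get("displayName", "")
            if name.startswith(prefix) and policy.get("result") in WOULD_CHALLENGE:
                users[name].add(upn)
                hits[name] += 1
    return {name: {"users": len(users[name]), "hits": hits[name]} for name in hits}
```

The distinct-user number is the one to compare against helpdesk capacity; the hit count mostly tells you how often the same people would be re-challenged.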

Measuring program health: KPIs and tuning cadence

Run it like a service. The metrics that matter:

Self-remediation rate — % of risky users who cleared via SSPR/MFA without a ticket. Target high; a low rate means MFA/SSPR registration gaps, not a policy problem.
False-positive rate — dismissals / total detections, sliced by detection type. A single noisy type (usually unfamiliar properties or impossible travel) dominating dismissals is a named-location/VPN tuning task.
Mean time to remediate (MTTR) for High user-risk — detection to clean state.
High-risk sign-in success count — the ResultType == 0 query above. Should trend to zero as real-time coverage tightens.

Cadence: review the dismissal mix and the high-risk-success query weekly, revisit named locations and policy scope monthly, and re-baseline report-only impact whenever you change egress architecture or onboard a population that meaningfully shifts the sign-in geography.
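The first three KPIs are simple ratios and averages, and computing them the same way every week matters more than the tooling. A minimal sketch follows, assuming you have already assembled case and detection records in the hypothetical shapes noted in the docstrings; none of these field names come from a Microsoft schema.

```python
from collections import Counter
from datetime import datetime


def self_remediation_rate(cases: list[dict]) -> float:
    """cases: dicts with 'cleared_by' set to 'user' (self-service) or 'admin' (ticket)."""
    if not cases:
        return 0.0
    return sum(1 for c in cases if c["cleared_by"] == "user") / len(cases)


def false_positive_rate_by_type(detections: list[dict]) -> dict:
    """detections: dicts with 'type' and a boolean 'dismissed'. Returns rate per type."""
    totals = Counter(d["type"] for d in detections)
    dismissed = Counter(d["type"] for d in detections if d["dismissed"])
    return {t: dismissed[t] / totals[t] for t in totals}


def mttr_hours(cases: list[dict]) -> float:
    """cases: dicts with datetime 'detected' and 'cleared'; open cases are skipped."""
    durations = [
        (c["cleared"] - c["detected"]).total_seconds() / 3600
        for c in cases
        if c.get("cleared")
    ]
    return sum(durations) / len(durations) if durations else 0.0
```

Slicing the false-positive rate by detection type is what turns it into a tuning task: one type with a high rate points at named locations or VPN egress, while a uniformly high rate points at dismissal discipline.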

Enterprise scenario

A platform team running a 40,000-seat tenant rolled out Policy B (user-risk secure password change) and within two days the helpdesk was drowning — hundreds of users were “successfully” changing passwords, clearing risk, then immediately filing tickets that they could not log into their domain-joined laptops. The constraint: the org was hybrid with Entra Connect, and password writeback had never been enabled because the original deployment predated SSPR. Cloud password changes cleared Identity Protection risk but never propagated to on-prem AD, so Kerberos auth to file shares and domain resources broke for every remediating user.

They fixed it in two moves. First, enable password writeback on the sync engine and grant the connector the directory permissions to reset on-prem passwords:

# Enable password writeback in Entra Connect, then validate.
Set-ADSyncAADPasswordResetConfiguration `
  -Connector "yourtenant.onmicrosoft.com - AAD" `
  -Enable $true

# Confirm both SSPR (writeback) and password hash sync features are on.
Get-MgDirectoryOnPremiseSynchronization |
  Select-Object -ExpandProperty Features

Second — and this was the real lesson — they had skipped the report-only soak. Re-baselining in report-only for two weeks showed the true would-be-remediation volume, which let them stage enforcement by business unit instead of flipping the whole tenant at once. Self-remediation rate went from “everyone calls” to over 90% clearing without a ticket, because by enforcement time writeback worked and the volume was predictable. The takeaway: risk policy is only as good as the remediation path underneath it. Enforce the gate before the self-service plumbing is proven and you have just moved the outage from “attacker” to “helpdesk.”
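Staging enforcement by business unit is a small packing problem: fit each unit's expected remediation volume into waves the helpdesk can absorb. The sketch below uses a first-fit-decreasing heuristic, which is my choice for illustration and not necessarily what the team in this scenario did. The unit names, volumes, and capacity figure in the usage note are invented.

```python
def plan_waves(volume_by_unit: dict[str, int], wave_capacity: int) -> list[list[str]]:
    """Group business units into enforcement waves without exceeding capacity.

    First-fit decreasing: place the largest units first, each into the earliest
    wave with room. A unit larger than the capacity gets a wave to itself.
    """
    if wave_capacity <= 0:
        raise ValueError("wave_capacity must be positive")

    waves: list[list[str]] = []
    loads: list[int] = []
    ordered = sorted(volume_by_unit.items(), key=lambda kv: (-kv[1], kv[0]))
    for unit, volume in ordered:
        for i, load in enumerate(loads):
            if load + volume <= wave_capacity:
                waves[i].append(unit)
                loads[i] += volume
                break
        else:
            # No existing wave has room; open a new one.
            waves.append([unit])
            loads.append(volume)
    return waves
```

With invented report-only volumes of Sales 120, Eng 80, Finance 40, and HR 30 and a capacity of 150 per wave, this yields two waves: Sales with HR, then Eng with Finance. The volumes should come from the report-only soak, which is the reason the soak cannot be skipped.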