Intune Remediations at Scale: Detection and Remediation Scripts, Scheduling, and Drift Correction

Configuration drift is the quiet failure mode of a managed fleet. A profile applies cleanly at enrollment, then a user with local admin, a vendor installer, or a half-removed legacy GPO nudges a setting back out of policy. Configuration profiles re-apply settings they own, but only what the corresponding CSP exposes, and they can’t express conditional logic like “fix this only if a registry value and a service state disagree.” Intune device remediations (formerly Proactive Remediations) fill that gap: a pair of PowerShell scripts, one that detects a problem and one that fixes it, run on a schedule against assigned devices, with fleet-wide reporting. This guide covers the script contract, run context, a real self-healing example, scheduling, reporting, packaging, and Graph-based deployment, then ends with when not to reach for them.

Licensing gate first. Microsoft’s documented prerequisites require the users of the devices to hold one of: Windows 10/11 Enterprise E3 or E5 (included in Microsoft 365 F3, E3, or E5), Windows 10/11 Education A3 or A5 (included in Microsoft 365 A3 or A5), or Windows 10/11 Virtual Desktop Access (VDA) per user. Devices must also be Microsoft Entra joined or hybrid joined. If your tenant lacks the right entitlement, the Remediations blade may still be visible while assignments produce no data. Licensing terms change, so confirm entitlement against the current prerequisites before you invest in authoring.

1. The detect/remediate contract: exit codes, output, and idempotency

A remediation is two scripts bound together. The agent (the Intune Management Extension, or IME) runs the detection script first. Its exit code is the entire decision:

| Detection exit code | Meaning | What happens next |
| --- | --- | --- |
| 0 | No issue found (compliant) | Remediation script is not run |
| non-zero (1 is convention) | Issue detected | Remediation script runs (if one is assigned) |

The remediation script then reports its own outcome the same way:

| Remediation exit code | Meaning |
| --- | --- |
| 0 | Remediation succeeded |
| non-zero | Remediation failed |

Two things trip people up. First, STDOUT is captured and surfaced in the portal, but only a limited amount: the IME keeps up to 2048 characters of standard output per script, and the “Pre-remediation detection output” column shows what the detection script wrote before it exited. Use that channel deliberately. A detection script that writes the offending value to STDOUT and then exits 1 turns the report into an actionable triage list instead of a binary pass/fail.

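Second, the exit code can drift away from your intent. An unhandled terminating exception (a throw, or any error once $ErrorActionPreference is 'Stop') ends the script with a non-zero exit code, so a buggy detection script reports “issue found” and fires your remediation against healthy machines. A plain Write-Error, by contrast, does not change the exit code on its own, so a script can hit an error and still exit 0. Always set the exit code explicitly and wrap risky calls: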

# detect.ps1 - explicit, defensive contract
try {
    $current = Get-ItemPropertyValue -Path 'HKLM:\SOFTWARE\Contoso\Agent' `
                                      -Name 'HeartbeatSeconds' -ErrorAction Stop
    if ($current -ne 300) {
        Write-Output "DRIFT HeartbeatSeconds=$current expected=300"
        exit 1   # issue -> remediation runs
    }
    Write-Output "OK HeartbeatSeconds=300"
    exit 0       # compliant -> remediation skipped
}
catch {
    # value missing is itself drift; report and remediate
    Write-Output "MISSING HeartbeatSeconds ($($_.Exception.Message))"
    exit 1
}
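
Because only a limited amount of STDOUT is kept, it can help to build the triage line in one place and clip it deliberately. The following is a minimal sketch, not part of the IME contract: Format-TriageLine is a hypothetical helper name, and the 2048 default simply mirrors the cap quoted above.

```powershell
# Hypothetical helper: join findings into one line and keep it within the output cap.
function Format-TriageLine {
    param(
        [string[]]$Findings,
        [int]$MaxLength = 2048   # cap quoted in this guide for captured STDOUT
    )
    # Drop empty entries, then join into a single line
    $line = ($Findings | Where-Object { $_ }) -join '; '
    if ($line.Length -gt $MaxLength) {
        # Truncate and mark the cut so the reader knows output was clipped
        $line = $line.Substring(0, $MaxLength - 3) + '...'
    }
    return $line
}

# Example values are illustrative only
$findings = @('Endpoint drift: https://default.example/ingest', 'StartMode=Manual')
Write-Output (Format-TriageLine -Findings $findings)
```

Putting the most important finding first means it survives even if the line has to be clipped.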

Idempotency is non-negotiable. Detection runs on every cycle even after a successful fix, so the remediation must converge to the desired state and be safe to run repeatedly without side effects. A remediation that appends to a file, increments a counter, or restarts a service unconditionally will misbehave when it runs hourly. Write remediations that assert state, never toggle it.
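
As an illustration of “assert, never toggle,” here is a small sketch of the file-append case. Add-LineIfMissing and the demo file name are hypothetical; the point is only that the function checks for the desired state before changing anything, so a second run is a no-op.

```powershell
# Hypothetical helper: ensure a line exists in a file without ever duplicating it.
function Add-LineIfMissing {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$Line
    )
    $existing = @()
    if (Test-Path -LiteralPath $Path) {
        $existing = @(Get-Content -LiteralPath $Path)
    }
    if ($existing -contains $Line) {
        return $false      # desired state already holds; change nothing
    }
    Add-Content -LiteralPath $Path -Value $Line
    return $true           # state was asserted on this run
}

# Demo against a throwaway file: the second call changes nothing
$demo = Join-Path ([IO.Path]::GetTempPath()) 'contoso-allowlist-demo.txt'
Remove-Item -LiteralPath $demo -ErrorAction SilentlyContinue
$first  = Add-LineIfMissing -Path $demo -Line 'telemetry.contoso.net'
$second = Add-LineIfMissing -Path $demo -Line 'telemetry.contoso.net'
Write-Output "first=$first second=$second lines=$(@(Get-Content -LiteralPath $demo).Count)"
```

An unconditional Add-Content in the same position would grow the file by one line per cycle, which is exactly the hourly misbehavior described above.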

2. Run context, 64-bit vs 32-bit, and signature enforcement

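Three execution properties on every remediation change behavior far more than the script body does: whether the script runs as SYSTEM or with the logged-on user’s credentials (runAsAccount in Graph), whether it runs in 64-bit or 32-bit PowerShell (runAs32Bit), and whether the script signature check is enforced (enforceSignatureCheck).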

Decision rule I use on every remediation: SYSTEM + 64-bit unless I have a specific reason not to. User context is for per-user state; 32-bit is for legacy 32-bit-only apps. Get these wrong and the script “works” in your test session under your admin token, then fails silently across the fleet because SYSTEM has a different registry view and no user hive.
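
One low-cost way to catch a wrong context is to have the script report the context it actually ran in. The sketch below assumes nothing beyond standard .NET properties; Get-RunContextLine is a hypothetical name, and the exact user string you see under SYSTEM may vary.

```powershell
# Hypothetical helper: one line describing the context the script actually ran in.
function Get-RunContextLine {
    $bits = if ([Environment]::Is64BitProcess) { 64 } else { 32 }
    $os64 = [Environment]::Is64BitOperatingSystem
    # A 32-bit process on a 64-bit OS sees the redirected view of HKLM:\SOFTWARE (WOW6432Node)
    $redirected = ($os64 -and $bits -eq 32)
    "ctx user=$([Environment]::UserName) bits=$bits os64=$os64 redirected=$redirected"
}

Write-Output (Get-RunContextLine)
```

Emitting this line during testing makes a mismatch between your admin session and the fleet’s SYSTEM context visible in the captured output rather than something you infer from a fix that “won’t stay fixed.”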

3. Authoring a real example: detecting and self-healing a misconfigured setting

Concrete target: a third-party agent must keep its telemetry endpoint pinned and its service set to Automatic (Delayed Start). A recurring incident is that a reinstall flips the endpoint to a default URL and the start type to Manual. We detect both conditions and heal both.

Detection — exit 1 if either the registry value or the service start type is wrong:

# detect-agent-config.ps1  (SYSTEM, 64-bit)
$ok = $true
$expectedUrl = 'https://telemetry.contoso.net/ingest'

# 1. registry endpoint
try {
    $url = Get-ItemPropertyValue 'HKLM:\SOFTWARE\Contoso\Agent' 'Endpoint' -ErrorAction Stop
    if ($url -ne $expectedUrl) { Write-Output "Endpoint drift: $url"; $ok = $false }
} catch { Write-Output 'Endpoint missing'; $ok = $false }

# 2. service start type (DelayedAutostart lives in the service key)
$svc = Get-Service -Name 'ContosoAgent' -ErrorAction SilentlyContinue
if (-not $svc) {
    Write-Output 'ContosoAgent service not present'; $ok = $false
} else {
    $cfg     = Get-CimInstance Win32_Service -Filter "Name='ContosoAgent'"
    $delayed = (Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Services\ContosoAgent' `
                                 -Name 'DelayedAutostart' -ErrorAction SilentlyContinue).DelayedAutostart
    if ($cfg.StartMode -ne 'Auto' -or $delayed -ne 1) {
        Write-Output "Service start drift: StartMode=$($cfg.StartMode) Delayed=$delayed"
        $ok = $false
    }
}

if ($ok) { Write-Output 'Agent config compliant'; exit 0 } else { exit 1 }

Remediation — assert the desired state idempotently, then verify the service is actually running:

# remediate-agent-config.ps1  (SYSTEM, 64-bit)
$expectedUrl = 'https://telemetry.contoso.net/ingest'
$svcKey      = 'HKLM:\SYSTEM\CurrentControlSet\Services\ContosoAgent'
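$agentKey    = 'HKLM:\SOFTWARE\Contoso\Agent'
try {
    # registry endpoint: create the key only if absent, because New-Item -Force
    # on an existing registry key replaces it and drops its other values
    if (-not (Test-Path $agentKey)) { New-Item -Path $agentKey -Force -ErrorAction Stop | Out-Null }
    Set-ItemProperty -Path $agentKey -Name 'Endpoint' `
                     -Value $expectedUrl -Type String -ErrorAction Stop

    # Automatic (Delayed Start): set mode to Auto, then DelayedAutostart=1
    Set-Service -Name 'ContosoAgent' -StartupType Automatic -ErrorAction Stop
    # -ErrorAction Stop so a failed write reaches the catch block instead of exiting 0
    Set-ItemProperty -Path $svcKey -Name 'DelayedAutostart' -Value 1 -Type DWord -ErrorAction Stop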

    # ensure it is running now
    if ((Get-Service 'ContosoAgent').Status -ne 'Running') {
        Start-Service 'ContosoAgent' -ErrorAction Stop
    }
    Write-Output 'Agent config remediated'
    exit 0
}
catch {
    Write-Output "Remediation failed: $($_.Exception.Message)"
    exit 1
}

Note the symmetry: the remediation fixes exactly what the detection checks, and re-running it is harmless. Both scripts are short, single-purpose, and write a one-line status. Save the bodies as separate .ps1 files — the portal and Graph each take the detection and remediation script content independently.
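
Before uploading, the pair can be exercised locally in the same order the agent uses: detect, remediate only on a non-zero exit, then detect again. This is a rough harness under stated assumptions, not a reproduction of the IME: Invoke-RemediationPair is a hypothetical name, it runs under your own token rather than SYSTEM, and it assumes a console host (powershell.exe or pwsh), not the ISE.

```powershell
# Hypothetical local harness that mimics the detect -> remediate -> re-detect order.
function Invoke-RemediationPair {
    param(
        [Parameter(Mandatory)][string]$DetectPath,
        [string]$RemediatePath
    )
    # Reuse the PowerShell executable that is running this harness
    $shell = (Get-Process -Id $PID).Path

    $pre     = & $shell -NoProfile -ExecutionPolicy Bypass -File $DetectPath
    $preExit = $LASTEXITCODE

    $remExit  = $null
    $postExit = $null
    if ($preExit -ne 0 -and $RemediatePath) {
        # Remediation runs only when detection reported an issue
        $null    = & $shell -NoProfile -ExecutionPolicy Bypass -File $RemediatePath
        $remExit = $LASTEXITCODE
        # Re-check so we know whether the fix converged
        $null     = & $shell -NoProfile -ExecutionPolicy Bypass -File $DetectPath
        $postExit = $LASTEXITCODE
    }

    [pscustomobject]@{
        DetectExit     = $preExit
        PreOutput      = ($pre -join "`n")
        RemediateExit  = $remExit
        PostDetectExit = $postExit
    }
}

# Usage with the two files from this section:
# Invoke-RemediationPair -DetectPath .\detect-agent-config.ps1 -RemediatePath .\remediate-agent-config.ps1
```

A healthy pair ends with PostDetectExit equal to 0; anything else means detection and remediation disagree about what “fixed” looks like.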

4. Scheduling cadence, assignment targeting, and pre-remediation grace

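Create the pack under Devices > Scripts and remediations in the Intune admin center (older portal builds list it as Devices > Remediations). After uploading both scripts and setting run context, you assign it to an Entra ID group with a schedule that runs once, hourly (every n hours), or daily (every n days at a set time).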

Schedule cadence is the detection cadence. Remediation only runs in cycles where detection exits non-zero. A daily pack on a healthy device runs detection once a day and never runs the remediation — so “daily” means “check daily, fix only on the cycles where it’s broken,” not “rewrite the registry daily.” This is why idempotency plus a quiet detection cadence is cheap.

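Targeting rules that save you from a fleet-wide incident: assign to a small canary group first, ship detection-only (audit mode) until the report matches what you expect, and widen the assignment only after a canary has detected, healed, and re-detected clean.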

There is no first-class “grace period” knob the way update rings have deadlines. Grace logic lives inside the detection script: if a setting has a legitimate transient window (e.g. a service that restarts itself), bake that tolerance into detection so you don’t remediate a momentary state.
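
One way to encode such tolerance, sketched here under assumptions, is to record when drift was first seen and only report it once it has persisted. Test-PersistentDrift and the state-file path are hypothetical. With a cadence longer than the grace window this reduces to “drifted on two consecutive checks.”

```powershell
# Hypothetical helper: report drift only after it has persisted beyond a grace window.
function Test-PersistentDrift {
    param(
        [Parameter(Mandatory)][bool]$IsDrifted,
        [Parameter(Mandatory)][string]$StatePath,
        [timespan]$Grace = [timespan]::FromMinutes(5),
        [datetime]$Now = (Get-Date)
    )
    if (-not $IsDrifted) {
        # Healthy again: forget any earlier sighting
        Remove-Item -LiteralPath $StatePath -ErrorAction SilentlyContinue
        return $false
    }
    if (-not (Test-Path -LiteralPath $StatePath)) {
        # First sighting: record the time, do not report drift yet
        Set-Content -LiteralPath $StatePath -Value $Now.ToString('o')
        return $false
    }
    $stamp     = (Get-Content -LiteralPath $StatePath -Raw).Trim()
    $firstSeen = [datetime]::Parse($stamp, [cultureinfo]::InvariantCulture,
                                   [System.Globalization.DateTimeStyles]::RoundtripKind)
    return (($Now - $firstSeen) -ge $Grace)
}

# Usage inside a detection script (the path is a placeholder):
# $drifted = $true   # result of your real check
# $state   = 'C:\ProgramData\Contoso\drift-first-seen.txt'
# if (Test-PersistentDrift -IsDrifted $drifted -StatePath $state) { exit 1 } else { exit 0 }
```

The trade-off is that detection now writes a small state file, so keep it somewhere the run context can write to and that nothing else cleans up.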

5. Reading remediation status, output columns, and recurrence behavior

Open the pack and read the Device status and Overview. The columns that matter:

| Column | What it tells you |
| --- | --- |
| Detection status | Whether detection ran and what it returned (issue / no issue / error) |
| Remediation status | Whether the fix ran and its exit outcome |
| Pre-remediation detection output | The (up to 2048-char) STDOUT your detect script wrote before exiting: your triage payload |
| Post-remediation detection output | STDOUT from the detection re-check after remediation, if applicable |
| Last run / Last modified | When the device last executed the pair |

Aggregate tiles classify every device into one of: Without issues (detection passed), With issues (detection found drift), Issues remediated (drift found and fix succeeded), and error states. The healthy steady state for an enforcement pack is most devices “Without issues,” a trickle moving through “Issues remediated.” A growing “With issues, not remediated” count means your remediation is failing — read its STDOUT.

Recurrence behavior worth internalizing: the IME caches the assignment and runs it on the chosen interval locally, with no portal sync each cycle. A device that detects-and-fixes will, on its next cycle, detect clean and report “Without issues” — so a flapping device (fixed, drifts again, fixed) shows as recurring “Issues remediated,” itself a signal that something else is re-breaking the setting faster than you’re fixing it. Force an immediate run for testing with On demand remediations against a single device, or restart the IME service to trigger a re-evaluation:

Restart-Service -Name 'IntuneManagementExtension'
# Agent logs (read these to debug script behavior):
#   C:\ProgramData\Microsoft\IntuneManagementExtension\Logs\IntuneManagementExtension.log
#   C:\ProgramData\Microsoft\IntuneManagementExtension\Logs\AgentExecutor.log

6. Packaging files alongside scripts and avoiding payload-size traps

Remediations are scripts, not packages — there is no content payload like a Win32 app’s .intunewin. If your remediation needs a binary, a config file, or a large data blob, you have two honest options:

  1. Embed it in the script as a base64 string and write it to disk at remediation time. This works for genuinely small artifacts only. Keep each script comfortably small — Microsoft’s guidance is to stay well under the script size limit (treat 200 KB per script as a practical ceiling), and remember STDOUT capture is just 2048 characters, so don’t echo the blob.
  2. Fetch it at runtime from a trusted location (an authenticated Azure Storage SAS URL, an internal share reachable by the device) and verify it before use. The script becomes a bootstrapper; the bytes live elsewhere.

# remediation that materializes a small config from embedded base64 (idempotent)
$target = 'C:\ProgramData\Contoso\agent.conf'
$b64    = 'eyJlbmRwb2ludCI6Imh0dHBzOi8vdGVsZW1ldHJ5LmNvbnRvc28ubmV0L2luZ2VzdCJ9'
$want   = [Convert]::FromBase64String($b64)

$need = $true
if (Test-Path $target) {
    $have = [IO.File]::ReadAllBytes($target)
    if (@(Compare-Object $have $want -SyncWindow 0).Count -eq 0) { $need = $false }
}
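if ($need) {
    try {
        New-Item (Split-Path $target) -ItemType Directory -Force -ErrorAction Stop | Out-Null
        [IO.File]::WriteAllBytes($target, $want)
        Write-Output 'config written'
    } catch {
        # report the failure instead of falling through to a false success
        Write-Output "config write failed: $($_.Exception.Message)"
        exit 1
    }
} else { Write-Output 'config already correct' }
exit 0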
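
For the second option, “verify it before use” can be as simple as pinning a SHA-256 hash in the script. This is a minimal sketch: Test-Sha256 is a hypothetical helper, and the URL, temp path, and pinned hash in the usage comments are placeholders you would supply.

```powershell
# Hypothetical helper: true only if the file exists and matches the pinned SHA-256.
function Test-Sha256 {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$ExpectedSha256
    )
    if (-not (Test-Path -LiteralPath $Path)) { return $false }
    $actual = (Get-FileHash -LiteralPath $Path -Algorithm SHA256).Hash
    return ($actual -eq $ExpectedSha256)   # -eq on strings ignores case
}

# Usage inside a remediation (all three values are placeholders):
# $tmp = Join-Path $env:TEMP 'agent.conf.download'
# Invoke-WebRequest -Uri $sasUrl -OutFile $tmp -UseBasicParsing
# if (-not (Test-Sha256 -Path $tmp -ExpectedSha256 $pinnedHash)) {
#     Write-Output 'hash mismatch, refusing to install'
#     exit 1
# }
```

Pinning the hash means a changed payload requires a changed script, which keeps the payload under the same review as the remediation itself.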

The size trap: people paste a whole installer as base64 into a remediation, blow past the limit, and the upload either rejects or the script fails to parse on the device. If you’re embedding more than a few KB, you’ve picked the wrong tool — use a Win32 app for payloads and let the remediation only enforce a small piece of state.
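
A pre-upload size check is easy to add to a build or review step. The sketch below is illustrative: Test-ScriptSize is a hypothetical name, and the 200 KB default is the practical ceiling suggested above rather than a value read from the service.

```powershell
# Hypothetical pre-upload check: raw size, size once base64-encoded, and a pass/fail flag.
function Test-ScriptSize {
    param(
        [Parameter(Mandatory)][string]$Path,
        [int]$MaxBytes = 200KB    # practical ceiling suggested in this guide
    )
    $bytes = (Get-Item -LiteralPath $Path).Length
    [pscustomobject]@{
        Path        = $Path
        Bytes       = $bytes
        Base64Chars = 4 * [math]::Ceiling($bytes / 3)   # length after base64 encoding
        WithinLimit = ($bytes -le $MaxBytes)
    }
}

# Usage:
# Test-ScriptSize -Path .\remediate-agent-config.ps1
```

Base64 inflates content by roughly a third, so an embedded blob costs more script size than its size on disk suggests.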

7. Deploying and versioning remediation packs through Graph

For repeatable, reviewed deployment, drive remediations as code through Microsoft Graph. The resource is deviceManagement/deviceHealthScripts; script bodies are base64-encoded in detectionScriptContent and remediationScriptContent:

# create a remediation pack (detection + remediation), SYSTEM, 64-bit, unsigned
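# strip newlines: GNU base64 wraps output at 76 columns, which would break the JSON string
DET=$(base64 < detect-agent-config.ps1 | tr -d '\n')
REM=$(base64 < remediate-agent-config.ps1 | tr -d '\n')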

m365 request --method post \
  --url "https://graph.microsoft.com/beta/deviceManagement/deviceHealthScripts" \
  --content-type 'application/json' \
  --body "{
    \"@odata.type\": \"#microsoft.graph.deviceHealthScript\",
    \"displayName\": \"Contoso Agent config enforcement\",
    \"description\": \"Pins endpoint + Automatic(Delayed) service start\",
    \"publisher\": \"Platform Engineering\",
    \"version\": \"1.4.0\",
    \"runAsAccount\": \"system\",
    \"runAs32Bit\": false,
    \"enforceSignatureCheck\": false,
    \"detectionScriptContent\": \"${DET}\",
    \"remediationScriptContent\": \"${REM}\"
  }"
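
If you would rather avoid shell quoting, the same request body can be assembled in PowerShell and handed to whichever client you use. This is a sketch under assumptions: New-HealthScriptBody is a hypothetical helper, and it emits only property names that already appear in the request above.

```powershell
# Hypothetical helper: build the deviceHealthScript JSON body from two .ps1 files.
function New-HealthScriptBody {
    param(
        [Parameter(Mandatory)][string]$DetectPath,
        [Parameter(Mandatory)][string]$RemediatePath,
        [Parameter(Mandatory)][string]$DisplayName,
        [Parameter(Mandatory)][string]$Version
    )
    # Resolve to a full path first: .NET file APIs do not follow the PowerShell location
    $toB64 = {
        param($p)
        $full = (Resolve-Path -LiteralPath $p).ProviderPath
        [Convert]::ToBase64String([IO.File]::ReadAllBytes($full))
    }
    [ordered]@{
        '@odata.type'            = '#microsoft.graph.deviceHealthScript'
        displayName              = $DisplayName
        version                  = $Version
        runAsAccount             = 'system'
        runAs32Bit               = $false
        enforceSignatureCheck    = $false
        detectionScriptContent   = (& $toB64 $DetectPath)
        remediationScriptContent = (& $toB64 $RemediatePath)
    } | ConvertTo-Json
}

# Usage:
# New-HealthScriptBody -DetectPath .\detect-agent-config.ps1 `
#     -RemediatePath .\remediate-agent-config.ps1 `
#     -DisplayName 'Contoso Agent config enforcement' -Version '1.4.0' |
#     Set-Content .\body.json
```

Writing the body to a file also gives reviewers a concrete artifact to diff between versions.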

Assign it with a schedule. The assignment carries a runSchedule (here, daily at 03:00 device-local) and the target group:

m365 request --method post \
  --url "https://graph.microsoft.com/beta/deviceManagement/deviceHealthScripts/<scriptId>/assign" \
  --content-type 'application/json' \
  --body '{
    "deviceHealthScriptAssignments": [{
      "target": {
        "@odata.type": "#microsoft.graph.groupAssignmentTarget",
        "groupId": "<entraGroupId>"
      },
      "runRemediationScript": true,
      "runSchedule": {
        "@odata.type": "#microsoft.graph.deviceHealthScriptDailySchedule",
        "interval": 1,
        "time": "03:00:00",
        "useUtc": false
      }
    }]
  }'

Set "runRemediationScript": false to ship the same pack in audit-only mode first. For versioning, keep the .ps1 files in Git, bump the version string on every change (it is free-text but invaluable in the report and for change tracking), and pull aggregate state back for dashboards:

# fleet roll-up for one pack
m365 request --method get \
  --url "https://graph.microsoft.com/beta/deviceManagement/deviceHealthScripts/<scriptId>/deviceRunStates?\$expand=managedDevice(\$select=deviceName)"

deviceRunStates gives per-device detectionState, remediationState, and the captured output strings — exactly the columns the portal renders, in a form you can ship to a SIEM or a weekly report.
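
Once the roll-up is retrieved, a small grouping step turns it into the counts a dashboard needs. The sketch below is illustrative: Get-RunStateSummary is a hypothetical helper, and the sample records (including the state strings) are made up to show the shape, so check them against what your tenant actually returns.

```powershell
# Hypothetical helper: count devices per (detectionState, remediationState) pair.
function Get-RunStateSummary {
    param([Parameter(Mandatory)][object[]]$RunStates)
    $RunStates |
        Group-Object -Property detectionState, remediationState |
        ForEach-Object {
            [pscustomobject]@{
                DetectionState   = $_.Group[0].detectionState
                RemediationState = $_.Group[0].remediationState
                Devices          = $_.Count
            }
        } |
        Sort-Object -Property Devices -Descending
}

# Sample shaped like a Graph response; values are illustrative, not real data
$sample = @'
{ "value": [
  { "detectionState": "success", "remediationState": "skipped" },
  { "detectionState": "success", "remediationState": "skipped" },
  { "detectionState": "success", "remediationState": "skipped" },
  { "detectionState": "fail",    "remediationState": "success" },
  { "detectionState": "fail",    "remediationState": "success" },
  { "detectionState": "fail",    "remediationState": "remediationFailed" }
] }
'@ | ConvertFrom-Json

Get-RunStateSummary -RunStates $sample.value
```

Tracking these counts over time is what separates genuine drift from a flapping device.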

8. When to use remediations vs configuration profiles vs Win32 apps

Choosing the right mechanism is most of the skill. Remediations are powerful precisely because they run arbitrary code as SYSTEM on a schedule — which is also why they are the wrong default.

| Need | Use | Why |
| --- | --- | --- |
| Set a setting a CSP already exposes (BitLocker, firewall, password policy) | Configuration profile / Settings Catalog | Native, declarative, auto-re-applies, no script risk |
| Conditional “fix only if X and Y disagree,” cross-checking registry + service + file | Remediation | Only mechanism with real detection logic and recurring self-heal |
| Install/update/remove software, ship a real payload | Win32 app | Built for content delivery, supersedence, detection rules, dependencies |
| One-time bootstrap at enrollment, no recurrence needed | Platform script (device management script) | Runs once; no scheduled re-evaluation or reporting |
| Continuous compliance signal feeding Conditional Access | Compliance policy | Stamps isCompliant; remediations do not affect compliance state |

The line that matters: if a configuration profile can own the setting, let it. Profiles re-apply on their own check-in and carry no code-execution risk. Reach for a remediation when you need logic — a decision spanning multiple signals, or a fix the CSP surface can’t express. And never use a remediation to install software; that path has no payload model, no supersedence, and no clean detection-rule story.

Verify

Confirm the pack actually detects and heals before you widen the assignment:

  1. Force a run on a canary. Use On demand remediations in the portal for one enrolled device, or Restart-Service IntuneManagementExtension on the box.
  2. Break it on purpose. On the test device, set the registry value wrong and flip the service to Manual, then trigger detection.
  3. Read the report. In the pack’s Device status, confirm the device shows With issues then Issues remediated, and that Pre-remediation detection output contains your drift string.
  4. Re-detect clean. On the next cycle the same device should report Without issues with no remediation run — proof the fix converged and detection agrees.
  5. Check the logs on failure. AgentExecutor.log shows the script’s exit code and STDOUT; IntuneManagementExtension.log shows scheduling and policy pickup.
  6. Validate context. If a 64-bit registry path “won’t stay fixed,” confirm runAs32Bit is false — 32-bit redirection is the usual culprit.

Enterprise scenario

A platform team running ~28,000 Windows endpoints had a recurring break: their EDR sensor’s tamper-protection registry flag was being cleared on a subset of machines after a vendor agent self-update, silently degrading protection. A configuration profile couldn’t help — the value lived under a vendor key with no CSP, and worse, the flag was valid to be 0 for ~90 seconds during the agent’s own restart, so a naive “set it to 1” remediation would fight the agent and cause restart loops.

The constraint was the transient window. They built a detection script that read the flag and the agent service’s last-start time, and only declared drift if the flag was 0 and the service had been stable (running) for more than five minutes — encoding the grace tolerance directly in detection. Remediation asserted the flag, never touching the service. They shipped it audit-only first (runRemediationScript: false) for a week, used the deviceRunStates roll-up to confirm ~3% of devices were genuinely drifting (not flapping in the restart window), then enabled remediation on an hourly cadence given the security stakes.

# detection core: drift only if flag is off AND agent has been stable > 5 min
$flag = (Get-ItemProperty 'HKLM:\SOFTWARE\Vendor\EDR' 'TamperProtect' `
         -ErrorAction SilentlyContinue).TamperProtect
$svc  = Get-CimInstance Win32_Service -Filter "Name='EdrSensor'"
$proc = if ($svc.ProcessId) { Get-Process -Id $svc.ProcessId -ErrorAction SilentlyContinue }
$stableMins = if ($proc) { ((Get-Date) - $proc.StartTime).TotalMinutes } else { 0 }

if ($flag -ne 1 -and $stableMins -gt 5) {
    Write-Output "TamperProtect=$flag stableMins=$([int]$stableMins)"
    exit 1
}
exit 0

The hourly cadence capped time-to-repair at roughly an hour (plus the five-minute stability window), audit mode proved the fix wouldn’t loop before any device self-healed, and the grace-in-detection pattern became their template for every remediation touching a setting another agent also writes.
