Every Conditional Access (CA) policy you write is a lock you could one day be standing on the wrong side of. A bad CA deployment, an expired federation certificate on your identity provider, or an outage in your MFA service can lock every administrator out of the tenant simultaneously, including the people who would fix it. Break-glass accounts are the deliberate exception: a tiny number of cloud-only Global Administrators, excluded from the controls that could fail closed, hardened to the point that the only realistic way to use one is the documented emergency procedure, and wired so that any use pages your on-call within minutes. This article is the build, end to end. It assumes you hold Global Administrator to create the accounts and assign roles and Conditional Access Administrator to manage exclusions, and that you have a Log Analytics workspace (plus Microsoft Sentinel if you want the near-real-time rule).
Microsoft’s own guidance is two break-glass accounts minimum. Two is the floor, not a target: one covers single-account compromise or a fat-fingered credential reset, the second covers the first being mid-rotation. More than two or three and you are just expanding attack surface.
1. Why these accounts exist: the three failure modes
Be precise about what you are insuring against, because it dictates every design choice below. Three classes of failure can lock out your normal admins at once:
- Conditional Access self-lockout. A policy targeting “All users” with a grant control nobody can satisfy — require compliant device on a day Intune is degraded, require an authentication strength no admin has enrolled, or a misconfigured block. CA evaluates as a logical AND across matching policies, so one bad policy denies the sign-in regardless of what else is correct.
- Federation / external IdP outage. If your tenant is federated (AD FS, or a third-party IdP via SAML/WS-Fed) or you depend on an external MFA provider, an expired token-signing certificate or a provider outage breaks authentication for every federated user. A cloud-only account that authenticates directly against Entra ID does not traverse that broken path.
- MFA service or method failure. A regional outage of the MFA service, or a method-specific failure (push notifications down, a telco SMS outage), strands accounts that can only satisfy that one method.
The design that survives all three is the same: cloud-only (no federation dependency), authenticating with a phishing-resistant credential the account itself holds (no external MFA dependency), and excluded from the CA policies that could fail closed. Resilience and exclusion are two sides of one coin.
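The AND semantics and the effect of an exclusion can be sketched in a few lines of Python. This is a toy model, not the real Conditional Access engine: the `Policy` class, the `evaluate` function, and the control names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    required_controls: frozenset            # grant controls the sign-in must satisfy
    excluded_users: frozenset = frozenset()


def evaluate(user, satisfied, policies):
    """Return (allowed, blocking_policy_names) for one sign-in.

    Every matching policy must be satisfied (logical AND), so a single
    unsatisfiable policy is enough to deny the sign-in.
    """
    blocking = [
        p.name
        for p in policies
        if user not in p.excluded_users            # excluded users never match
        and not p.required_controls <= satisfied   # at least one control is unmet
    ]
    return (not blocking, blocking)


policies = [
    Policy("Require MFA", frozenset({"mfa"}), frozenset({"bg-admin-01"})),
    Policy("Require compliant device", frozenset({"compliant_device"}),
           frozenset({"bg-admin-01"})),
]

# Device compliance cannot be proven today (for example, Intune is degraded).
print(evaluate("alice", {"mfa"}, policies))       # (False, ['Require compliant device'])
print(evaluate("bg-admin-01", set(), policies))   # (True, [])
```

The normal admin satisfies MFA and is still denied by the one policy she cannot satisfy; the excluded account never matches either policy, which is the whole point of the exclusion.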
2. Account design: cloud-only, named, licensed, and Global Admin
Create the accounts in the tenant’s default .onmicrosoft.com domain, not a custom domain. Custom domains can be federated or have their DNS lapse; .onmicrosoft.com is managed by Microsoft and always resolves to managed (cloud) authentication.
Connect-MgGraph -Scopes "User.ReadWrite.All","RoleManagement.ReadWrite.Directory","Directory.ReadWrite.All"
$tenant = (Get-MgOrganization).VerifiedDomains | Where-Object { $_.Name -like "*.onmicrosoft.com" } | Select-Object -First 1
$domain = $tenant.Name # e.g. contoso.onmicrosoft.com
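# Membership.GeneratePassword lives in System.Web (.NET Framework), so run this in
# Windows PowerShell 5.1; the type is not available in PowerShell 7.
Add-Type -AssemblyName System.Web
1..2 | ForEach-Object {
    $upn = "bg-admin-0$_@$domain"
    New-MgUser -AccountEnabled `
        -DisplayName "BREAK GLASS - Emergency Access 0$_" `
        -UserPrincipalName $upn `
        -MailNickname "bg-admin-0$_" `
        -PasswordProfile @{
            # Random initial value; it is never displayed or stored by this script
            Password = ([System.Web.Security.Membership]::GeneratePassword(40,8))
            ForceChangePasswordNextSignIn = $false
        } `
        -UsageLocation "GB"
}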
Design decisions that matter:
- Naming. Make them unmistakable: bg-admin-01, display name shouting BREAK GLASS. You want these to stand out in audit logs and the user list; obscurity here helps an attacker, not you.
- ForceChangePasswordNextSignIn = $false. If the account is forced to change its password on first sign-in, the emergency itself becomes a password-reset workflow at the worst possible moment. Set the password once, at high entropy, and govern it out-of-band.
- No mailbox, no app role assignments, and no group memberships beyond the CA exclusion group described in section 3. The account exists to hold a role and nothing else. Excess membership is excess blast radius.
- Licensing. Conditional Access and the sign-in log export you will alert on are Entra ID P1 features, so the accounts need to be covered by P1 or P2. The cleanest pattern is to assign the license directly to each account rather than through group-based licensing, so coverage never depends on a rule that someone could edit. Direct assignment or tenant-level coverage can both work; what must never happen is the break-glass account’s protection becoming contingent on a dynamic group it might fall out of.
Now assign Global Administrator as a permanent active assignment — not PIM-eligible.
$role = Get-MgDirectoryRole -Filter "displayName eq 'Global Administrator'"
if (-not $role) {
    # The role object only exists once activated; create it from its template if needed
    $tmpl = Get-MgDirectoryRoleTemplate -Filter "displayName eq 'Global Administrator'"
    $role = New-MgDirectoryRole -RoleTemplateId $tmpl.Id
}
foreach ($u in @("bg-admin-01@$domain","bg-admin-02@$domain")) {
    $user = Get-MgUser -UserId $u
    New-MgDirectoryRoleMemberByRef -DirectoryRoleId $role.Id `
        -BodyParameter @{ "@odata.id" = "https://graph.microsoft.com/v1.0/directoryObjects/$($user.Id)" }
}
This is the one place you deliberately violate least-privilege and just-in-time. PIM activation depends on the very systems (CA, MFA, the PIM service) that a break-glass account exists to bypass. If activation requires MFA and MFA is down, your “emergency” account cannot be activated. Keep break-glass as standing Global Admin and compensate with hardening and monitoring, not JIT.
3. Excluding break-glass from Conditional Access without leaving holes
The exclusion is the most dangerous part of the whole design, because a clumsy exclusion is itself the hole. Two rules:
Exclude a group, never the user objects directly. Put both break-glass accounts in one assigned security group — SG-BreakGlass-Exclude — and exclude that group from policy. This gives you one place to manage membership and one object to audit.
$grp = New-MgGroup -DisplayName "SG-BreakGlass-Exclude" -MailEnabled:$false `
    -MailNickname "sg-breakglass-exclude" -SecurityEnabled:$true `
    -Description "BREAK GLASS exclusion group - monitored - do not add members"
foreach ($u in @("bg-admin-01@$domain","bg-admin-02@$domain")) {
    $user = Get-MgUser -UserId $u
    New-MgGroupMemberByRef -GroupId $grp.Id `
        -BodyParameter @{ "@odata.id" = "https://graph.microsoft.com/v1.0/directoryObjects/$($user.Id)" }
}
Use an assigned group, not dynamic. A dynamic membership rule is code that can break or be edited to silently empty (or balloon) the group; assigned membership changes are explicit and audited.
Exclude from the policies that can fail closed — and only those. The break-glass group belongs in the exclude block of every CA policy that could deny all admins: the require-MFA policies, require-compliant-device policies, authentication-strength policies, and any block policy with a broad scope. Add the exclusion to a single representative policy first, confirm the structure, then propagate. Here is the pattern (a require-MFA policy for admins, with the exclusion in place):
Connect-MgGraph -Scopes "Policy.ReadWrite.ConditionalAccess","Policy.Read.All"
$bgGroupId = (Get-MgGroup -Filter "displayName eq 'SG-BreakGlass-Exclude'").Id
$params = @{
    displayName = "CA101-Admins-Require-PhishResistant-MFA"
    state = "enabledForReportingButNotEnforced" # ship in report-only first
    conditions = @{
        users = @{
            includeRoles = @("62e90394-69f5-4237-9190-012177145e10") # Global Administrator template ID
            excludeGroups = @($bgGroupId)
        }
        applications = @{ includeApplications = @("All") }
    }
    grantControls = @{
        operator = "OR"
        authenticationStrength = @{ id = "00000000-0000-0000-0000-000000000004" } # built-in Phishing-resistant MFA
    }
}
New-MgIdentityConditionalAccessPolicy -BodyParameter $params
There is one nuance people miss: do not exclude break-glass from the policy that blocks legacy authentication. Break-glass accounts should never use legacy auth (basic protocols cannot present a FIDO2 key anyway), so leaving them in scope of the legacy-auth block costs you nothing and removes a downgrade path. Exclude from controls that could lock you out, not from controls that only ever help. The exclusion list is: MFA, device compliance, auth strength, sign-in risk, broad blocks. Keep them inside: legacy-auth block, and any block on disabled/legacy protocols.
4. Credential hardening: FIDO2, long passphrases, split custody
A standing Global Admin with a password is a liability. Harden in three layers.
Primary credential: FIDO2 security keys, registered to each account. They are phishing-resistant and hardware-bound and, critically, the credential lives on the key the operator physically holds, so it has no dependency on an external MFA service. Register at least one key per account (two per account if you can, so a lost key is not a lockout). Enable the FIDO2 method tenant-wide and, if you enforce attestation or key restrictions, make sure the AAGUIDs of your emergency keys are on the allow-list: a FIDO2 policy that rejects your break-glass keys will block the very sign-in you need.
# Ensure FIDO2 is enabled and attestation settings won't reject the emergency keys
Get-MgPolicyAuthenticationMethodPolicyAuthenticationMethodConfiguration -AuthenticationMethodConfigurationId "Fido2" |
Select-Object -ExpandProperty AdditionalProperties
Registration of a hardware key requires the operator to complete the WebAuthn ceremony in person at https://aka.ms/mysecurityinfo (or https://mysignins.microsoft.com/security-info) while signed in as the break-glass account. This is a deliberate, supervised, in-person step — exactly the custody event you want.
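Fallback credential: a long passphrase, split. Keep the password as a break-the-glass-on-the-break-glass fallback in case the FIDO2 ecosystem itself is what failed. Generate 30+ characters of real entropy, then apply split-knowledge custody: no single person ever holds the whole secret, and using the account requires two people to combine their halves. Confirm in a drill what a password-only sign-in can actually reach: Microsoft’s mandatory MFA enforcement for the Azure portal and Entra admin center applies to emergency-access accounts too, independently of your CA exclusions.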
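# 40-char base password, then split into two halves held by different custodians.
# Draw 96 random bytes (128 base64 characters) so that well over 40 characters remain
# after filtering; base64 only emits letters, digits, '+', '/' and '='.
PW=$(openssl rand -base64 96 | tr -dc 'A-Za-z0-9' | head -c 40)
echo "Half A (custodian 1): ${PW:0:20}"
echo "Half B (custodian 2): ${PW:20:20}"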
Each half goes into a sealed, tamper-evident envelope, stored in a physical safe (ideally two safes in two locations), and the act of opening an envelope is itself an audited event. Some teams instead store the full secret in a dedicated, heavily-restricted vault (a separate Key Vault or a password manager break-glass record) with its own monitored access — that is acceptable if access to that store does not depend on the same Entra tenant you are trying to recover. A break-glass password locked inside an app that requires Entra SSO to open is not a break-glass password.
Disable per-method weak fallbacks. Do not register SMS or voice on these accounts — they reintroduce telco and external-service dependencies and are phishable. FIDO2 plus the sealed passphrase is the entire credential surface.
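For teams that would rather not depend on openssl and bash string slicing, the same generate-and-split step can be sketched with Python's `secrets` module. The alphabet, the function names, and the 40-character length are assumptions carried over from the shell example, not a mandated format. Note that this is split knowledge, not a threshold scheme: each half reveals its own 20 characters.

```python
import secrets
import string

# Assumed alphabet: letters, digits and a few symbols that are easy to transcribe.
ALPHABET = string.ascii_letters + string.digits + "!@#%^*_-"


def generate_passphrase(length=40):
    """Draw each character independently from the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def split_in_half(secret):
    """Split the secret into two halves for two different custodians."""
    mid = len(secret) // 2
    return secret[:mid], secret[mid:]


passphrase = generate_passphrase()
half_a, half_b = split_in_half(passphrase)

# Print only lengths here; the halves themselves go straight into the envelopes.
print(len(half_a), len(half_b))
```

Recombining is simple concatenation (`half_a + half_b`), which is what the two custodians do by hand when the envelopes are opened.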
5. Alerting on any sign-in: Log Analytics and a Sentinel NRT rule
The deal with break-glass is simple: it is allowed to exist only because every use of it is loud. Route Entra sign-in logs to Log Analytics, then alert on any sign-in by these accounts.
First, create (or confirm) the diagnostic setting that ships SignInLogs to your workspace. Entra diagnostic settings are tenant-level rather than attached to a subscription resource, so the call goes through az rest against the microsoft.aadiam provider:
az rest --method put \
  --url "https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/entra-to-law?api-version=2017-04-01-preview" \
  --body '{"properties":{"workspaceId":"/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<law-name>","logs":[{"category":"SignInLogs","enabled":true},{"category":"AuditLogs","enabled":true},{"category":"NonInteractiveUserSignInLogs","enabled":true}]}}'
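The detection query matches on the break-glass UPN prefix, so any additional account that follows the naming convention is caught automatically, and it surfaces enough context for triage. A prefix match does not survive a UPN rename; if that matters to you, also filter on the accounts’ object IDs (the UserId column):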
let bgPrefix = "bg-admin-";
SigninLogs
| where TimeGenerated > ago(15m)
| where UserPrincipalName startswith bgPrefix
| project
    TimeGenerated,
    UserPrincipalName,
    ResultType, // 0 == success
    ResultDescription,
    IPAddress,
    Location = strcat(tostring(LocationDetails.city), ", ", tostring(LocationDetails.countryOrRegion)),
    AppDisplayName,
    ClientAppUsed,
    ConditionalAccessStatus
| order by TimeGenerated desc
Two ways to fire on this. If you have Sentinel, use a near-real-time (NRT) analytics rule — NRT rules run roughly every minute, the lowest latency Sentinel offers, which is what you want for an account that should never sign in unannounced:
{
  "kind": "NRT",
  "properties": {
    "displayName": "Break-glass account sign-in",
    "description": "Any sign-in by a break-glass emergency-access account. Page on-call immediately.",
    "severity": "High",
    "enabled": true,
    "query": "let bgPrefix = \"bg-admin-\";\nSigninLogs\n| where UserPrincipalName startswith bgPrefix\n| extend AccountCustomEntity = UserPrincipalName, IPCustomEntity = IPAddress",
    "suppressionEnabled": false,
    "suppressionDuration": "PT1H",
    "tactics": ["InitialAccess","PrivilegeEscalation"],
    "incidentConfiguration": {
      "createIncident": true,
      "groupingConfiguration": { "enabled": false }
    }
  }
}
NRT rules have constraints worth knowing: one table per rule, no join across tables, and the query is shaped to run incrementally, so keep it to the single SigninLogs filter above. Without Sentinel, a scheduled query alert rule in Azure Monitor on the same KQL works; set it to the shortest evaluation frequency available to you (5 minutes, or 1 minute where your rule supports it) and wire the action group to your pager and an out-of-band channel such as SMS. Avoid relying on channels that authenticate through the same tenant (email, Teams), because if the incident is a tenant-wide identity outage, those may be exactly what is down.
Alert on success and failure: do not filter on ResultType == 0. A failed break-glass sign-in is arguably the more interesting signal: either an operator is fumbling a real emergency, or an attacker is probing the account. Both warrant a page.
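The "page on both outcomes" rule is easy to get wrong in downstream automation, so here is a minimal Python sketch of the triage decision. The event dictionaries mimic a few SigninLogs columns; the function names and the sample rows (including the failure code shown) are illustrative assumptions.

```python
BG_PREFIX = "bg-admin-"  # naming convention used for the accounts earlier


def should_page(event):
    """Page on any break-glass sign-in, whatever its outcome."""
    upn = str(event.get("UserPrincipalName", "")).lower()
    return upn.startswith(BG_PREFIX)


def summarize(event):
    """One-line summary for the page; a ResultType of 0 means success."""
    outcome = "SUCCESS" if str(event.get("ResultType")) == "0" else "FAILURE"
    return f"[{outcome}] {event.get('UserPrincipalName')} from {event.get('IPAddress')}"


# Sample rows for illustration only.
events = [
    {"UserPrincipalName": "bg-admin-01@contoso.onmicrosoft.com", "ResultType": "0",
     "IPAddress": "203.0.113.10"},
    {"UserPrincipalName": "BG-Admin-02@contoso.onmicrosoft.com", "ResultType": "50126",
     "IPAddress": "198.51.100.7"},
    {"UserPrincipalName": "alice@contoso.com", "ResultType": "0",
     "IPAddress": "192.0.2.1"},
]

pages = [summarize(e) for e in events if should_page(e)]
for line in pages:
    print(line)
```

Both break-glass events produce a page, the failed one included; the outcome only changes the label, never the decision to alert.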
6. Periodic validation: scheduled test sign-ins and rotation
An emergency procedure you have never run is a hypothesis, not a control. Validate on a cadence:
- Quarterly test sign-in. A custodian performs a full FIDO2 sign-in to each account, confirms Global Admin is effective (open a blade only Global Admin can), and — this is the point — confirms the alert fired end to end and paged the right people. A break-glass account whose alert silently stopped working is worse than no break-glass account, because you think you are covered. Schedule this as a calendar-bound runbook task with a sign-off.
- Credential rotation after any use, and on a fixed cadence. Rotate the passphrase immediately after any real or test use, and otherwise at least every 90–180 days. Re-split and re-seal the envelopes; log the rotation. FIDO2 keys do not “rotate” the same way, but inventory them each cycle and confirm each registered key is physically accounted for.
- Membership and exclusion drift check. Re-confirm the exclusion group still contains exactly the two accounts and that no new CA policy was shipped without the exclusion (covered in Verify below).
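The rotation rule above (rotate after any use, otherwise on a fixed cadence) can be encoded as a small check for the runbook. This is a sketch under stated assumptions: the 180-day ceiling is the upper bound of the range given above, and the `rotation_due` name and its date inputs are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

MAX_AGE = timedelta(days=180)  # upper bound of the 90-180 day cadence


def rotation_due(last_rotated: date, last_used: Optional[date], today: date) -> bool:
    """Return True if the passphrase must be rotated now.

    A use on or after the last rotation date counts as "used since rotation";
    treating a same-day use as unrotated is the conservative choice.
    """
    used_since_rotation = last_used is not None and last_used >= last_rotated
    too_old = today - last_rotated >= MAX_AGE
    return used_since_rotation or too_old


rotated = date(2024, 1, 1)
print(rotation_due(rotated, None, date(2024, 3, 1)))               # False: unused and recent
print(rotation_due(rotated, date(2024, 2, 1), date(2024, 2, 2)))   # True: used since rotation
print(rotation_due(rotated, None, rotated + MAX_AGE))              # True: cadence reached
```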
7. Auditing and access reviews against silent privilege drift
The slow failure mode is drift: someone adds a third account to the exclusion group “temporarily,” or a new CA policy goes live without the break-glass exclusion, or a role assignment quietly changes. Catch it two ways.
An access review on the exclusion group, recurring monthly, that forces a human to re-justify every member. Because the group should be static, the review is fast and any unexpected member is glaring.
Connect-MgGraph -Scopes "AccessReview.ReadWrite.All"
$grp = Get-MgGroup -Filter "displayName eq 'SG-BreakGlass-Exclude'"
$review = @{
    displayName = "Monthly review - break-glass exclusion membership"
    descriptionForAdmins = "Confirm exactly the two break-glass accounts remain. Remove anything else."
    scope = @{
        "@odata.type" = "#microsoft.graph.accessReviewQueryScope"
        query = "/groups/$($grp.Id)/transitiveMembers"
        queryType = "MicrosoftGraph"
    }
    reviewers = @(@{ query = "/users/<identity-governance-owner-id>"; queryType = "MicrosoftGraph" })
    settings = @{
        mailNotificationsEnabled = $true
        reminderNotificationsEnabled = $true
        defaultDecisionEnabled = $false # never auto-approve break-glass
        recurrence = @{
            pattern = @{ type = "absoluteMonthly"; interval = 1; dayOfMonth = 1 }
            range = @{ type = "noEnd"; startDate = (Get-Date -Format "yyyy-MM-dd") }
        }
    }
}
New-MgIdentityGovernanceAccessReviewDefinition -BodyParameter $review
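Between monthly reviews, the same "exactly two members" assertion can be automated. The Python sketch below compares an exported member list with the expected set; the `membership_drift` helper is hypothetical, and the UPNs assume the contoso.onmicrosoft.com example domain used elsewhere in this article.

```python
# Expected membership of SG-BreakGlass-Exclude (example domain assumed).
EXPECTED = frozenset({
    "bg-admin-01@contoso.onmicrosoft.com",
    "bg-admin-02@contoso.onmicrosoft.com",
})


def membership_drift(actual, expected=EXPECTED):
    """Report members that should not be there and members that went missing."""
    actual_set = {m.lower() for m in actual}
    return {
        "unexpected": sorted(actual_set - expected),
        "missing": sorted(expected - actual_set),
    }


# Someone added a third account "temporarily".
exported = [
    "bg-admin-01@contoso.onmicrosoft.com",
    "bg-admin-02@contoso.onmicrosoft.com",
    "temp-contractor@contoso.com",
]
print(membership_drift(exported))
# {'unexpected': ['temp-contractor@contoso.com'], 'missing': []}
```

Both directions matter: an unexpected member is an account silently exempt from Conditional Access, and a missing member is a break-glass account that has quietly fallen back into scope of the policies it was meant to survive.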
Audit-log alerting on the group and roles. Beyond sign-ins, alert when anyone is added to the exclusion group or when Global Admin membership changes — those are the directory-mutation events that precede a real compromise.
AuditLogs
| where TimeGenerated > ago(1h)
| where OperationName in ("Add member to group", "Add member to role")
| mv-expand tr = TargetResources
// User targets carry the UPN in userPrincipalName; displayName is often empty for them
| extend target = iff(isnotempty(tostring(tr.userPrincipalName)), tostring(tr.userPrincipalName), tostring(tr.displayName)),
         modifiedProps = tostring(tr.modifiedProperties)
| where modifiedProps has "SG-BreakGlass-Exclude"
    or modifiedProps has "Global Administrator"
    or target startswith "bg-admin-"
| project TimeGenerated, OperationName, Initiator = tostring(InitiatedBy.user.userPrincipalName), target
Verify
Prove the control works before you trust it. Run all of these.
- Exclusion coverage is complete. Enumerate every enabled CA policy and confirm the break-glass group is excluded from each one that could fail closed — and flag any that omit it:
$bgGroupId = (Get-MgGroup -Filter "displayName eq 'SG-BreakGlass-Exclude'").Id
Get-MgIdentityConditionalAccessPolicy -All |
    Where-Object { $_.State -eq "enabled" } |
    ForEach-Object {
        [pscustomobject]@{
            Policy = $_.DisplayName
            Excluded = ($_.Conditions.Users.ExcludeGroups -contains $bgGroupId)
        }
    } | Sort-Object Excluded | Format-Table -AutoSize
Review the Excluded = False rows by hand: a block-legacy-auth policy should be False; an MFA or compliant-device policy that is False is a lockout waiting to happen.
- The alert actually pages. Do a real FIDO2 sign-in as bg-admin-01, then confirm the Sentinel incident (or Azure Monitor alert) was created within a couple of minutes and reached the pager and the out-of-band channel. Latency and delivery are the test, not just rule existence.
- Cloud-only authentication holds. Confirm the account’s UPN is on *.onmicrosoft.com and that its sign-in shows ConditionalAccessStatus of notApplied for the excluded policies. Cross-check against the coverage table above to confirm that the exclusion, rather than a condition that simply did not match, is why each policy was not applied.
- Resilience drill (report-only first). Before enforcing a new broad CA policy, ship it as enabledForReportingButNotEnforced and use the What If tool / report-only insights to confirm the break-glass accounts evaluate to not blocked. Only then flip to enabled.
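The by-hand review of the exclusion coverage table can be partly automated. The Python sketch below takes rows shaped like the PowerShell output from the first check (Policy, Excluded) and sorts them into the two failure classes. The policy names and the `KEEP_IN_SCOPE` allow-list are hypothetical; you would maintain your own list of policies that intentionally keep break-glass in scope.

```python
# Hypothetical names of policies that deliberately keep break-glass in scope.
KEEP_IN_SCOPE = {"CA001-Block-Legacy-Auth"}


def review_coverage(rows, keep_in_scope=KEEP_IN_SCOPE):
    """Classify coverage rows into lockout risks and needless exclusions."""
    lockout_risk = [
        r["Policy"] for r in rows
        if not r["Excluded"] and r["Policy"] not in keep_in_scope
    ]
    downgrade_path = [
        r["Policy"] for r in rows
        if r["Excluded"] and r["Policy"] in keep_in_scope
    ]
    return {"lockout_risk": lockout_risk, "downgrade_path": downgrade_path}


rows = [
    {"Policy": "CA001-Block-Legacy-Auth", "Excluded": False},
    {"Policy": "CA101-Admins-Require-PhishResistant-MFA", "Excluded": True},
    {"Policy": "CA205-Require-Compliant-Device", "Excluded": False},
]
print(review_coverage(rows))
# {'lockout_risk': ['CA205-Require-Compliant-Device'], 'downgrade_path': []}
```

A non-empty `lockout_risk` list means a fail-closed policy shipped without the exclusion; a non-empty `downgrade_path` list means break-glass was excluded from a control that only ever helps.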
Enterprise scenario
A global insurer ran AD FS federation for their primary .com domain and used a third-party MFA provider in front of it. During a planned AD FS certificate rollover, the new token-signing certificate was published to the federation metadata but the corresponding trust update on the Entra side lagged — for about forty minutes, every federated sign-in failed with a token-signature error. Normal admins, all on the federated domain, were locked out. The on-call SRE pulled the sealed envelope halves from two safes, combined the passphrase, and signed in as bg-admin-01 on contoso.onmicrosoft.com — managed authentication, no federation path, no third-party MFA — and used standing Global Admin to roll the federation trust back. The break-glass sign-in tripped the Sentinel NRT rule and paged the security duty manager within ninety seconds, who confirmed (via the documented runbook) that this was the sanctioned recovery rather than an intrusion, and annotated the incident.
The fix they hardened afterward was the gap the incident exposed: the second break-glass account’s only FIDO2 credential was on a key that had later been wiped during a desk move, so only one account was actually usable. They added an inventory step to the quarterly validation, plus a KQL check that flags any break-glass account whose most recent successful sign-in did not use a FIDO2 key or passkey:
// Run alongside quarterly validation: flag any break-glass account
// whose last successful sign-in was NOT via a phishing-resistant method.
SigninLogs
| where UserPrincipalName startswith "bg-admin-"
| where ResultType == 0
| summarize arg_max(TimeGenerated, AuthenticationDetails) by UserPrincipalName
| extend methods = tostring(AuthenticationDetails)
| where methods !has "FIDO2" and methods !has "Passkey"
| project UserPrincipalName, TimeGenerated, methods
The lesson generalizes: the failure that bites you is never the one you modeled, it is the second account you assumed was fine. Drill both.