KQL Threat Hunting Playbooks: MITRE ATT&CK Mapping, UEBA, and Hunting Notebooks

Threat hunting is not “run more queries.” It is a disciplined loop: form a hypothesis grounded in an adversary technique, write a query that would falsify or confirm it, triage the result, and either close the hunt or promote it into a detection. Microsoft Sentinel gives you the substrate — KQL, the Hunts experience, UEBA enrichment, and Jupyter notebooks via MSTICPy — but the value comes from how you wire those into a repeatable playbook. This is the hunting program I stand up for SOC teams that are drowning in alerts and want to get ahead of the queue.

The premise throughout: every hunt starts as a written hypothesis tied to a MITRE ATT&CK technique, every successful hunt ends as a tracked artifact (bookmark, detection, or watchlist), and nothing is one-off. If you cannot re-run it next quarter, it was a demo, not a playbook.

1. Hypothesis-driven hunting: from ATT&CK technique to testable KQL

A hunt hypothesis has three parts: the technique, the observable, and the falsifiable claim. Write it down before you touch KQL.

Hypothesis (T1098.001 - Additional Cloud Credentials): “An attacker who phished an Entra admin would add a credential or federated trust to an app/service principal to persist. If that happened in the last 14 days, I will see Add service principal credentials or certificate/secret additions in AuditLogs from an unusual actor or IP.”

That sentence dictates the query. You are not exploring; you are testing a claim. The technique ID anchors it to ATT&CK so coverage is measurable, and the “if that happened… I will see” clause forces you to name the table and signal up front.

// Hunt: T1098.001 - credentials/secrets added to app registrations & service principals
let lookback = 14d;
AuditLogs
| where TimeGenerated > ago(lookback)
| where OperationName in (
    "Add service principal credentials",
    "Update application - Certificates and secrets management",
    "Add application key",
    "Add federated identity credential")
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
| extend ActorIp = tostring(InitiatedBy.user.ipAddress)
| extend TargetApp = tostring(TargetResources[0].displayName)
| where isnotempty(Actor)            // exclude system / sync-driven changes
| project TimeGenerated, Actor, ActorIp, TargetApp, OperationName, Result
| sort by TimeGenerated desc

Two habits make this reusable instead of throwaway. First, parameterize the volatile bits with let at the top (lookback, allow-lists, thresholds) so the next hunter tunes one block, not the whole query. Second, project a stable column set (Actor, ActorIp, TargetApp) so downstream notebook enrichment and bookmarking always find the same fields. A playbook is a contract on its output schema as much as its logic.

2. Advanced KQL for hunters: anomaly, time-series, and entity behavior

Hunters live in a different subset of KQL than detection authors. Detections want precision; hunts want to surface the weird. Three families of operators do most of the work.

make-series for time-series shape. Aggregating a metric into an evenly spaced series is the precondition for every statistical function that follows. Note the default=0 and the from/to range — without them, gaps in activity silently distort the series.

SigninLogs
| where TimeGenerated > ago(21d)
| where ResultType == 0                       // successful sign-ins only
| make-series SigninCount = count()
    default = 0
    on TimeGenerated
    from ago(21d) to now()
    step 1h
    by UserPrincipalName

evaluate plugins for unsupervised structure. When you do not know what normal looks like, let KQL find clusters of attributes that co-occur. basket() mines frequent itemsets; autocluster() finds the smallest set of common patterns. Both are excellent for “what is the dominant shape of these failed logons, and what stands outside it.”

SigninLogs
| where TimeGenerated > ago(7d) and ResultType != 0
| project ResultType, AppDisplayName, ClientAppUsed, Location, UserAgent = tostring(parse_json(DeviceDetail).operatingSystem)
| evaluate autocluster()

Entity-centric joins. Hunting is rarely one table. Pivot from a suspect entity (user, host, IP) across telemetry. Use join kind=inner on a normalized key and watch the join order — put the smaller, filtered set on the left.

These are exploratory by design. You expect noise; the skill is shrinking the result to something a human can read in one screen, then deciding whether the residual is benign.

3. Baselining normal with series_decompose and statistical outliers

The core anomaly primitive in KQL is series_decompose_anomalies(). It decomposes a series into seasonal, trend, and residual components, then flags points where the residual exceeds a configurable number of standard deviations. This is how you turn “this user logged in a lot today” into “this is 4 sigma above their own seasonal baseline.”

let sensitivity = 2.5;     // sigma threshold; higher = fewer, stronger anomalies
SigninLogs
| where TimeGenerated > ago(21d)
| where ResultType == 0
| make-series SigninCount = count()
    default = 0 on TimeGenerated from ago(21d) to now() step 1h
    by UserPrincipalName
| extend (anomalies, score, baseline) =
    series_decompose_anomalies(SigninCount, sensitivity, -1, 'linefit')
| mv-expand TimeGenerated to typeof(datetime),
            SigninCount to typeof(long),
            anomalies to typeof(int),
            score to typeof(double)
| where anomalies != 0
| project TimeGenerated, UserPrincipalName, SigninCount, AnomalyScore = score
| sort by AnomalyScore desc

A few points that trip people:

The third argument (-1) lets the function auto-detect seasonality; pass a period (e.g. 24 for hourly data with a daily cycle) when you know it.
series_decompose_anomalies returns three series, so you must mv-expand all of them in lockstep to flatten back to rows. Mismatched mv-expand is the #1 cause of garbage output here.
score is signed — positive is a spike, negative is a dip. A user who suddenly goes silent (negative anomaly on auth) can be as interesting as a spike.

For per-row outlier scoring without full decomposition, series_outliers() gives a Tukey-test-based score and is cheaper when you only care about point anomalies, not seasonality.

4. UEBA tables and identity behavior for insider and ATO hunts

Sentinel’s UEBA does the baselining for you and writes it to queryable tables. Enable it under Settings -> Entity behavior (it requires the relevant data sources, e.g. SigninLogs, AuditLogs, SecurityEvent). The tables you hunt against:

Table	What it gives you
`BehaviorAnalytics`	Per-activity enrichment with `InvestigationPriority` (0-10), peer/blast-radius context, and “first time” flags
`IdentityInfo`	Identity attributes: group membership, manager, account state, on-prem SID, MFA status
`UserPeerAnalytics`	Peer ranking — who this user behaves like, used to spot peer-group outliers
`UserAccessAnalytics`	Effective access / blast radius derived from role and group data

The single most useful field is InvestigationPriority. It folds dozens of behavioral signals into one score, so a high-signal account-takeover (ATO) or insider hunt is often just: rank activities by priority, filter to the unusual ones.

// ATO/insider hunt: high-priority anomalous activity, enriched with identity context
BehaviorAnalytics
| where TimeGenerated > ago(7d)
| where InvestigationPriority > 5
| extend FirstTimeUserConnectedFromCountry =
    tobool(ActivityInsights.FirstTimeUserConnectedFromCountry)
| extend UncommonForUser =
    tobool(ActivityInsights.ActionUncommonlyPerformedByUser)
| where FirstTimeUserConnectedFromCountry or UncommonForUser
| join kind=leftouter (
    IdentityInfo
    | summarize arg_max(TimeGenerated, *) by AccountUPN
    | project AccountUPN, GroupMembership, IsAccountEnabled, Manager
  ) on $left.UserName == $right.AccountUPN
| project TimeGenerated, UserName, ActivityType, SourceIPAddress,
          InvestigationPriority, FirstTimeUserConnectedFromCountry,
          UncommonForUser, GroupMembership, Manager
| sort by InvestigationPriority desc

For insider risk specifically, invert the lens: instead of impossible-travel, hunt for legitimate access that is anomalous for the peer group — a finance analyst suddenly enumerating HR shares. UserPeerAnalytics gives you the peer set; the deviation is the signal. ATO hunts lean on FirstTimeUserConnectedFromCountry, new ASN, and unfamiliar sign-in properties from BehaviorAnalytics.

5. Mapping queries to MITRE ATT&CK and tracking coverage

Coverage is the metric that turns hunting from activity into a program. Every saved hunting query and analytics rule carries ATT&CK tactics and technique IDs as first-class metadata, and you should populate them deliberately.

In a hunting query (the Microsoft.SecurityInsights/huntingQueries / saved-search ARM type) and in scheduled analytics rules, the technique mapping is structured fields, not a comment. Bicep for an analytics rule:

resource credAddRule 'Microsoft.SecurityInsights/alertRules@2024-03-01' = {
  scope: workspace
  name: guid('hunt-T1098-cred-add')
  kind: 'Scheduled'
  properties: {
    displayName: 'Credential added to service principal (T1098.001)'
    severity: 'Medium'
    enabled: true
    query: loadTextContent('queries/t1098-sp-cred-add.kql')
    queryFrequency: 'PT1H'
    queryPeriod: 'P14D'
    triggerOperator: 'GreaterThan'
    triggerThreshold: 0
    tactics: [ 'Persistence', 'PrivilegeEscalation' ]
    techniques: [ 'T1098' ]
    subTechniques: [ 'T1098.001' ]
    entityMappings: [
      {
        entityType: 'Account'
        fieldMappings: [ { identifier: 'FullName', columnName: 'Actor' } ]
      }
    ]
  }
}

tactics, techniques, and subTechniques are the fields the MITRE ATT&CK coverage blade reads. subTechniques arrived later than techniques and is supported on recent alertRules API versions (2023-02-01 and newer) — pin a current @version or the property is silently dropped.

Open Sentinel -> MITRE ATT&CK to see the matrix shaded by your active rules and hunting queries. Treat the gaps as a backlog. Pull coverage programmatically to track it over time rather than eyeballing the heatmap:

# Export technique coverage from active analytics rules
az rest --method get \
  --url "https://management.azure.com/subscriptions/$SUB/resourceGroups/$RG/providers/Microsoft.OperationalInsights/workspaces/$WS/providers/Microsoft.SecurityInsights/alertRules?api-version=2024-03-01" \
  --query "value[].properties.{name:displayName, techniques:techniques, tactics:tactics}" -o table

The honest read: ATT&CK coverage measures intent to detect, not detection efficacy. A green cell with a noisy, never-tuned rule is worse than an empty one. Pair the matrix with a hit-rate review (section 8) so coverage and quality move together.

6. Parameterized hunting notebooks with MSTICPy and enrichment

Notebooks are where a hunt stops being a single query and becomes an investigation. MSTICPy is Microsoft’s open-source hunting library; it wraps Sentinel queries, entity enrichment (VirusTotal, GeoIP, threat intel), and visualization behind a clean API. Sentinel can launch notebooks on an Azure ML compute instance, or run them locally.

Configuration lives in msticpyconfig.yaml (point to it with the MSTICPYCONFIG env var) so connection strings and API keys are never hardcoded in cells:

# msticpyconfig.yaml
AzureSentinel:
  Workspaces:
    Default:
      WorkspaceId: "00000000-1111-2222-3333-444444444444"
      TenantId: "55555555-6666-7777-8888-999999999999"
TIProviders:
  VirusTotal:
    Args:
      AuthKey:
        EnvironmentVar: "VT_AUTH"
    Primary: true
    Provider: "VirusTotal"

A parameterized hunt notebook then reads like this. Parameterizing the entity and lookback is what lets one notebook serve every “investigate this user” hunt:

import msticpy as mp
mp.init_notebook(globals())                 # loads config, query providers, helpers

# --- parameters (override per hunt) ---
target_user = "j.doe@contoso.com"
lookback_days = 14

qry_prov = mp.QueryProvider("MSSentinel")
qry_prov.connect(workspace="Default")        # uses msticpyconfig.yaml

signins = qry_prov.exec_query(f"""
    SigninLogs
    | where TimeGenerated > ago({lookback_days}d)
    | where UserPrincipalName == '{target_user}'
    | project TimeGenerated, IPAddress, AppDisplayName, ResultType, Location
""")

# enrich distinct source IPs against threat intel
ti = mp.TILookup()
ti_results = ti.lookup_iocs(data=signins, ioc_col="IPAddress")
mp.nbdisplay.display_timeline(signins, source_columns=["AppDisplayName", "Location"])

The discipline that makes notebooks pay off: keep the parameter cell at the top, tag it (Sentinel/Papermill can inject parameters into a tagged cell for headless runs), and commit the .ipynb to the same Git repo as your KQL. A hunting notebook is code — it gets reviewed and versioned like code.

7. Operationalizing successful hunts: bookmarks, detections, watchlists

A hunt that found something must leave a durable trace, or the next hunter repeats your work. Three exits, in increasing permanence:

Bookmarks capture an interesting query result row with your notes and entity mappings. They are the unit of evidence — promote a bookmark to an incident when it warrants triage. From a hunting query result, select rows and “Add bookmark”; in the Hunts experience, bookmarks attach to the hunt itself.

Detections are hunts that proved reliable and repeatable. Promote the KQL into a scheduled analytics rule (section 5 Bicep) with tuned thresholds and entity mappings. The bar: low, explainable false-positive rate, and a clear response. If a hunt fires on benign activity half the time, it is not ready to be a detection — keep tuning it as a recurring hunt.

Watchlists externalize the tuning data — VIP accounts, known-good service principals, sanctioned admin IPs, terminated employees — so queries reference a maintained list instead of a hardcoded let. Update the list, not twenty queries.

// Reference a watchlist of sanctioned admin source IPs to suppress known-good
let SanctionedAdminIPs = _GetWatchlist('AdminJumpHosts') | project SearchKey;
SigninLogs
| where TimeGenerated > ago(1d)
| where AppDisplayName == "Azure Active Directory PowerShell"
| where IPAddress !in (SanctionedAdminIPs)      // hunt the residual
| project TimeGenerated, UserPrincipalName, IPAddress, ResultType

Create the watchlist from a CSV via CLI so it lives in source control and deploys with the rest of the workspace:

az sentinel watchlist create \
  --resource-group rg-sec-sentinel \
  --workspace-name law-sentinel-prod \
  --watchlist-alias AdminJumpHosts \
  --display-name "Sanctioned admin jump hosts" \
  --provider "SOC" \
  --source "admin_jump_hosts.csv" \
  --items-search-key "SearchKey"

8. A repeatable hunt cadence, peer review, and documenting findings

The program is the point. A single brilliant hunt that lives in someone’s head is a liability. Run it on a cadence:

Weekly: execute 2-3 hypothesis hunts from the backlog, prioritized by ATT&CK coverage gaps and current threat intel. Each produces a written outcome: confirmed, refuted, or inconclusive (needs more telemetry).
Per hunt: capture the hypothesis, technique ID, KQL, result summary, and decision (bookmark / detection / drop) in a tracked record. Use the Hunts experience — it persists the hypothesis, linked queries, bookmarks, and status — or a lightweight Git-backed markdown log if you want it diffable.
Peer review: another hunter reviews the query logic and the conclusion before a hunt becomes a detection. This catches the two classic failure modes: a query that does not actually test the hypothesis, and a conclusion that overreads noise.
Quarterly: review the ATT&CK matrix against detection hit rates. Retire detections that never fire usefully; convert noisy ones back to hunts; refresh baselines (UEBA and series_decompose baselines drift as the org changes).

Document refuted hypotheses, not just hits. “We hunted T1207 DCShadow over 90 days and found nothing, here is the query” is a real result — it tells the next hunter where the floor already is and stops duplicate effort.

Enterprise scenario

A platform security team at a global insurer had ~140 analytics rules tagged across 60+ ATT&CK techniques — a green-looking matrix the CISO showed the board. But their quarterly purple-team exercise walked an OAuth consent-grant attack (T1528) straight through undetected. The constraint surfaced fast: coverage was being measured as “rule exists,” with no signal on whether rules ever fired or whether anyone tuned them. Several “covered” techniques were backed by rules disabled months earlier during a noise cleanup, and nobody had updated the matrix.

They fixed the measurement, not just the gap. A scheduled MSTICPy notebook (run nightly on an Azure ML compute via Papermill) pulled every rule via the alertRules API, joined enabled state and 90-day fire counts from SecurityAlert, and emitted a coverage report that distinguished intended coverage from effective coverage. Techniques with a rule but zero useful fires in 90 days were flagged amber, not green.

// Effective vs. intended coverage: which tagged techniques actually fired?
SecurityAlert
| where TimeGenerated > ago(90d)
| mv-expand Tactic = todynamic(Tactics)
| extend Technique = extract(@"(T\d{4}(?:\.\d{3})?)", 1, tostring(parse_json(ExtendedProperties)["Techniques"]))
| where isnotempty(Technique)
| summarize Fires = count(), Rules = dcount(AlertName) by Technique
| sort by Fires asc

The amber list became the hunt backlog. The OAuth consent hunt (T1528) that purple team exploited was promoted from notebook to a tuned scheduled rule with a watchlist of approved publishers to suppress sanctioned grants. Six weeks later the matrix was honest — fewer green cells, but every green cell backed by a rule that had fired and been triaged at least once. The board metric changed from “techniques covered” to “techniques covered and validated in the last quarter,” which is the only version of that number worth reporting.

Verify

Confirm the playbook actually works end to end before you call it operational:

// 1. Anomaly query returns rows and a usable score column
SigninLogs
| where TimeGenerated > ago(21d) and ResultType == 0
| make-series c = count() default=0 on TimeGenerated from ago(21d) to now() step 1h by UserPrincipalName
| extend (anom, score, base) = series_decompose_anomalies(c, 2.5)
| mv-expand TimeGenerated to typeof(datetime), c to typeof(long), anom to typeof(int), score to typeof(double)
| where anom != 0
| count        // expect a small, non-zero number on a real tenant

// 2. UEBA is enabled and populating
BehaviorAnalytics | where TimeGenerated > ago(1d) | summarize Rows = count()
// 0 rows after 24h+ means UEBA is off or its source tables aren't connected

# 3. Technique tags are actually set on a rule (not just in a comment)
az rest --method get \
  --url "https://management.azure.com/subscriptions/$SUB/resourceGroups/$RG/providers/Microsoft.OperationalInsights/workspaces/$WS/providers/Microsoft.SecurityInsights/alertRules?api-version=2024-03-01" \
  --query "value[?properties.enabled].properties.techniques" -o tsv | sort -u

# 4. Watchlist resolves in KQL
az sentinel watchlist item list \
  --resource-group rg-sec-sentinel --workspace-name law-sentinel-prod \
  --watchlist-alias AdminJumpHosts -o table

# 5. Notebook connectivity: a one-row query proves auth + workspace wiring
import msticpy as mp
mp.init_notebook(globals())
qp = mp.QueryProvider("MSSentinel"); qp.connect(workspace="Default")
print(qp.exec_query("SigninLogs | take 1"))   # non-empty result = working pipeline

KQL Threat Hunting Playbooks: MITRE ATT&CK Mapping, UEBA, and Hunting Notebooks

1. Hypothesis-driven hunting: from ATT&CK technique to testable KQL

2. Advanced KQL for hunters: anomaly, time-series, and entity behavior

3. Baselining normal with series_decompose and statistical outliers

4. UEBA tables and identity behavior for insider and ATO hunts

5. Mapping queries to MITRE ATT&CK and tracking coverage

6. Parameterized hunting notebooks with MSTICPy and enrichment

7. Operationalizing successful hunts: bookmarks, detections, watchlists

8. A repeatable hunt cadence, peer review, and documenting findings

Enterprise scenario

Verify

Checklist

Written by Vinod

Comments

Keep Reading

Stopping Token Theft: Conditional Access Token Protection and Authentication Context

Defender EASM: Discovering and Reducing Your Internet-Facing Attack Surface

Defender for Cloud Attack Path Analysis: Custom Recommendations and Governance Rules