Configuration Management for Windows Server with PowerShell DSC and Ansible

Two tools dominate declarative Windows configuration, and most teams pick one for the wrong reasons. PowerShell DSC gives you an agent (the Local Configuration Manager) that re-applies state on a schedule with no controller required - excellent for fleets that must self-heal even when the orchestrator is unreachable. Ansible gives you agentless push from a control node, a vast win_ module library, and orchestration across mixed Linux/Windows estates. They are not competitors; the mature pattern is Ansible as the orchestration plane that ships and triggers DSC for the resources where DSC is genuinely better. This walkthrough builds that combined model end to end - authoring configurations, tuning the LCM, writing idempotent roles, remediating drift, and gating the whole thing behind Pester in CI.

Scope: Windows Server 2019/2022/2025, PowerShell DSC 1.1 (the in-box PSDesiredStateConfiguration that ships with Windows PowerShell 5.1). This is the version the WMI/CIM-based LCM understands. The cross-platform rewrite (DSC 3.0, a standalone Rust executable) is a different runtime with no LCM and no MOF pull server, so do not mix its docs into a 5.1 deployment. Ansible examples assume a recent ansible-core with the ansible.windows and community.windows collections installed.

1. DSC concepts: resources, MOF, and the LCM

Three pieces. Resources are PowerShell modules that know how to test and set one kind of state - a Windows feature, a service, a registry value. Each exposes Get, Test, and Set; the contract is that Test returns a boolean and Set only runs when Test reports drift. That Test-before-Set gate is where idempotency actually lives.
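To make the Test-before-Set gate concrete, here is a minimal sketch of the three functions a script-based resource exposes, followed by the check the engine effectively performs on each run. The "file has exact contents" resource, the file name, and the contents are hypothetical. A real resource also ships a schema and a module manifest, so this illustrates the contract only.

```powershell
# Illustrative only: a toy resource that keeps a text file at exact contents.
# Function names follow the script-resource convention (*-TargetResource).
function Get-TargetResource {
    param([Parameter(Mandatory)][string]$Path)
    $current = $null
    if (Test-Path -LiteralPath $Path) {
        $current = Get-Content -LiteralPath $Path -Raw
    }
    @{ Path = $Path; Content = $current }
}

function Test-TargetResource {
    [OutputType([bool])]
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$Content
    )
    # $true means "already in desired state"; anything else is drift
    [bool]((Get-TargetResource -Path $Path).Content -ceq $Content)
}

function Set-TargetResource {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$Content
    )
    Set-Content -LiteralPath $Path -Value $Content -NoNewline
}

# The gate: Test first, Set only when Test reports drift
$desired = @{
    Path    = Join-Path ([System.IO.Path]::GetTempPath()) 'motd.txt'  # hypothetical file
    Content = 'managed by DSC'
}
if (-not (Test-TargetResource @desired)) {
    Set-TargetResource @desired
}
```

Running the last four lines a second time skips Set entirely, which is the idempotency property the rest of this walkthrough depends on.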

A configuration is a PowerShell function (keyword Configuration) that declares the resources and their target values. Compiling it produces a MOF (Managed Object Format) document per node - a flat, declarative artifact with no PowerShell logic in it. The MOF is what gets applied; the script that generated it is build-time only.

The LCM (Local Configuration Manager) is the agent baked into every Windows box. It ingests a MOF, calls each resource’s Test, applies Set where needed, and - critically - re-runs on a timer to correct drift. Inspect it:

Get-DscLocalConfigurationManager
# Key fields: RefreshMode (Push|Pull), ConfigurationMode,
# ConfigurationModeFrequencyMins, RebootNodeIfNeeded

Mental model: the configuration is your source of truth, the MOF is the compiled binary, and the LCM is the runtime that enforces it. Drift correction is the LCM’s job, not yours.

Confirm the resources available on the box before you author against them:

Get-DscResource | Select-Object Name, ModuleName, Version
Install-Module -Name PSDscResources, NetworkingDsc -Scope AllUsers

PSDscResources is the supported successor to the old in-box PSDesiredStateConfiguration resources - prefer it for Service, Registry, WindowsFeature, and friends.

2. Authoring a configuration for roles, services, and registry

Here is a real baseline: install IIS, guarantee the W3SVC service is running and automatic, and pin a registry value (disable the legacy SMBv1 server). Parameterize the node list so the same configuration compiles for every host.

Configuration WebBaseline {
    param([string[]]$NodeName = 'localhost')

    Import-DscResource -ModuleName PSDscResources -ModuleVersion 2.12.0.0

    Node $NodeName {
        WindowsFeature IIS {
            Name   = 'Web-Server'
            Ensure = 'Present'
        }

        Service W3SVC {
            Name        = 'W3SVC'
            State       = 'Running'
            StartupType = 'Automatic'
            DependsOn   = '[WindowsFeature]IIS'
        }

        Registry DisableSmb1Server {
            Key       = 'HKLM:\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters'
            ValueName = 'SMB1'
            ValueData = '0'
            ValueType = 'Dword'
            Ensure    = 'Present'
        }
    }
}

# Compile -> emits .\WebBaseline\<NodeName>.mof
WebBaseline -NodeName 'WEB01','WEB02' -OutputPath .\WebBaseline

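DependsOn enforces ordering: the service resource will not run until the feature is present. The MOF lands as <NodeName>.mof per node. Apply it to validate before you wire up any delivery model. Start-DscConfiguration pushes each MOF in the folder to the node named by the file, so to test on the box itself, compile with the default -NodeName 'localhost' first: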

Start-DscConfiguration -Path .\WebBaseline -Wait -Verbose -Force
Test-DscConfiguration -Detailed   # InDesiredState:$true means no drift
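
Because the MOF is the artifact that actually gets applied, it can help to look inside it before pushing. A small sketch, assuming the WEB01.mof produced by the compile step above, that lists the resource instances the compiler emitted:

```powershell
# List the resource instances declared in one node's compiled MOF
Select-String -Path .\WebBaseline\WEB01.mof -Pattern 'ResourceID\s*=' |
    ForEach-Object { $_.Line.Trim() }
```

Each configuration block should appear once, identified as [ResourceType]Name. A missing entry usually means the block never made it into the Node scope.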

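Keep secrets out of plaintext MOF. DSC supports certificate-encrypted credentials via a configuration-data block referencing a CertificateFile and Thumbprint; never embed a PSCredential in a MOF without it. Compilation refuses a plaintext credential unless you explicitly set PSDscAllowPlainTextPassword, and if you do, the password sits in clear text in the compiled MOF on the authoring machine and wherever that file is copied.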
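
As a rough sketch of what that configuration-data block looks like: the certificate path, the thumbprint placeholder, the AppService configuration, and the AppSvc service name below are all hypothetical, chosen only to show the shape.

```powershell
# Hypothetical values throughout: swap in your own node, cert file, and thumbprint
$configData = @{
    AllNodes = @(
        @{
            NodeName        = 'WEB01'
            CertificateFile = 'C:\dsc\certs\WEB01.cer'       # public key exported from the node
            Thumbprint      = '<thumbprint of WEB01 cert>'   # placeholder, not a real value
        }
    )
}

Configuration AppService {
    param([Parameter(Mandatory)][pscredential]$ServiceCredential)

    Import-DscResource -ModuleName PSDscResources

    # Node names come from the configuration data, so the compiler can
    # find the matching CertificateFile for each node
    Node $AllNodes.NodeName {
        Service AppSvc {
            Name       = 'AppSvc'              # hypothetical service
            Credential = $ServiceCredential    # encrypted in the MOF with the node's public key
            State      = 'Running'
        }
    }
}

AppService -ConfigurationData $configData `
           -ServiceCredential (Get-Credential) `
           -OutputPath .\AppService
```

The node's LCM also has to know which certificate decrypts the document, which is set through CertificateID in the meta-configuration's Settings block.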

3. Push versus pull, and configuring the LCM

The LCM runs in one of two RefreshMode values, and the choice drives your whole operational model.

| Aspect | Push | Pull |
|---|---|---|
| Delivery | Start-DscConfiguration sends the MOF | LCM fetches MOF + modules from a server |
| Scale | Imperative, one-to-many from a controller | Self-service, fleet polls independently |
| Drift correction | Only on next push, or LCM consistency check | LCM re-pulls and re-applies on its own timer |
| Failure mode | Controller down = no new config | Pull server down = nodes keep last-known-good |

Configure the LCM with a meta-configuration - a separate Configuration decorated with [DSCLocalConfigurationManager()]. This sets ApplyAndAutoCorrect, which is what makes the agent remediate drift on every consistency check rather than merely report it:

[DSCLocalConfigurationManager()]
Configuration LcmAutoCorrect {
    Node localhost {
        Settings {
            RefreshMode                    = 'Push'
            ConfigurationMode              = 'ApplyAndAutoCorrect'
            ConfigurationModeFrequencyMins = 30
            RebootNodeIfNeeded             = $false
            ActionAfterReboot              = 'ContinueConfiguration'
        }
    }
}

LcmAutoCorrect -OutputPath .\LcmMeta
Set-DscLocalConfigurationManager -Path .\LcmMeta -Verbose

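The three ConfigurationMode values matter:

ApplyOnly - applies the configuration once, then does nothing until a new configuration arrives.
ApplyAndMonitor - the default; applies once, then logs drift on each consistency check without fixing it.
ApplyAndAutoCorrect - applies, then re-applies the current configuration whenever a consistency check finds drift.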

ConfigurationModeFrequencyMins has a documented floor of 15 minutes. Setting it lower is silently clamped. Thirty is a sane production default - frequent enough to close drift windows, infrequent enough to avoid churn.

For a pull server, set RefreshMode = 'Pull' and add a ConfigurationRepositoryWeb block with the server URL and a registration key. Note that Microsoft no longer invests in the Windows-hosted DSC Pull Server feature: it remains a supported Windows Server component but receives no new capabilities. Azure Automation State Configuration, long the recommended hosted alternative, has an announced retirement in favor of Azure Machine Configuration, so check current Microsoft guidance before committing to greenfield pull at scale. The on-box push and ApplyAndAutoCorrect model shown above is fully supported and is what I lean on when pairing DSC with Ansible.

4. Managing Windows from Ansible over WinRM and PSRP

Ansible reaches Windows two ways. The long-standing winrm connection plugin speaks WS-Management; the newer psrp plugin uses PowerShell Remoting Protocol over WinRM and is faster for multi-task plays because it reuses a single runspace. Both ride on the same WinRM listener.

First, enable WinRM on the target. The familiar bootstrap is the ConfigureRemotingForAnsible.ps1 example script from the Ansible project (not a Microsoft tool), which is meant for lab use; for production you want an HTTPS listener with a real certificate, not the script’s self-signed default:

# On the Windows host - create an HTTPS listener bound to a known cert
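$cert = Get-ChildItem Cert:\LocalMachine\My |
        Where-Object Subject -eq 'CN=web01.corp.example.com' |
        Sort-Object NotAfter -Descending |
        Select-Object -First 1   # renewed certs share a subject; take the newest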
New-Item -Path WSMan:\localhost\Listener -Transport HTTPS `
         -Address * -CertificateThumbPrint $cert.Thumbprint -Force
Set-Item WSMan:\localhost\Service\Auth\Basic        $false
Set-Item WSMan:\localhost\Service\Auth\CredSSP       $false
New-NetFirewallRule -DisplayName 'WinRM HTTPS' -Direction Inbound `
                    -LocalPort 5986 -Protocol TCP -Action Allow
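
Before moving on to the Ansible side, it is worth confirming the listener from PowerShell alone, so that a later win_ping failure can be narrowed down to authentication rather than transport. A short sketch, assuming the same web01.corp.example.com host name used above:

```powershell
# On the Windows host: confirm an HTTPS listener is registered
Get-ChildItem WSMan:\localhost\Listener |
    Where-Object Keys -contains 'Transport=HTTPS'

# From another Windows machine: confirm the port is open and WinRM answers over TLS
Test-NetConnection -ComputerName web01.corp.example.com -Port 5986
Test-WSMan -ComputerName web01.corp.example.com -UseSSL
```

Test-WSMan fails if the certificate is untrusted or its name does not match the host, which are the same conditions that trip certificate validation on the Ansible control node.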

Then the inventory. Prefer Kerberos auth in a domain; for non-domain hosts, NTLM over HTTPS is acceptable. Avoid Basic and CredSSP unless you have a hard requirement and understand the credential-exposure trade-off.

[web]
web01.corp.example.com
web02.corp.example.com

[web:vars]
ansible_connection=psrp
ansible_port=5986
ansible_psrp_auth=kerberos
ansible_psrp_cert_validation=validate

Validate connectivity with win_ping before any real play - it confirms the listener, auth, and certificate chain in one shot:

ansible web -m ansible.windows.win_ping

5. Writing idempotent Ansible roles with win_ modules

The win_ modules are written to be idempotent: they check current state and report changed only when they actually mutate something. Your job is to use the declarative modules (win_feature, win_service, win_regedit) and resist dropping to raw win_shell, which Ansible cannot reason about for change detection.

Here is the IIS baseline from section 2, expressed as an Ansible role task file:

# roles/web_baseline/tasks/main.yml
- name: Ensure IIS web server role is present
  ansible.windows.win_feature:
    name: Web-Server
    state: present
    include_management_tools: true
  register: iis_feature

- name: Ensure W3SVC is running and automatic
  ansible.windows.win_service:
    name: W3SVC
    state: started
    start_mode: auto

- name: Disable SMBv1 server via registry
  ansible.windows.win_regedit:
    path: HKLM:\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters
    name: SMB1
    data: 0
    type: dword

- name: Reboot only if the feature install demands it
  ansible.windows.win_reboot:
  when: iis_feature.reboot_required

Two idempotency disciplines worth internalizing. First, guard reboots on the module’s own signal - win_feature returns reboot_required, so reboot conditionally rather than unconditionally. Second, when you are forced into win_command or win_shell, add creates/removes guards or a changed_when expression so a re-run does not falsely report changed:

- name: Run a one-shot installer only once
  ansible.windows.win_command: C:\setup\app-install.exe /quiet
  args:
    creates: C:\Program Files\App\app.exe   # skip if already installed

Run with --check to verify a role is a no-op against already-converged hosts. A correctly authored role reports ok=... with changed=0 on the second pass - that is your idempotency proof.

6. Combining DSC resources inside Ansible workflows

This is the payoff. Ansible’s win_dsc module invokes any installed DSC resource directly - no MOF compilation, no LCM scheduling. You get DSC’s mature resource implementations (especially for niche state like xWebsite bindings or security settings) driven by Ansible’s orchestration, inventory, and check mode. The module maps DSC resource properties to task parameters one-to-one.

- name: Ensure W3SVC is running via the DSC Service resource through Ansible
  ansible.windows.win_dsc:
    resource_name: Service
    Name: W3SVC
    State: Running
    StartupType: Automatic

- name: Configure a registry value via DSC Registry resource
  ansible.windows.win_dsc:
    resource_name: Registry
    Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters
    ValueName: SMB1
    ValueData: '0'
    ValueType: Dword
    Ensure: Present

win_dsc honors check mode by calling the resource’s Test method and reporting what would change, and it surfaces the resource version so you can pin it. The decision rule I use: reach for win_dsc when a high-quality DSC resource already encapsulates complex logic you would otherwise hand-roll in win_shell; use native win_ modules for the common cases because they are faster and need no DSC module installed on the target.

One caveat: win_dsc requires the resource module present on the target, and the resource must be compatible with the WMF 5.1 DSC engine. Resources that depend on the cross-platform DSC 3.0 runtime will not load here.
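
To see what this looks like without Ansible in the loop, the same resource can be driven by hand with Invoke-DscResource, which calls a resource method directly with no MOF and no LCM schedule. This sketch assumes win_dsc relies on a similar direct invocation under the hood, that PSDscResources is installed on the host, and that it runs in an elevated Windows PowerShell 5.1 session:

```powershell
# Properties mirror the win_dsc Service task above
$resource = @{
    Name       = 'Service'
    ModuleName = 'PSDscResources'
    Property   = @{ Name = 'W3SVC'; State = 'Running'; StartupType = 'Automatic' }
}

# Test is the read-only half: it reports drift without changing anything
$test = Invoke-DscResource @resource -Method Test -Verbose

# Set only when Test reports drift
if (-not $test.InDesiredState) {
    Invoke-DscResource @resource -Method Set -Verbose
}
```

This is a handy way to debug a failing win_dsc task: if the direct call fails on the host, the problem is the resource or its properties rather than Ansible.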

7. Detecting and auto-remediating configuration drift

Two layers of remediation, and you want both.

Layer one - the LCM’s own loop. With ApplyAndAutoCorrect from section 3, every Windows host with an applied MOF self-corrects on its consistency-check timer with no controller involved. This is the safety net that keeps a box compliant even if your Ansible control node is offline for a week. Audit what the LCM did:

# Did anything drift and get corrected on this node?
Get-WinEvent -LogName 'Microsoft-Windows-DSC/Operational' -MaxEvents 50 |
    Where-Object Message -match 'not in the desired state|in desired state'

Layer two - Ansible-driven detection and report. Run the converging play on a schedule (cron on the control node, or AWX/AAP). Because every task is idempotent, a scheduled run is drift remediation: anything that drifted gets pulled back, and the run’s change count tells you how much drifted.

# Detect only - check mode plus diff, never mutates
ansible-playbook site.yml --check --diff
# Remediate - drop --check; idempotent tasks correct drift in place
ansible-playbook site.yml --diff

Wire the change count into alerting. A clean nightly run should be changed=0; a non-zero count means something mutated state out-of-band between runs and is worth a ticket. Parse the play recap, or use the ansible.posix.json callback to emit machine-readable results into your observability pipeline.
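
One possible way to turn that output into an alert is sketched below. It assumes the run was captured with the JSON callback (for example by setting ANSIBLE_STDOUT_CALLBACK=ansible.posix.json and redirecting stdout to a file) and that per-host counters sit under a top-level "stats" key. The inline sample stands in for that file, and its host names and counts are made up.

```powershell
# Sample shaped like the callback's "stats" section; values are invented
$json = @'
{
  "stats": {
    "web01.corp.example.com": { "ok": 4, "changed": 0, "failures": 0, "unreachable": 0 },
    "web02.corp.example.com": { "ok": 4, "changed": 2, "failures": 0, "unreachable": 0 }
  }
}
'@

$run = $json | ConvertFrom-Json

# ConvertFrom-Json yields a PSCustomObject, so enumerate hosts via its properties
$drifted = foreach ($hostEntry in $run.stats.PSObject.Properties) {
    if ($hostEntry.Value.changed -gt 0) {
        [pscustomobject]@{ Host = $hostEntry.Name; Changed = $hostEntry.Value.changed }
    }
}

$drifted | Format-Table -AutoSize
if ($drifted) {
    Write-Warning "Drift corrected on $(@($drifted).Count) host(s)"
}
```

In a real pipeline, replace the here-string with Get-Content -Raw on the captured file and swap Write-Warning for whatever your alerting system ingests.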

8. Testing with Pester and integrating into CI

Never ship a DSC configuration or Ansible role that has not been compiled and structurally validated in CI. Pester is the PowerShell test framework; use it to assert that a configuration compiles to a MOF and that the MOF contains the resources you expect. Pester v5 syntax:

# WebBaseline.Tests.ps1
Describe 'WebBaseline configuration' {
    BeforeAll {
        . $PSScriptRoot\WebBaseline.ps1
        $out = Join-Path $TestDrive 'mof'
        WebBaseline -NodeName 'TEST01' -OutputPath $out
    }

    It 'compiles a MOF for the node' {
        Join-Path $TestDrive 'mof\TEST01.mof' | Should -Exist
    }

    It 'declares the IIS WindowsFeature resource' {
        # Match the ResourceID, not the CIM class name, which differs between resource modules
        Get-Content (Join-Path $TestDrive 'mof\TEST01.mof') -Raw |
            Should -Match '\[WindowsFeature\]IIS'
    }
}

Run it and fail the build on any red:

$result = Invoke-Pester -Path .\tests -PassThru
if ($result.FailedCount -gt 0) { throw "Pester failed: $($result.FailedCount)" }

For the Ansible side, lint and syntax-check on every push, and lean on --check against a throwaway test host in a later stage:

ansible-lint roles/ playbooks/
ansible-playbook site.yml --syntax-check

A minimal GitHub Actions pipeline ties both together. The PowerShell job runs on a Windows runner (DSC compilation needs WMF 5.1); the lint job runs on Linux:

name: config-mgmt-ci
on: [push, pull_request]
jobs:
  dsc-pester:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4
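      - name: Install DSC resources and Pester 5
        shell: powershell
        run: |
          Install-Module PSDscResources -Force -Scope CurrentUser
          Install-Module Pester -MinimumVersion 5.0 -Force -SkipPublisherCheck -Scope CurrentUser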
      - name: Run Pester
        shell: powershell
        run: |
          $r = Invoke-Pester -Path .\tests -PassThru
          if ($r.FailedCount -gt 0) { exit 1 }
  ansible-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ansible-lint
      - run: ansible-lint roles/ playbooks/

Verify

Walk this checklist on a fresh target to confirm the whole chain works end to end:

# 1. LCM is in auto-correct mode at a sane frequency
(Get-DscLocalConfigurationManager).ConfigurationMode      # ApplyAndAutoCorrect

# 2. The node is currently in desired state
Test-DscConfiguration -Detailed                           # InDesiredState : True

# 3. Force drift, then trigger a re-apply rather than waiting for the timer
Stop-Service W3SVC
Start-DscConfiguration -UseExisting -Wait -Verbose         # re-applies stored MOF
(Get-Service W3SVC).Status                                 # Running

# 4. Ansible reachability and idempotency
ansible web -m ansible.windows.win_ping                    # pong
ansible-playbook site.yml                                  # first run: changed>0
ansible-playbook site.yml --check --diff                   # second run: changed=0

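If step 3 leaves the service stopped, no MOF is stored on the node or the apply failed; check the DSC operational log. Note that step 3 re-applies on demand regardless of ConfigurationMode, so it proves the stored MOF works, not that the timer is auto-correcting; step 1 covers the mode. If step 4’s second run reports changed, a task is non-idempotent - find it with --diff and add proper guards.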
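
To watch the LCM's own timer do the work instead of forcing a re-apply, something like the following can be left running on a test host. It is a sketch, assuming the WebBaseline MOF is already applied and the LCM is in ApplyAndAutoCorrect; the five-minute grace period is an arbitrary choice.

```powershell
# Read the consistency-check interval the LCM is actually using
$intervalMins = (Get-DscLocalConfigurationManager).ConfigurationModeFrequencyMins

# Introduce drift, then wait for the LCM to notice on its own schedule
Stop-Service -Name W3SVC
$deadline = (Get-Date).AddMinutes($intervalMins + 5)   # grace period is arbitrary

while ((Get-Service -Name W3SVC).Status -ne 'Running' -and (Get-Date) -lt $deadline) {
    Start-Sleep -Seconds 60
}

# Running = the timer-driven auto-correct fired; Stopped = check mode and stored MOF
(Get-Service -Name W3SVC).Status
```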

Enterprise scenario

A retail platform team ran roughly 400 Windows IIS/middleware hosts across three regions, configured by a single Ansible control node in their primary datacenter. The constraint surfaced during a regional network partition: when the control node was unreachable for eleven hours, application teams pushed manual registry “hotfixes” directly on production boxes to work around an incident. Nothing pulled those changes back, because Ansible only converges when it runs - and it could not run. Two days later, a subset of hosts silently diverged from the security baseline (SMBv1 quietly re-enabled by one of the manual edits), and it went unnoticed until an audit scan flagged it.

The fix was to stop treating Ansible as the only enforcement plane and let DSC’s LCM be the always-on backstop for the security-critical subset. They compiled a small, high-value MOF (SMBv1 disabled, TLS registry settings, a handful of services) and set every host’s LCM to ApplyAndAutoCorrect at 30 minutes. Ansible still owned application deployment and the broader config surface, but the non-negotiable security state now self-healed independent of the control node. The pairing was deliberate: Ansible for orchestration and breadth, the LCM for autonomous drift correction on the settings that must never drift.

[DSCLocalConfigurationManager()]
Configuration SecurityBackstop {
    Node localhost {
        Settings {
            RefreshMode                    = 'Push'
            ConfigurationMode              = 'ApplyAndAutoCorrect'
            ConfigurationModeFrequencyMins = 30
        }
    }
}
SecurityBackstop -OutputPath .\Backstop
Set-DscLocalConfigurationManager -Path .\Backstop

After rollout, the next partition was a non-event: hosts kept correcting the security MOF every half hour regardless of control-node reachability, and the audit scan stayed green. The lesson the team wrote into their runbook: an agentless tool cannot remediate when it cannot reach the host, so anything that must stay converged needs an on-box agent behind it.
