Hyper-V Live Migration and Replica for Zero-Downtime VM Mobility

Two Hyper-V mobility features get conflated constantly, and the conflation costs outages. Live migration moves a running VM between hosts with no perceptible downtime - a planned, online operation for patching, load balancing, and host evacuation. Hyper-V Replica asynchronously ships a VM’s changes to a second site so you have a recoverable copy when the primary is gone - disaster recovery, with an RPO of seconds to minutes, not zero. You want both, configured so the first time you lean on them is not the day you learn they were never set up correctly.

This is the build I run, in PowerShell so it is repeatable and reviewable. The reference is two standalone Hyper-V hosts in the primary site, hv01 (10.10.20.11) and hv02 (10.10.20.12), each with a dedicated 25 GbE migration NIC on the 10.10.99.0/24 subnet, plus a DR host hv-dr01 (10.20.20.11) in a second site across a routed WAN. All are domain-joined to contoso.local. Commands target Windows Server 2022; the same cmdlets work on 2019 and 2025.

Scope note: clustered live migration (CSV-backed, inside a failover cluster) is a different animal with its own networking model. This article covers the standalone and shared-nothing paths plus Replica, because that is where the manual configuration - and the mistakes - actually live.

1. The three live migration types, and when each applies

Hyper-V live migration comes in three flavors, distinguished by where the VM’s storage sits:

Type	Storage situation	What moves	Typical use
Shared storage	VHDX on shared SMB 3 / CSV both hosts see	Only memory + device state	Fast host evacuation in a cluster or SOFS estate
Shared-nothing	VHDX on the source host’s local disk	Memory and storage, over the network	Moving between standalone hosts with no shared storage
Storage migration	VM stays on the same host	Only the VHDX files, to a new path/volume	Rebalancing LUNs, moving off a failing disk, live with the VM running

Shared-nothing live migration is the headline capability: it relocates a running VM between two hosts that share nothing - no SAN, no cluster, no common storage - copying disk and live memory in one operation. It is slower than memory-only migration because it physically moves the VHDX, but it needs zero shared infrastructure, which is why it is the workhorse for standalone hosts.

All three are online. The guest keeps running; a sub-second final blackout switches execution to the destination once memory converges.
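
As a rough way to tell which type applies to a given disk, the sketch below classifies a VHDX path: a UNC path is taken to mean shared SMB storage, anything else is treated as local. `Get-MigrationKind` is a hypothetical helper, not a Hyper-V cmdlet, and the rule deliberately ignores CSV paths, which are out of scope here.

```powershell
function Get-MigrationKind {
    param(
        [Parameter(Mandatory)][string]$DiskPath,
        [Parameter(Mandatory)][string]$SourceHost,
        [Parameter(Mandatory)][string]$DestHost
    )
    # Same host on both ends: only the files can move.
    if ($SourceHost -eq $DestHost) { return 'Storage migration' }
    # Assumption: a UNC path means an SMB 3 share both hosts can reach.
    if ($DiskPath.StartsWith('\\')) { return 'Shared storage' }
    return 'Shared-nothing'
}

# Possible use on a real host (requires the Hyper-V module):
# Get-VMHardDiskDrive -VMName 'app-web01' -ComputerName hv01 |
#     ForEach-Object { Get-MigrationKind -DiskPath $_.Path -SourceHost hv01 -DestHost hv02 }
```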

2. Authentication: CredSSP vs. Kerberos constrained delegation

This is the decision that trips everyone up. Live migration needs the source host to authenticate to the destination on your behalf, and there are two ways:

CredSSP - the initiating user’s credentials are delegated to the destination. The catch: you must be logged on interactively to the source host to trigger the move. You cannot start a CredSSP live migration remotely (from your workstation via PowerShell remoting), because that is a double hop and CredSSP only delegates one. Its only virtue is that it needs no AD configuration.
Kerberos constrained delegation - the computer accounts are configured in AD so the source host is trusted to delegate specific services to the destination. This lets you initiate migrations remotely, from Hyper-V Manager, SCVMM, or a remote session. It is the right choice for any managed estate.

Use Kerberos. Set the authentication type on every host:

$hosts = 'hv01','hv02','hv-dr01'
Invoke-Command -ComputerName $hosts -ScriptBlock {
    Enable-VMMigration
    Set-VMHost -VirtualMachineMigrationAuthenticationType Kerberos
}

Now configure constrained delegation in AD. Live migration requires delegating two services from each source host to each destination it might migrate to: Microsoft Virtual System Migration Service (the migration itself) and cifs (so the destination can pull files over SMB). Delegation is directional, so bidirectional migration between hv01 and hv02 is configured both ways, via the ActiveDirectory module:

Import-Module ActiveDirectory

# Allow hv01 to delegate to hv02 (and vice versa) for the two required services.
# Delegation is configured on the SOURCE; targets are the DESTINATION's SPNs.

function Add-MigrationDelegation {
    param($SourceHost, $DestHost)
    $dest = Get-ADComputer $DestHost
    $spns = @(
        "Microsoft Virtual System Migration Service/$($dest.DNSHostName)",
        "Microsoft Virtual System Migration Service/$DestHost",
        "cifs/$($dest.DNSHostName)",
        "cifs/$DestHost"
    )
    Set-ADComputer $SourceHost -Add @{
        'msDS-AllowedToDelegateTo' = $spns
    }
    # Allow protocol transition ("Use any authentication protocol" in the GUI),
    # which hosts on Windows Server 2016 and later need for Kerberos live migration:
    Get-ADComputer $SourceHost |
        Set-ADAccountControl -TrustedToAuthForDelegation $true
}

Add-MigrationDelegation -SourceHost 'hv01' -DestHost 'hv02'
Add-MigrationDelegation -SourceHost 'hv02' -DestHost 'hv01'

Kerberos caches delegation data. After editing msDS-AllowedToDelegateTo, the change is not effective on a host until its Kerberos ticket cache reflects the new AD state - in practice, reboot the affected hosts (or wait for ticket renewal) before testing. Skipping this is the number-one reason “I configured delegation and it still fails.”
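
Before blaming the ticket cache, it can help to confirm the AD attribute really holds every SPN. The sketch below is one way to do that: `Get-MissingMigrationSpn` is a hypothetical helper that builds the four expected SPNs for a destination and reports any that are absent from a list you pass in. The AD lookup is shown as a comment because it needs the ActiveDirectory module.

```powershell
function Get-MissingMigrationSpn {
    param(
        [Parameter(Mandatory)][string]$DestShortName,
        [Parameter(Mandatory)][string]$DestFqdn,
        [string[]]$Configured = @()
    )
    # Both services, each in FQDN and short-name form.
    $expected = foreach ($svc in 'Microsoft Virtual System Migration Service', 'cifs') {
        "$svc/$DestFqdn"
        "$svc/$DestShortName"
    }
    # -notin compares case-insensitively, which suits SPN strings.
    @($expected | Where-Object { $_ -notin $Configured })
}

# Possible use against AD (assumes the hv01/hv02 accounts from this article):
# $src = Get-ADComputer hv01 -Properties 'msDS-AllowedToDelegateTo'
# $dst = Get-ADComputer hv02
# Get-MissingMigrationSpn -DestShortName $dst.Name -DestFqdn $dst.DNSHostName `
#     -Configured $src.'msDS-AllowedToDelegateTo'
```

An empty result means the attribute is complete and the remaining suspect is the cached tickets. Some administrators purge the computer account's tickets with klist -li 0x3e7 purge on the host rather than rebooting; treat that as something to validate in your own environment.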

3. Migration networks, SMB Direct, and performance options

By default Hyper-V uses any available network for live migration, which can saturate your management or production NIC. Pin migration traffic to the dedicated subnet and order it explicitly:

Invoke-Command -ComputerName $hosts -ScriptBlock {
    # Stop using every network; opt in to specific subnets only.
    Set-VMHost -UseAnyNetworkForMigration $false

    # Remove any existing entries, then add the dedicated migration subnet.
    # Each entry is removed by its subnet rather than relying on pipeline binding.
    Get-VMMigrationNetwork |
        ForEach-Object { Remove-VMMigrationNetwork -Subnet $_.Subnet }
    Add-VMMigrationNetwork -Subnet '10.10.99.0/24' -Priority 10
}

Then choose the performance option, which governs the transport for the memory copy:

Invoke-Command -ComputerName $hosts -ScriptBlock {
    # SMB  -> uses SMB 3, enabling SMB Direct (RDMA) and SMB Multichannel
    # Compression -> compresses memory pages on the CPU before sending (default)
    # TCPIP -> raw TCP, no compression
    Set-VMHost -VirtualMachineMigrationPerformanceOption SMB
}

The trade-off is real:

Compression (the default) spends CPU to shrink the memory stream. Best on a constrained network (1 GbE) with spare CPU.
SMB rides SMB 3, which transparently uses SMB Direct if your NICs are RDMA-capable (RoCE/iWARP) and SMB Multichannel to aggregate links. On 25 GbE RDMA this is far faster than compression and takes most of the copy work off the CPU - the right answer for the reference hardware.
TCPIP is the fallback when you want neither.
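
To put numbers on the trade-off, here is a back-of-the-envelope estimate of the first memory pass over different links. `Get-MemoryCopySeconds` is a hypothetical helper, and the 0.8 efficiency factor is an assumption. It ignores compression, dirty-page re-copies, and storage, so read the result as a floor, not a forecast.

```powershell
function Get-MemoryCopySeconds {
    param(
        [Parameter(Mandatory)][double]$MemoryGB,
        [Parameter(Mandatory)][double]$LinkGbps,
        [double]$Efficiency = 0.8   # assumed usable fraction of line rate
    )
    $gigabits = $MemoryGB * 8       # gigabytes -> gigabits
    [math]::Round($gigabits / ($LinkGbps * $Efficiency), 1)
}

Get-MemoryCopySeconds -MemoryGB 64 -LinkGbps 1    # 640
Get-MemoryCopySeconds -MemoryGB 64 -LinkGbps 25   # 25.6
```

A 64 GB guest that needs more than ten minutes on 1 GbE is where compression earns its CPU cost. On 25 GbE the wire is no longer the bottleneck.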

Live migration over SMB uses TCP port 6600 on the destination in addition to the SMB ports; permit it on the host firewall and any inter-host ACLs on the migration subnet. Confirm SMB Multichannel is actually engaging once a migration runs:

# Run on the source host, which is the SMB client side of the transfer.
Get-SmbMultichannelConnection -ServerName hv02 |
    Format-Table ClientIpAddress, ServerIpAddress, ClientRdmaCapable, ServerRdmaCapable

If *RdmaCapable is False on a NIC you expected to be RDMA, you are silently falling back to plain SMB - check Get-NetAdapterRdma and the switch’s DCB/PFC config.

4. Executing and throttling migrations

A shared-nothing live migration that moves both compute and storage in one shot:

# Run remotely (Kerberos delegation makes this possible).
Move-VM -Name 'app-web01' `
        -ComputerName hv01 `
        -DestinationHost hv02 `
        -IncludeStorage `
        -DestinationStoragePath 'D:\VMs\app-web01'

A pure storage migration - move the VHDX to a new volume while the VM keeps running on the same host:

Move-VMStorage -VMName 'app-db01' `
               -DestinationStoragePath 'E:\VMs\app-db01' `
               -ComputerName hv01

A memory-only live migration when the VM is already on shared SMB storage both hosts see:

Move-VM -Name 'app-cache01' -ComputerName hv01 -DestinationHost hv02

Throttling matters. Each host caps simultaneous migrations to protect the network and disk. Defaults are conservative (2 each); tune to your fabric:

Invoke-Command -ComputerName $hosts -ScriptBlock {
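    # First value: concurrent live migrations. Second: concurrent storage moves.
    # (A backtick must be the last character on its line, so no trailing comments.)
    Set-VMHost -MaximumVirtualMachineMigrations 4 `
               -MaximumStorageMigrations 2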
}

On 25 GbE RDMA push the live-migration count higher; on 1 GbE, leave it low or a host evacuation will thrash. Storage migrations are disk-bound - do not over-parallelize them against the same volume.
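
When evacuating a host from a script, one option is to submit moves in batches no larger than the configured cap, so the script never asks for more than the host will run at once. `Split-MigrationBatch` below is a hypothetical helper that only does the chunking. The Move-VM loop is left as a comment because it needs real hosts.

```powershell
function Split-MigrationBatch {
    param(
        [Parameter(Mandatory)][string[]]$VMName,
        [ValidateRange(1, 64)][int]$BatchSize = 2
    )
    for ($i = 0; $i -lt $VMName.Count; $i += $BatchSize) {
        $end = [math]::Min($i + $BatchSize, $VMName.Count) - 1
        # The leading comma emits each batch as one array instead of unrolling it.
        , @($VMName[$i..$end])
    }
}

# Possible use (assumes the hv01 -> hv02 layout from this article):
# $names = (Get-VM -ComputerName hv01).Name
# foreach ($batch in Split-MigrationBatch -VMName $names -BatchSize 4) {
#     $jobs = foreach ($vm in $batch) {
#         Move-VM -Name $vm -ComputerName hv01 -DestinationHost hv02 `
#                 -IncludeStorage -DestinationStoragePath "D:\VMs\$vm" -AsJob
#     }
#     $jobs | Wait-Job | Receive-Job
# }
```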

5. Enabling Hyper-V Replica

Replica is independent of live migration and uses its own listener. Enable the replica server role on the DR host (hv-dr01) - the side that receives replicas. Decide on the transport first:

Kerberos (HTTP, port 80) - domain-joined hosts in the same forest, traffic on a trusted network or VPN. Simple, no certificates, but unencrypted on the wire - only acceptable over an already-encrypted/private link.
Certificate (HTTPS, port 443) - mutual TLS with a certificate on each host. Mandatory across untrusted domains or the public internet. More setup, encrypted end to end.

Kerberos/HTTP, suited to the cross-site-but-private-WAN reference:

Invoke-Command -ComputerName hv-dr01 -ScriptBlock {
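    # Accept replicas only from servers named in an authorization entry;
    # the entry below carries the storage location.
    Set-VMReplicationServer `
        -ReplicationEnabled $true `
        -AllowedAuthenticationType Kerberos `
        -KerberosAuthenticationPort 80 `
        -ReplicationAllowedFromAnyServer $false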

    # Authorize which primary servers may send, and where their replicas land.
    New-VMReplicationAuthorizationEntry `
        -AllowedPrimaryServer '*.contoso.local' `
        -ReplicaStorageLocation 'R:\Replicas' `
        -TrustGroup 'PrimarySite'
}

Open the listener on the DR host’s firewall (the rule is built in, just disabled):

Invoke-Command -ComputerName hv-dr01 -ScriptBlock {
    Enable-NetFirewallRule -DisplayName 'Hyper-V Replica HTTP Listener (TCP-In)'
    # For certificate-based replication instead:
    # Enable-NetFirewallRule -DisplayName 'Hyper-V Replica HTTPS Listener (TCP-In)'
}

For certificate auth, set -AllowedAuthenticationType Certificate -CertificateAuthenticationPort 443 -CertificateThumbprint <thumbprint> on Set-VMReplicationServer, and enable replication on each VM with -AuthenticationType Certificate -CertificateThumbprint <primary-side-thumbprint>. The certificate’s Enhanced Key Usage must include both Server Authentication and Client Authentication, because each host acts as both.
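
The EKU requirement is easy to check before you wire a certificate into Replica. A minimal sketch: `Test-ReplicaCertificateEku` is a hypothetical helper that takes a list of EKU OIDs and confirms both usages are present. A certificate with no EKU extension at all returns an empty list and fails this check, which may be stricter than Hyper-V itself.

```powershell
function Test-ReplicaCertificateEku {
    param([string[]]$EkuOid = @())
    $serverAuth = '1.3.6.1.5.5.7.3.1'   # Server Authentication
    $clientAuth = '1.3.6.1.5.5.7.3.2'   # Client Authentication
    ($EkuOid -contains $serverAuth) -and ($EkuOid -contains $clientAuth)
}

# Possible use on a host (replace <thumbprint> with the real value):
# $cert = Get-Item 'Cert:\LocalMachine\My\<thumbprint>'
# Test-ReplicaCertificateEku -EkuOid $cert.EnhancedKeyUsageList.ObjectId
```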

6. Tuning frequency, recovery points, and resync windows

Now enable replication for a VM, pointing the primary host at the replica server. These parameters define your RPO and recovery granularity:

Enable-VMReplication -VMName 'app-web01' `
    -ComputerName hv01 `
    -ReplicaServerName 'hv-dr01.contoso.local' `
    -ReplicaServerPort 80 `
    -AuthenticationType Kerberos `
    -ReplicationFrequencySec 300 `
    -RecoveryHistory 12 `
    -VSSSnapshotFrequencyHour 4

# Kick off the first full copy (over the network; use -InitialReplicationStartTime to defer to off-hours).
Start-VMInitialReplication -VMName 'app-web01' -ComputerName hv01

What each knob does:

-ReplicationFrequencySec accepts exactly 30, 300, or 900 seconds - the cadence the delta log ships, effectively your RPO floor. 30s for tier-1 databases, 300s for general workloads, 900s for bulk/archival VMs to spare WAN bandwidth.
-RecoveryHistory (0-24) keeps extra crash-consistent recovery points so you can fail over to a point before a corruption or ransomware event, not just the latest. 0 keeps only the most recent. Each point costs storage and replication overhead on the DR side.
-VSSSnapshotFrequencyHour layers application-consistent (VSS) snapshots on top, so VSS-aware guests (SQL, Exchange) recover to a transactionally clean state. Only meaningful when RecoveryHistory > 0.
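
Frequency is also a bandwidth commitment: each delta has to finish shipping before the next one is due, or the VM falls behind. The sketch below turns an assumed per-interval change volume into the sustained rate the link must carry. `Get-ReplicaBandwidthMbps` is a hypothetical helper, and the churn figure is something you would measure yourself.

```powershell
function Get-ReplicaBandwidthMbps {
    param(
        [Parameter(Mandatory)][double]$ChurnMB,   # data changed per interval, in MB
        [Parameter(Mandatory)][ValidateSet(30, 300, 900)][int]$FrequencySec
    )
    # MB -> megabits, spread across the interval.
    [math]::Round(($ChurnMB * 8) / $FrequencySec, 1)
}

Get-ReplicaBandwidthMbps -ChurnMB 1500 -FrequencySec 300   # 40
```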

For initial replication of large VMs, copying the full VHDX over a thin WAN is brutal. Seed instead from a backup restore already on the DR host, or replicate over the LAN first and physically ship the DR host. If replication ever breaks badly enough to need a full resync, schedule it - resync re-hashes the entire disk and is bandwidth-heavy:

# Constrain when an out-of-sync VM is allowed to resynchronize (off-peak window).
Set-VMReplication -VMName 'app-web01' -ComputerName hv01 `
    -AutoResynchronizeEnabled $true `
    -AutoResynchronizeIntervalStart '22:00:00' `
    -AutoResynchronizeIntervalEnd '06:00:00'
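
Note that the window above wraps past midnight. If you script around it (for example, to decide whether a manual resync is safe right now), the comparison needs to handle that wrap. A minimal sketch with a hypothetical `Test-InResyncWindow` helper follows; how Hyper-V itself treats the boundary instants is not something this models.

```powershell
function Test-InResyncWindow {
    param(
        [Parameter(Mandatory)][timespan]$Start,
        [Parameter(Mandatory)][timespan]$End,
        [Parameter(Mandatory)][timespan]$TimeOfDay
    )
    if ($Start -le $End) {
        # Same-day window, e.g. 01:00 -> 05:00.
        return ($TimeOfDay -ge $Start) -and ($TimeOfDay -lt $End)
    }
    # Window wraps past midnight, e.g. 22:00 -> 06:00.
    return ($TimeOfDay -ge $Start) -or ($TimeOfDay -lt $End)
}

# Possible use:
# Test-InResyncWindow -Start '22:00:00' -End '06:00:00' -TimeOfDay (Get-Date).TimeOfDay
```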

7. Planned, unplanned, and test failover

These three procedures are not interchangeable, and using the wrong one loses data.

Test failover - non-disruptive. Spins up an isolated copy of the replica on the DR host, disconnected from the network, so you can verify the VM boots and the app works while production replication keeps running. Always test before you trust.

# On the DR host (replica side):
Start-VMFailover -VMName 'app-web01' -ComputerName hv-dr01 -AsTest
# ...verify the test VM boots and the app is intact, then clean it up:
# (Stop-VMFailover has no -AsTest switch; it cancels the running test failover.)
Stop-VMFailover -VMName 'app-web01' -ComputerName hv-dr01

Planned failover - zero data loss, used when the primary is healthy and reachable (site maintenance, a controlled migration). It flushes the final delta from primary to replica before cutover, then reverses direction. Run the prepare step on the primary:

# 1. On the PRIMARY: shut the VM down and ship the last changes.
Stop-VM -Name 'app-web01' -ComputerName hv01
Start-VMFailover -VMName 'app-web01' -ComputerName hv01 -Prepare

# 2. On the REPLICA (DR): bring it online as the live copy.
Start-VMFailover -VMName 'app-web01' -ComputerName hv-dr01
Start-VM -Name 'app-web01' -ComputerName hv-dr01

# 3. On the REPLICA: reverse replication so DR is now primary, hv01 is replica.
Set-VMReplication -VMName 'app-web01' -ComputerName hv-dr01 -Reverse

Unplanned failover - the primary site is gone. No final flush; you accept the loss of whatever had not replicated (at best one ReplicationFrequencySec interval, more if replication was already lagging). Run it on the replica, optionally choosing an older recovery point:

# On the REPLICA (DR), primary is unreachable:
Start-VMFailover -VMName 'app-web01' -ComputerName hv-dr01
Start-VM -Name 'app-web01' -ComputerName hv-dr01

To recover to an earlier point (e.g. to escape corruption), pass -VMRecoverySnapshot with a snapshot from Get-VMSnapshot. Once the chosen point checks out, commit it with Complete-VMFailover to discard the others.
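
Choosing the recovery point can be scripted. The sketch below picks the newest point taken before a cutoff time: `Select-RecoveryPointBefore` is a hypothetical helper that works on any objects with a CreationTime property. The failover commands are shown as comments, and the cutoff in them is a placeholder, not a recommendation.

```powershell
function Select-RecoveryPointBefore {
    param(
        [Parameter(Mandatory)][object[]]$Snapshot,
        [Parameter(Mandatory)][datetime]$Cutoff
    )
    # Newest point that is strictly older than the cutoff.
    $Snapshot |
        Where-Object { $_.CreationTime -lt $Cutoff } |
        Sort-Object CreationTime -Descending |
        Select-Object -First 1
}

# Possible use on the DR host (cutoff value is hypothetical):
# $points = Get-VMSnapshot -VMName 'app-web01' -ComputerName hv-dr01 -SnapshotType Replica
# $point  = Select-RecoveryPointBefore -Snapshot $points -Cutoff '2024-05-01 09:00'
# Start-VMFailover -VMRecoverySnapshot $point
# Start-VM -Name 'app-web01' -ComputerName hv-dr01                 # inspect the guest
# Complete-VMFailover -VMName 'app-web01' -ComputerName hv-dr01    # commit this point
# Stop-VMFailover -VMName 'app-web01' -ComputerName hv-dr01        # or cancel and pick another
```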

8. Failback: reversing replication after the primary returns

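After an unplanned failover, the DR copy is running but not replicating anywhere. When the primary comes back, reverse the relationship, let it resync DR-to-primary, then perform a planned failover back during a maintenance window. The returning primary must itself be enabled as a Replica server (the Step 5 configuration, applied to hv01) before it can receive the reversed stream: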

# 1. With the VM running on DR, reverse replication so primary becomes the replica.
Set-VMReplication -VMName 'app-web01' -ComputerName hv-dr01 -Reverse
Start-VMInitialReplication -VMName 'app-web01' -ComputerName hv-dr01

# 2. Once health is 'Normal', do a PLANNED failover back to hv01 (Step 7),
#    then -Reverse again so hv01 -> hv-dr01 is restored as the steady state.

Failback is just a planned failover in the opposite direction. Never skip the resync-and-verify step - cutting back to a primary whose disk has drifted turns a recovered incident into a fresh one.
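
The "wait until health is Normal" step is worth automating so nobody cuts back early. A minimal polling sketch follows. `Wait-ForCondition` is a hypothetical helper that retries any script block, and the attempt count and delay are placeholder defaults.

```powershell
function Wait-ForCondition {
    param(
        [Parameter(Mandatory)][scriptblock]$Probe,
        [int]$MaxAttempts = 60,
        [int]$DelaySeconds = 30
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        if (& $Probe) { return $true }
        # Sleep between attempts, but not after the last one.
        if ($attempt -lt $MaxAttempts) { Start-Sleep -Seconds $DelaySeconds }
    }
    return $false
}

# Possible use before failing back (requires the Hyper-V module):
# $ready = Wait-ForCondition -Probe {
#     (Get-VMReplication -VMName 'app-web01' -ComputerName hv-dr01).Health -eq 'Normal'
# }
# if (-not $ready) { throw 'Replication did not reach Normal; do not fail back.' }
```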

Verify

For live migration, confirm the host settings match what you configured, then prove them with a real move:

# Performance + network config is what you set:
Get-VMHost -ComputerName hv01 |
    Format-List VirtualMachineMigrationEnabled, `
        VirtualMachineMigrationAuthenticationType, `
        VirtualMachineMigrationPerformanceOption, `
        MaximumVirtualMachineMigrations
Get-VMMigrationNetwork -ComputerName hv01

# Move a low-risk VM hv01 -> hv02 and confirm it stayed up (ping in another window).
Move-VM -Name 'test-canary' -ComputerName hv01 -DestinationHost hv02 -IncludeStorage `
        -DestinationStoragePath 'D:\VMs\test-canary'

Replica health is reported per VM by Measure-VMReplication - this is your dashboard:

Measure-VMReplication -ComputerName hv01 |
    Format-Table Name, State, Health, LReplTime, PReplSize, AvgReplSize, PrimaryServerName

Health should read Normal. Warning or Critical means missed cycles - check WAN, the listener, and the auth entry. Confirm the listener is live on the DR host:

Get-VMReplicationServer -ComputerName hv-dr01 |
    Format-List ReplicationEnabled, AllowedAuthenticationType, KerberosAuthenticationPort
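# Check the port on the DR host itself, not on the machine running this script.
Invoke-Command -ComputerName hv-dr01 -ScriptBlock {
    Get-NetTCPConnection -LocalPort 80 -State Listen -ErrorAction SilentlyContinue
}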

And run a quarterly test failover (Step 7) - Normal health proves data is arriving, not that the VM boots.

Enterprise scenario

A managed-services team ran ~40 standalone Hyper-V hosts across two metro data centres joined by a 1 GbE WAN they did not own and could not upgrade. Replica sat at the default 5-minute frequency for everything, including a chatty 2 TB SQL VM. During a planned DC power test they live-migrated workloads off the affected hosts while Replica kept running - migration traffic and replica deltas collided on the shared uplink and blew past its capacity. Live migrations stalled mid-flight and several replicas went Critical, triggering automatic resyncs that re-hashed entire disks and made the congestion catastrophically worse.

The constraint was a single, un-ownable, saturated WAN carrying two competing flows with no isolation. The fix had three parts. First, they pinned live migration to a dedicated subnet and switched the hosts carrying the heavy VMs to the SMB performance option (it is a per-host setting, not a per-VM one) to cut CPU pressure during evacuations. Second, they tiered replication frequency - 30s only for tier-1 databases, 900s for bulk - and bounded resync to an overnight window so a Critical event could not re-hash a disk in business hours:

# Tiered replication + bounded resync window on the WAN-bound estate.
Set-VMReplication -VMName 'sql-tier1'  -ComputerName hv11 `
    -ReplicationFrequencySec 30
Set-VMReplication -VMName 'file-bulk'  -ComputerName hv11 `
    -ReplicationFrequencySec 900 `
    -AutoResynchronizeEnabled $true `
    -AutoResynchronizeIntervalStart '23:00:00' `
    -AutoResynchronizeIntervalEnd '05:00:00'

Third - the part most teams miss - they seeded the 2 TB SQL replica from a backup restore already on the DR host instead of crawling the full disk over 1 GbE. The next DC power test was a non-event. The lesson was not “Replica is fragile” - it was that frequency and resync are bandwidth decisions, and a shared WAN forces you to schedule them like the scarce resource they are.


Written by Vinod
