
Onboarding Servers to Azure Arc: Connected Machine Agent, Service Principals & Bulk Enrollment

You have a rack of Windows and Linux servers in your own data centre, plus a handful of VMs over in AWS and GCP, and your security team has just asked the question every hybrid shop eventually hears: “Can you show me these machines in Azure, apply our tag policy, and tell me which ones are missing patches?” The machines aren’t in Azure and you won’t migrate them — but they still need to sit under the same governance, monitoring and identity plane as everything in the cloud. That is the gap Azure Arc closes: it projects a server that lives anywhere into Azure as a first-class resource you can see, tag, govern with Azure Policy, monitor, and grant Azure RBAC over, all without moving a byte of the workload.

The mechanism is a lightweight piece of software, the Azure Connected Machine agent (the azcmagent binary). You install it on each server; it registers the machine as an Azure Resource Manager resource of type Microsoft.HybridCompute/machines, and from that moment the server shows up next to your native Azure VMs — same resource group, tags, policy assignments and activity log. Doing one server by hand takes five minutes; doing three hundred by hand is a non-starter, which is why this guide treats bulk enrolment with a service principal as the real goal — the interactive install teaches you the moving parts, then you automate it.

By the end you will have onboarded a server two ways (portal and az CLI), created a least-privilege onboarding service principal, generated and run the at-scale script unattended, validated the connection from both sides, and torn it all down — plus you’ll know the handful of things that actually go wrong (firewall egress, the wrong RBAC role, a clock skew breaking TLS) and how to confirm and fix each in under a minute.

What problem this solves

Without Arc, a server outside Azure is invisible to Azure. It has no resource ID, so you cannot put it in a resource group, cannot tag it for cost allocation, cannot target it with an Azure Policy assignment, cannot grant a colleague time-boxed RBAC access to it, and cannot pull its guest inventory or patch status into Azure Monitor. Your “single pane of glass” has a hole in it the exact shape of everything that isn’t in the cloud — which, for most enterprises, is still the majority of the estate.

The painful workarounds are familiar: a separate on-prem monitoring stack nobody reconciles, a stale “servers we also have” spreadsheet, and patch compliance proven by audit screenshots. Each is a governance gap waiting to become an audit finding, and onboarding closes all of them at once. The people who hit this are those running hybrid, multicloud (Azure + AWS/GCP), or edge estates who have been told to bring everything under one governance model. It bites hardest during a compliance push, when every server must carry the right tags, report patch status, and sit behind the same Microsoft Defender for Cloud posture.

Learning objectives

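By the end of this article you can: onboard a single server through the portal and through the CLI; create a least-privilege onboarding service principal; generate and run the at-scale script unattended; validate the connection from the server and from Azure; tear everything down cleanly; and confirm and fix the common failures (blocked egress, the wrong RBAC role, clock skew breaking TLS).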

Prerequisites & where this fits

You should be comfortable with the Azure basics — what a subscription and a resource group are (see Azure Resource Hierarchy Explained: Subscriptions, Resource Groups and Resources), running az in Cloud Shell, and logging in to a server as local administrator/root. You’ll want a server you can install software on: a spare VM, a local Hyper-V/VirtualBox guest, or even a cloud VM in another provider — the whole point of Arc is that the machine need not be in Azure.

On the identity side this builds on the service principal model: at-scale onboarding authenticates as an app, not a human. If app registration, service principal and client secret are fuzzy, read App Registrations vs Enterprise Applications: The Service Principal Model Explained first; storing that secret correctly leads into Azure Key Vault: Secrets, Keys and Certificates Done Right.

Onboarding is the front door of the whole Arc story. Once a server is Arc-enabled, the downstream work — governing it with Azure Policy Effects Decoded: Deny vs Audit vs Modify vs DeployIfNotExists, monitoring it via Azure Monitor and Application Insights: Full-Stack Observability, and scoping access with Management Groups 101: Designing a Hierarchy That Scopes Policy and RBAC — is identical to a native Azure VM. This article gets the machine into Azure; those tell you what to do once it’s there.

Core concepts

A few mental models make every later step obvious.

Arc projects a resource, not the workload. Onboarding does not move, copy, or change your server. It creates a small ARM resource — Microsoft.HybridCompute/machines/<name> — in a resource group you choose, and links it to the physical/virtual machine via the agent. The workload keeps running exactly where it is; Azure simply gains a handle to it. Delete the Arc resource and the server is untouched (it just disappears from Azure).

The Connected Machine agent is the bridge. The agent (azcmagent plus background services) does three jobs: it registers the machine (creating that ARM resource), maintains a managed identity for the server (to authenticate to Azure services like Key Vault), and is the landing pad for extensions (Azure Monitor Agent, custom scripts, Defender). It is deliberately lightweight: it is not a remote-desktop or remote-shell tool, and it gives Microsoft no shell on your box.

Onboarding needs an identity that can create the resource. Someone — a human signed in with az login, or a service principal running a script — must have permission to create the HybridCompute/machines resource. Interactive onboarding uses your login; at-scale onboarding uses a dedicated service principal so no human credentials sit in a deployment script. The right role for that identity is the narrow Azure Connected Machine Onboarding role — it can create and read Arc machines and nothing else.

The agent reaches Azure outbound over HTTPS (443). The server initiates all communication outbound to a defined set of Azure endpoints on TCP 443. Azure never connects inbound to your server, so you do not open any inbound firewall ports. If onboarding fails, it is almost always because outbound 443 to one of those endpoints is blocked — that is the single most common failure, and the agent has a built-in azcmagent check to test it.

The machine then carries an Azure identity and a region. Once connected, the Arc machine lives in a specific Azure region and resource group, and its system-assigned identity lets a local process read a token from http://localhost:40342 to call Azure RBAC-protected services — exactly like a managed identity on a native VM.
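As a rough illustration of the local token endpoint described above, the sketch below shows how a process on an Arc-enabled Linux server might request a token. It assumes the challenge-response pattern used by Arc managed identity (an unauthenticated call returns a 401 whose Www-Authenticate header names a key file that only privileged local users can read). The api-version value, the target resource and the function names are assumptions chosen for illustration.

```bash
# Sketch only: api-version, resource and function names are illustrative assumptions.
IMDS_URL="http://localhost:40342/metadata/identity/oauth2/token"
API_VERSION="2019-11-04"
RESOURCE="https%3A%2F%2Fmanagement.azure.com"   # URL-encoded audience

# Read HTTP response headers on stdin and print the key-file path from the
# challenge header, e.g. "Www-Authenticate: Basic realm=/path/to/file.key".
challenge_path() {
  grep -i '^www-authenticate:' | cut -d '=' -f 2 | tr -d '[:cntrl:]'
}

# Two-step request: collect the challenge, then present the key file's contents.
get_arc_token() {
  local url key_file
  url="${IMDS_URL}?api-version=${API_VERSION}&resource=${RESOURCE}"
  # The first call is expected to be rejected with 401 plus the challenge header.
  key_file=$(curl -s -D - -o /dev/null -H "Metadata: true" "$url" | challenge_path)
  [ -r "$key_file" ] || { echo "cannot read challenge file: $key_file" >&2; return 1; }
  # The second call proves local privilege by sending the file's contents back.
  curl -s -H "Metadata: true" -H "Authorization: Basic $(cat "$key_file")" "$url"
}
```

On a connected server, calling get_arc_token should print a JSON document that includes an access_token field. The caller must be allowed to read the key file, which is what stops an unprivileged process from borrowing the machine's identity.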

The vocabulary in one table

| Term | One-line definition | Why it matters to onboarding |
|---|---|---|
| Azure Arc | Extends ARM control plane to non-Azure machines | The umbrella feature you’re enabling |
| Connected Machine agent | The azcmagent software you install | The thing that registers and maintains the link |
| azcmagent | The agent’s CLI (connect, show, check, disconnect) | Every server-side action runs through it |
| Microsoft.HybridCompute/machines | The ARM resource type Arc creates | The “handle” Azure gets to your server |
| Resource provider | The ARM service that owns a resource type | Must be registered on the subscription first |
| Onboarding service principal | An app identity used to enrol at scale | Replaces a human login in automation |
| Azure Connected Machine Onboarding | Built-in RBAC role to create Arc machines | Least-privilege role for the SPN |
| Azure Connected Machine Resource Administrator | RBAC role to manage Arc machines + extensions | For day-2 management, not onboarding |
| System-assigned managed identity | Per-machine identity Arc grants the server | Lets the server call Azure services securely |
| Extension | Software Azure pushes onto an Arc machine | Monitoring, Defender, scripts: all post-onboard |

What you need before the first install

Onboarding has a small, fixed set of prerequisites. Get these right and the install is uneventful; miss one and you hit a clear, specific error. Here is the full checklist.

| Prerequisite | Detail / value | How to satisfy it |
|---|---|---|
| Supported OS | Windows Server 2012 R2+; common Linux (Ubuntu, RHEL, SUSE, Debian, etc.) | Check the server OS version |
| Local privilege on the server | Administrator (Windows) or root/sudo (Linux) | Log in with that account |
| Two resource providers registered | Microsoft.HybridCompute, Microsoft.GuestConfiguration (also Microsoft.HybridConnectivity, Microsoft.AzureArcData for some features) | az provider register (shown below) |
| Azure RBAC to onboard | Azure Connected Machine Onboarding (create) on the target RG | Role assignment (human or SPN) |
| Outbound HTTPS (443) | To the Arc service endpoints (and optionally via proxy) | Firewall rule / proxy config |
| A target resource group + region | Where the Arc resource will live | az group create |
| Correct system clock | TLS fails if skew is large | NTP/time sync on the server |

The two resource providers must be registered on the subscription before the first onboarding, or registration fails with a provider-not-registered error. Do it once per subscription:

# Register the two required providers (idempotent; takes a minute to propagate)
az provider register --namespace Microsoft.HybridCompute --wait
az provider register --namespace Microsoft.GuestConfiguration --wait
# Only needed for some features; safe to register up front
az provider register --namespace Microsoft.HybridConnectivity --wait
az provider register --namespace Microsoft.AzureArcData --wait

# Confirm they show Registered
az provider show -n Microsoft.HybridCompute --query registrationState -o tsv
az provider show -n Microsoft.GuestConfiguration --query registrationState -o tsv

Expected output: each show prints Registered. If it prints Registering, wait a minute and re-check.
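If you would rather gate a pipeline on provider state than eyeball it, a small filter can do the check. This is a minimal sketch, assuming you feed it tab-separated "namespace, state" rows from az provider list; the function name pending_providers is made up for illustration.

```bash
# Read "namespace<TAB>registrationState" rows on stdin.
# Print every namespace that is not yet Registered; exit 0 only if none are pending.
pending_providers() {
  awk -F'\t' '$2 != "Registered" { print $1; bad = 1 } END { exit (bad ? 1 : 0) }'
}

# Example wiring (needs a logged-in az CLI, so it is left commented out):
# az provider list \
#   --query "[?namespace=='Microsoft.HybridCompute' || namespace=='Microsoft.GuestConfiguration'].[namespace, registrationState]" \
#   -o tsv | pending_providers || echo "still registering, re-check in a minute"
```

The non-zero exit status makes it easy to loop with a sleep until both providers report Registered.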

The two roles that matter (and the difference)

There are two Arc-specific built-in roles, and conflating them is a common early mistake. One is for getting machines in; the other is for managing them afterwards. Use the narrowest one for each job.

| Role | Can do | Use it for | Don’t use it for |
|---|---|---|---|
| Azure Connected Machine Onboarding | Create + read Arc machines | The onboarding SPN / first enrolment | Day-2 management, running extensions |
| Azure Connected Machine Resource Administrator | Read, manage, delete Arc machines + extensions | Operators managing the fleet | The onboarding SPN (too broad) |
| Reader (built-in) | View the Arc machine | Auditors, viewers | Anything that changes state |

The onboarding service principal should hold only Azure Connected Machine Onboarding, scoped to the single resource group it enrols into — that role cannot delete machines, cannot push extensions, and cannot read your other resources, so a leaked onboarding secret has a tightly bounded blast radius. See Managed Identities Demystified: System vs User-Assigned and When to Use Each for how this contrasts with the machine’s own identity after onboarding.
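One way to confirm the SPN really holds only that role at only that scope is to audit its assignments. The sketch below assumes tab-separated "role, scope" rows from az role assignment list; the function name audit_assignments and the sample scope are illustrative, not part of any Azure tooling.

```bash
ONBOARDING_ROLE="Azure Connected Machine Onboarding"

# Read "roleDefinitionName<TAB>scope" rows on stdin; $1 is the one scope you expect.
# Print every row that is a different role or a different scope; exit 1 if any exist.
audit_assignments() {
  awk -F'\t' -v role="$ONBOARDING_ROLE" -v scope="$1" '
    tolower($1) != tolower(role) || tolower($2) != tolower(scope) {
      print "UNEXPECTED: " $1 " @ " $2
      bad = 1
    }
    END { exit (bad ? 1 : 0) }'
}

# Example wiring (needs a logged-in az CLI, so it is left commented out):
# az role assignment list --assignee "<appId>" --all \
#   --query "[].[roleDefinitionName, scope]" -o tsv \
#   | audit_assignments "/subscriptions/<sub-id>/resourceGroups/rg-arc-lab"
```

Any UNEXPECTED line means the blast radius of a leaked secret is wider than intended and the extra assignment is worth removing.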

Network egress: the endpoints and ports

The agent talks outbound only, over TCP 443, to a defined set of endpoints. You open no inbound ports. The endpoint set below is what onboarding and steady-state operation require; if your firewall is restrictive, allow-list these (or route via the agent’s proxy support).

| Endpoint (outbound) | Port | Purpose | Required? |
|---|---|---|---|
| login.microsoftonline.com | 443 | Entra ID auth (get a token) | Yes |
| management.azure.com | 443 | ARM: create/update the machine resource | Yes |
| *.his.arc.azure.com | 443 | Hybrid Identity Service (machine identity) | Yes |
| *.guestconfiguration.azure.com | 443 | Guest configuration / policy | Yes (for policy) |
| pas.windows.net | 443 | Microsoft Entra ID authentication | Yes |
| download.microsoft.com / packages.microsoft.com | 443 | Agent + extension download | During install |
| *.servicebus.windows.net | 443 | Optional SSH/remote features | Only if used |

If you can’t open broad egress, the agent supports an HTTP/HTTPS proxy (azcmagent config set proxy.url) and Azure offers a private-endpoint model (Arc private link scope) so traffic stays on a private network — but for a first onboarding, allow-listing the endpoints above over 443 is the simplest path.
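Before the agent is even installed, a plain TCP probe can tell you whether the firewall lets port 443 through. This is a minimal sketch, assuming bash and the timeout utility are present on the server; it uses gbl.his.arc.azure.com as a stand-in for the wildcard his.arc entry, and the function names are made up for illustration. It opens a direct socket, so it does not exercise a proxy.

```bash
# Concrete hosts to probe; wildcard entries cannot be probed directly,
# so gbl.his.arc.azure.com stands in for *.his.arc.azure.com (an assumption).
ARC_HOSTS="login.microsoftonline.com management.azure.com gbl.his.arc.azure.com pas.windows.net"

# Try to open a TCP connection; print the result and return 1 when blocked.
probe_endpoint() {
  local host port
  host=$1
  port=${2:-443}
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} unreachable"
    return 1
  fi
}

# Probe every host in ARC_HOSTS; return 1 if any of them is blocked.
probe_all() {
  local rc h
  rc=0
  for h in $ARC_HOSTS; do
    probe_endpoint "$h" 443 || rc=1
  done
  return $rc
}
```

Run probe_all on the server; any unreachable line points at a firewall rule to fix. Once the agent is installed, azcmagent check remains the more thorough test because it also honours the agent's proxy settings.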

Onboarding a single server (the interactive path)

There are two interactive routes to the same result: the portal, which generates a ready-to-run script, and the az/azcmagent CLI, which you run directly. Both end with the same HybridCompute/machines resource. Learn the CLI path because it’s the basis of the at-scale script; the portal path is the friendliest way to see the moving parts the first time.

Route A — the Azure portal (generate-and-run)

The portal never installs anything by itself (your server isn’t reachable from Azure); instead it builds a script you copy to the server and run there.

| # | In the portal | What it does |
|---|---|---|
| 1 | Search Azure Arc → Machines → + Add/Connect → Add a single server | Starts the onboarding wizard |
| 2 | Pick Subscription, Resource group, Region, and Operating system (Windows/Linux) | Sets where the Arc resource will live |
| 3 | Choose connectivity method (Public endpoint / Proxy / Private endpoint) | Matches your network egress |
| 4 | (Optional) add tags | Stamps the resource on creation |
| 5 | Click Download and run script, then copy the generated script to the server | The script installs the agent + runs azcmagent connect with an interactive login |
| 6 | On the server (as admin/root), run the script; complete the device-code login in a browser | Registers the machine; it appears under Arc → Machines |

The generated script does exactly what the CLI path does below — it downloads the agent, installs it, then calls azcmagent connect with an interactive (device-code) authentication, so a human approves the enrolment in a browser. That’s perfect for one or two machines and terrible for three hundred (you can’t device-code-login on every box) — which is the whole reason the service-principal path exists.

Route B — the CLI (azcmagent connect)

This is the same flow, command-line driven. Run it on the target server (the agent must be installed locally). First install the agent, then connect.

Step 1 — Install the agent on the server. On Linux, Microsoft provides a one-line installer script:

# Linux: download and run the agent installer (run as root/sudo)
wget https://gbl.his.arc.azure.com/azcmagent-linux -O install_linux_azcmagent.sh
bash install_linux_azcmagent.sh

On Windows, download and install the MSI (PowerShell, as Administrator):

# Windows: download the agent MSI and install it silently
Invoke-WebRequest -Uri "https://aka.ms/AzureConnectedMachineAgent" -OutFile AzureConnectedMachineAgent.msi
msiexec /i AzureConnectedMachineAgent.msi /qn

Expected: the installer reports success and the azcmagent binary is now on PATH. Confirm with azcmagent version.

Step 2 — Connect the machine (interactive login). Now register it against Azure. This uses your interactive login, so you’ll complete a device-code prompt:

# Connect this server to Azure Arc (interactive auth)
azcmagent connect \
  --resource-group "rg-arc-lab" \
  --location "centralindia" \
  --subscription-id "<your-subscription-id>" \
  --tags "env=lab,owner=vinod"

Expected output ends with Successfully onboarded resource to Azure. Confirm from both sides the same way the lab does below: azcmagent show on the server should read Agent Status : Connected (with the resource name, group, region and machine-identity object ID), and az connectedmachine show --name "$(hostname)" -g rg-arc-lab from Azure should return status: Connected. The machine then appears in the portal under Azure Arc → Machines, carrying your tags. If the server reads Disconnected, jump to troubleshooting — it’s almost always egress.

The azcmagent connect flags you’ll actually use:

| Flag | Purpose | Example |
|---|---|---|
| --resource-group | Target RG for the Arc resource | rg-arc-lab |
| --location | Azure region for the resource metadata | centralindia |
| --subscription-id | Target subscription | a GUID |
| --tags | Tags stamped on creation | env=prod,app=erp |
| --service-principal-id | App (client) ID for non-interactive auth | a GUID |
| --service-principal-secret | The SPN secret (at-scale) | a secret string |
| --tenant-id | Entra tenant (with SPN auth) | a GUID |
| --cloud | Azure cloud (AzureCloud, AzureUSGovernment, AzureChinaCloud) | AzureCloud |
| --proxy-url | Route through an HTTP proxy | http://proxy:8080 |

Bulk enrolment with a service principal (the real goal)

Interactive onboarding doesn’t scale — you cannot device-code-login on hundreds of servers. The at-scale pattern replaces your login with a dedicated service principal that holds only the onboarding role, then runs azcmagent connect non-interactively with that SPN’s credentials. You build the script once and run it from your configuration-management tool (Ansible, Group Policy, a startup script, an imaging template, etc.).

Step 1 — Create the onboarding service principal

Create an app and assign it only the Azure Connected Machine Onboarding role, scoped to the resource group you’ll enrol into. A single command both creates the SPN and assigns the role at that scope:

RG=rg-arc-lab
SUB=$(az account show --query id -o tsv)

az ad sp create-for-rbac \
  --name "sp-arc-onboarding" \
  --role "Azure Connected Machine Onboarding" \
  --scopes "/subscriptions/$SUB/resourceGroups/$RG"

Expected output (capture these — the password is shown once):

{
  "appId": "11111111-1111-1111-1111-111111111111",
  "displayName": "sp-arc-onboarding",
  "password": "<the-client-secret-shown-once>",
  "tenant": "22222222-2222-2222-2222-222222222222"
}

Store appId, password and tenant immediately in Azure Key Vault (never in the script, never in source control) — see Azure Key Vault: Secrets, Keys and Certificates Done Right:

az keyvault secret set --vault-name kv-arc-lab --name arc-sp-appid    --value "<appId>"
az keyvault secret set --vault-name kv-arc-lab --name arc-sp-secret   --value "<password>"
az keyvault secret set --vault-name kv-arc-lab --name arc-sp-tenant   --value "<tenant>"

Decide the SPN’s scope deliberately, because broader scope means a leaked secret can enrol into more places: a single resource group (most labs and small fleets) bounds the blast radius to that RG; a subscription lets one SPN enrol into any RG in it; a management group reaches every subscription beneath it, which is convenient for a large estate but wide, so use it sparingly. Pin the secret expiry too: az ad sp create-for-rbac issues a credential that expires after one year by default (set it explicitly with --years). Rotate it before it expires, or onboarding starts failing with an authentication error that is easy to miss in an unattended run.
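To make the scope choice and the expiry check concrete, here is a small sketch. The helper names (rg_scope, sub_scope, mg_scope, days_until) are invented for illustration, and days_until assumes GNU date for parsing ISO-8601 timestamps.

```bash
# Build the --scopes value for each of the three scope levels discussed above.
rg_scope()  { echo "/subscriptions/$1/resourceGroups/$2"; }
sub_scope() { echo "/subscriptions/$1"; }
mg_scope()  { echo "/providers/Microsoft.Management/managementGroups/$1"; }

# Whole days from $2 (default: now) until $1. Both are ISO-8601 timestamps.
# A negative result means the credential has already expired.
days_until() {
  local end now
  end=$(date -u -d "$1" +%s) || return 1
  now=$(date -u -d "${2:-now}" +%s) || return 1
  echo $(( (end - now) / 86400 ))
}

# Example wiring (needs a logged-in az CLI, so it is left commented out):
# az ad sp create-for-rbac --name "sp-arc-onboarding" \
#   --role "Azure Connected Machine Onboarding" \
#   --scopes "$(rg_scope "$SUB" "$RG")" --years 1
# expiry=$(az ad app credential list --id "<appId>" --query "[0].endDateTime" -o tsv)
# echo "secret expires in $(days_until "$expiry") days"
```

A scheduled job that alerts when days_until drops below a threshold of your choosing is one way to rotate before the fleet notices.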

Step 2 — Generate the at-scale install script

The portal can generate this for you: Azure Arc → Machines → + Add → Add servers at scale → Create a new service principal (or use existing) → Download script. It produces a script that installs the agent and connects using the SPN. The essential shape of that script (the part that matters) for Linux:

#!/bin/bash
# At-scale onboarding (Linux) — runs unattended with a service principal
export SUBSCRIPTION_ID="<your-subscription-id>"
export RESOURCE_GROUP="rg-arc-lab"
export TENANT_ID="<tenant-id>"
export LOCATION="centralindia"
export APP_ID="<sp-appId>"
export APP_SECRET="<sp-secret>"   # injected from Key Vault / secret store, not hard-coded

# 1) Install the Connected Machine agent
wget https://gbl.his.arc.azure.com/azcmagent-linux -O /tmp/install_linux_azcmagent.sh
bash /tmp/install_linux_azcmagent.sh

# 2) Connect non-interactively using the service principal
azcmagent connect \
  --service-principal-id "$APP_ID" \
  --service-principal-secret "$APP_SECRET" \
  --resource-group "$RESOURCE_GROUP" \
  --tenant-id "$TENANT_ID" \
  --location "$LOCATION" \
  --subscription-id "$SUBSCRIPTION_ID" \
  --tags "env=prod,onboarded-by=arc-script"

The Windows version is identical in shape — install the MSI as in Route B, then call azcmagent.exe connect with the same --service-principal-id/--service-principal-secret/--tenant-id flags:

# Windows at-scale: same SPN flags (install the MSI first, as in Route B)
& "$env:ProgramFiles\AzureConnectedMachineAgent\azcmagent.exe" connect `
  --service-principal-id "$env:APP_ID" --service-principal-secret "$env:APP_SECRET" `
  --tenant-id "$env:TENANT_ID" --resource-group "$env:RESOURCE_GROUP" `
  --location "$env:LOCATION" --subscription-id "$env:SUBSCRIPTION_ID" --tags "env=prod"

The key difference from the interactive path is that --service-principal-id / --service-principal-secret / --tenant-id trio: no human, no device-code prompt, fully unattended. Inject APP_SECRET from your secret store at run time — never bake it into the checked-in script.
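Because a configuration-management tool may run the script more than once per server, it can help to wrap the connect call in two guards: skip machines that are already connected, and refuse to run with an empty or placeholder value. This is a sketch under assumptions: the function names are made up, and is_connected relies on azcmagent show printing an "Agent Status : Connected" line as described earlier.

```bash
# True when the text on stdin (output of `azcmagent show`) reports a connected agent.
is_connected() {
  grep -Eq '^Agent Status[[:space:]]*:[[:space:]]*Connected[[:space:]]*$'
}

# Fail fast if any named variable is empty or still looks like a <placeholder>.
require_vars() {
  local name value
  for name in "$@"; do
    eval "value=\${$name:-}"
    case "$value" in
      ''|'<'*'>')
        echo "missing or placeholder value: $name" >&2
        return 1
        ;;
    esac
  done
}

# Example wiring inside the at-scale script (left commented out):
# azcmagent show 2>/dev/null | is_connected && exit 0   # already onboarded
# require_vars APP_ID APP_SECRET TENANT_ID SUBSCRIPTION_ID RESOURCE_GROUP LOCATION || exit 1
# azcmagent connect --service-principal-id "$APP_ID" ...
```

The placeholder check catches the common slip of shipping the template with "<sp-secret>" still in it, before the agent ever attempts to authenticate.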

Step 3 — Distribute and run across the fleet

You don’t run this by hand on each box; you push it through whatever already manages your servers — an Ansible/Chef/Puppet play for Linux and mixed fleets, Group Policy or Intune for Windows domains, a cloud-init / startup script for AWS/GCP VMs, or baked into a golden VM image so new servers self-enrol at build time. A few hard rules for the at-scale run:

Step 4 — Validate the fleet from Azure

After a wave runs, confirm in bulk from az:

# List every Arc machine in the RG with its connection status, OS and agent version
az connectedmachine list --resource-group rg-arc-lab \
  --query "[].{name:name, status:status, os:osName, agent:agentVersion}" -o table

# Count how many are actually Connected
az connectedmachine list --resource-group rg-arc-lab \
  --query "length([?status=='Connected'])" -o tsv

Expected: one row per onboarded server, the connected count matching the number you enrolled. Any Disconnected rows are your follow-up list — head to troubleshooting for each.
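For a wave report you can paste into a ticket, the two queries above can be folded into one summary. This is a minimal sketch that assumes tab-separated "name, status" rows as input; summarise_fleet is a hypothetical helper name.

```bash
# Read "name<TAB>status" rows on stdin; print the connected count and a follow-up list.
# Exit status is 0 only when every machine is Connected.
summarise_fleet() {
  awk -F'\t' '
    { total++ }
    $2 == "Connected" { ok++; next }
    { followup = followup "  " $1 " (" $2 ")\n" }
    END {
      printf "connected %d of %d\n", ok, total
      if (followup != "") printf "follow up:\n%s", followup
      exit (ok == total ? 0 : 1)
    }'
}

# Example wiring (needs a logged-in az CLI, so it is left commented out):
# az connectedmachine list --resource-group rg-arc-lab \
#   --query "[].[name, status]" -o tsv | summarise_fleet
```

The non-zero exit status when anything is not Connected lets a pipeline stop before the next wave starts.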

The Bicep version: Arc machine and access as code

A subtlety drives what’s worth coding: the HybridCompute/machines resource is created by the agent at connect time, not by a template — so the most useful Bicep is the governance and access around Arc (the role assignment for your onboarding SPN, and tags on the machine once it exists), not the machine itself. If you’re new to Bicep, start with Deploy Your First Bicep File From Scratch: Author, Validate and Ship in 20 Minutes.

The role assignment that grants the onboarding SPN its least-privilege role on the resource group:

// Grant the onboarding service principal the Azure Connected Machine Onboarding role on this RG
// Run at resource-group scope: az deployment group create ...
@description('Object (principal) ID of the onboarding service principal, not its app (client) ID')
param onboardingSpObjectId string

// Built-in role definition ID for Azure Connected Machine Onboarding
var onboardingRoleId = 'b64e21ea-ac4e-4cdf-9dc9-5b892992bee7'

resource onboardingRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(resourceGroup().id, onboardingSpObjectId, onboardingRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', onboardingRoleId)
    principalId: onboardingSpObjectId
    principalType: 'ServicePrincipal'
  }
}

Once a machine is onboarded, you can manage its tags declaratively by referencing the existing resource (Bicep doesn’t create it — the agent did — so use existing):

// Reference an already-onboarded Arc machine and standardise its tags
resource arcMachine 'Microsoft.HybridCompute/machines@2024-07-10' existing = {
  name: 'web-onprem-01'
}

// Apply tags via a tags resource (works on the existing Arc machine)
resource arcTags 'Microsoft.Resources/tags@2024-03-01' = {
  scope: arcMachine
  name: 'default'
  properties: {
    tags: {
      env: 'prod'
      app: 'erp'
      owner: 'vinod'
      'onboarded-by': 'arc-script'
    }
  }
}

Deploy it at resource-group scope:

az deployment group create \
  --resource-group rg-arc-lab \
  --template-file arc-governance.bicep \
  --parameters onboardingSpObjectId="<spn-object-id>"

Expected: the deployment succeeds and the role assignment plus tags appear on the resource group / machine. From here, Azure Policy can enforce those tags fleet-wide — the same Modify/DeployIfNotExists effects you’d use on native VMs, covered in Azure Policy Effects Decoded: Deny vs Audit vs Modify vs DeployIfNotExists.
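The onboardingSpObjectId parameter expects the service principal's object ID, which is a different GUID from the appId printed by create-for-rbac. A hedged sketch of looking it up and sanity-checking the result before deploying follows; is_guid is a hypothetical helper, and the lookup assumes a recent az CLI where the object ID is exposed as the id property.

```bash
# True when $1 has the 8-4-4-4-12 hexadecimal shape of a GUID.
# This only guards against an empty or failed lookup; it cannot tell an
# object ID from an app ID, since both are GUIDs.
is_guid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Example wiring (needs a logged-in az CLI, so it is left commented out):
# SP_OBJECT_ID=$(az ad sp show --id "<appId>" --query id -o tsv)
# is_guid "$SP_OBJECT_ID" || { echo "object ID lookup failed" >&2; exit 1; }
# az deployment group create --resource-group rg-arc-lab \
#   --template-file arc-governance.bicep \
#   --parameters onboardingSpObjectId="$SP_OBJECT_ID"
```

Passing the appId by mistake typically surfaces as a role assignment error about the principal not being found, so resolving the object ID in the same script avoids a confusing failure.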

Architecture at a glance

Read the diagram left to right as the path a single onboarding takes. On the far left, the server you own runs the Connected Machine agent, which makes an outbound HTTPS (443) call first to Entra ID for a token — authenticating either as you (device-code, one-off) or as the onboarding service principal (unattended at scale). With the token it calls Azure Resource Manager, which — gated by the SPN’s Azure Connected Machine Onboarding role — creates the Microsoft.HybridCompute/machines resource in your chosen resource group and region, while the Hybrid Identity Service issues the machine a system-assigned managed identity.

Three things the diagram makes concrete: every arrow points outbound (Azure never connects in, so you open no inbound ports); the role assignment is the gate (no onboarding role → ARM 403, however good the egress); and once the resource exists the machine is a normal ARM resource that Azure Policy, tags, RBAC and Defender for Cloud attach to exactly as a native VM — the whole payoff. The numbered badges mark the three hops where onboarding fails — egress blocked, wrong role, clock/TLS skew — and the legend gives confirm-and-fix for each.

[Figure: left-to-right Azure Arc onboarding architecture. A self-owned Windows or Linux server running the Connected Machine agent makes outbound HTTPS 443 calls to Entra ID for a token (interactive or service-principal auth), then to Azure Resource Manager, which, gated by the Azure Connected Machine Onboarding RBAC role, creates the Microsoft.HybridCompute/machines resource in a target resource group and region, while the Hybrid Identity Service issues the machine a managed identity. Numbered failure badges mark blocked egress, a missing onboarding role, and clock or TLS skew. Downstream, the Arc machine is governed by Azure Policy, tags and Defender for Cloud just like a native VM.]

Real-world scenario

Meridian Logistics runs a hybrid estate: about 180 servers on-prem across two data centres (Windows Server 2019 for the warehouse system, Ubuntu for routing microservices) plus 40 VMs in AWS for partner integrations. Their Azure footprint is small, but ahead of an ISO certification their auditors demanded a single inventory with patch-compliance proof and a consistent tag standard across every server, cloud or not. The platform team is three engineers; nobody wanted a second monitoring stack or a migration.

The first attempt was the classic trap: onboard a dozen servers by hand with the portal’s single-server script, device-code login on each box over RDP/SSH. It worked, but extrapolating to 220 servers — each needing an interactive login — was hopeless, and three of the twelve silently failed. The cause was the data-centre firewall: it allowed outbound 443 to most of the internet but blocked *.his.arc.azure.com, so the agent got a token but couldn’t register its identity. azcmagent show read Disconnected; azcmagent check flagged the his.arc endpoint as unreachable. One firewall allow-list fixed all three.

They then did it properly: one onboarding service principal with only the Azure Connected Machine Onboarding role, scoped to a single resource group, secret in Key Vault. The portal-generated at-scale script pulled APP_SECRET at run time and was delivered two ways, an Ansible play for the on-prem fleet and a cloud-init snippet in a new AWS launch template (the agent doesn’t care the VM runs in AWS; outbound 443 to Azure is all it needs), staggered in waves of 40. Two issues surfaced at scale: older Windows boxes failed TLS to Entra from clock drift (an NTP sync fixed the wave), and cloned VMs sharing one hostname collided on the default resource name (fixed with an explicit --resource-name from the AWS instance ID). Within two days all 220 servers showed Connected with a uniform tag set, and an Azure Policy initiative audited tag and patch compliance fleet-wide, the same controls their Azure VMs already had. Total new infrastructure: one service principal and a Key Vault secret. The lesson they wrote down: “Onboarding is 5% agent install and 95% egress and identity.”
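The staggering in that story is simple to reproduce. As a rough sketch, assuming a plain-text inventory with one hostname per line, the hypothetical helper below labels each host with a wave number so your deployment tool can target one wave at a time.

```bash
# Read hostnames on stdin (one per line) and print "wave<TAB>host".
# $1 is the wave size; 40 matches the scenario above.
assign_waves() {
  awk -v size="${1:-40}" 'NF { printf "%d\t%s\n", int(n / size) + 1, $1; n++ }'
}

# Example wiring (hosts.txt is a hypothetical inventory file):
# assign_waves 40 < hosts.txt > waves.tsv
# awk -F'\t' '$1 == 1 { print $2 }' waves.tsv    # hosts in the first wave
```

With 220 hosts and a wave size of 40 this yields six waves, the last holding the remaining 20 servers, and a failed wave can be re-run without touching the others.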

Advantages and disadvantages

| Advantages (why Arc onboarding is worth it) | Disadvantages / costs to weigh |
|---|---|
| Non-Azure servers become real ARM resources: tags, RBAC, Policy all apply | You run and patch an agent on every server (a managed footprint) |
| One governance/inventory plane for hybrid + multicloud + edge | Requires reliable outbound 443 egress to specific endpoints |
| Least-privilege onboarding via a dedicated service principal | A leaked SPN secret can enrol machines (scope it tightly, rotate it) |
| Workload never moves, so zero migration risk | Arc enables management, not the workload itself; it’s not “lift to cloud” |
| Per-machine managed identity lets servers call Azure services securely | Some downstream features (Defender, Update Manager) bill separately |
| Free to onboard and use core control-plane features | Steady-state egress + extension data has a small network/cost footprint |
| At-scale script + IaC make hundreds of servers as easy as one | Day-2 ops (extensions, policy) is a real program, not a one-off install |

The advantages dominate whenever you have a meaningful non-Azure footprint and a governance mandate — most enterprises. The disadvantages bite most in locked-down networks (opening egress is itself a project) and tiny estates (one server may not justify the program, though onboarding is cheap). The agent footprint is light; the real cost is the discipline of egress, the SPN lifecycle, and the day-2 program onboarding unlocks.

Hands-on lab

You’ll onboard one server end to end, the at-scale way (with a service principal), validate from both sides, then tear it down. Use any spare Windows or Linux VM you can get admin/root on — it does not need to be in Azure. Run the Azure-side commands from Cloud Shell; run the server-side commands on the target box.

Prerequisites for the lab: an Azure subscription where you can create a resource group and a service principal (Owner or User Access Administrator on the target RG to assign the role), and a server with outbound 443 and admin/root. Estimated cost: ₹0 — onboarding and the core control plane are free; you’ll add no billable extensions.

Step 1 — Register the providers (once per subscription).

az provider register --namespace Microsoft.HybridCompute --wait
az provider register --namespace Microsoft.GuestConfiguration --wait
az provider show -n Microsoft.HybridCompute --query registrationState -o tsv

Expected: prints Registered.

Step 2 — Create the target resource group.

az group create -n rg-arc-lab -l centralindia -o table

Expected: a table row showing rg-arc-lab in centralindia.

Step 3 — Create the onboarding service principal (least privilege).

SUB=$(az account show --query id -o tsv)
az ad sp create-for-rbac \
  --name "sp-arc-lab-onboarding" \
  --role "Azure Connected Machine Onboarding" \
  --scopes "/subscriptions/$SUB/resourceGroups/rg-arc-lab"

Expected: JSON with appId, password, tenant. Copy all three now — the password is shown once. (In production, push these straight into Key Vault.)

Step 4 — On the target server, install the agent. Linux (as root/sudo):

wget https://gbl.his.arc.azure.com/azcmagent-linux -O install_linux_azcmagent.sh
bash install_linux_azcmagent.sh
azcmagent version

Windows (PowerShell as Administrator):

Invoke-WebRequest -Uri "https://aka.ms/AzureConnectedMachineAgent" -OutFile AzureConnectedMachineAgent.msi
msiexec /i AzureConnectedMachineAgent.msi /qn

Expected: install completes; azcmagent version prints a version string.

Step 5 — Pre-flight the network (don’t skip this). The agent ships a connectivity checker — run it before connecting to catch egress problems early:

azcmagent check --location centralindia

Expected: every required endpoint shows reachable. Any unreachable row is a blocked egress you must fix before Step 6 will work.
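Clock skew is the other pre-flight worth a few seconds, since it breaks TLS in ways that look like network faults. A minimal sketch follows, assuming curl and GNU date are available; it compares the local clock with the Date header returned by the Entra ID endpoint. The function names and the 300-second default threshold are assumptions for illustration, not documented limits.

```bash
# Absolute difference between two epoch values, in seconds.
abs_diff() {
  if [ "$1" -ge "$2" ]; then echo $(( $1 - $2 )); else echo $(( $2 - $1 )); fi
}

# Extract the Date header value from HTTP response headers on stdin.
http_date() {
  tr -d '\r' | awk 'tolower($1) == "date:" { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# Compare the local clock with the server's Date header.
# $1 is the allowed skew in seconds (assumed default: 300).
check_clock_skew() {
  local remote remote_epoch skew
  # -k: only the Date header matters here, and a badly skewed clock can
  # make certificate validation itself fail.
  remote=$(curl -skI --max-time 10 "https://login.microsoftonline.com" | http_date)
  [ -n "$remote" ] || { echo "no Date header received" >&2; return 2; }
  remote_epoch=$(date -u -d "$remote" +%s) || return 2
  skew=$(abs_diff "$(date -u +%s)" "$remote_epoch")
  echo "clock skew: ${skew}s"
  [ "$skew" -le "${1:-300}" ]
}
```

If check_clock_skew reports a large value, fix time synchronisation on the server before moving on to the connect step.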

Step 6 — Connect using the service principal (unattended). On the server, substitute the values from Step 3:

azcmagent connect \
  --service-principal-id "<appId>" \
  --service-principal-secret "<password>" \
  --tenant-id "<tenant>" \
  --resource-group "rg-arc-lab" \
  --subscription-id "<your-subscription-id>" \
  --location "centralindia" \
  --tags "env=lab,owner=vinod"

Expected: ends with Successfully onboarded resource to Azure.
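For a bulk run, the same connect command is usually wrapped so that re-running it on an already-onboarded server is harmless. This is a sketch under two assumptions: the credentials arrive in ARC_* environment variables (names invented here), and azcmagent show prints an "Agent Status : Connected" line in the form this lab shows in Step 7.

```shell
# Idempotent wrapper around the Step 6 command for unattended runs.
# ASSUMPTIONS: credentials come from the ARC_* environment variables
# (hypothetical names), and `azcmagent show` prints "Agent Status : Connected"
# as described in Step 7 of this lab.
arc_onboard() {
  if azcmagent show 2>/dev/null |
       grep -Eq 'Agent Status[[:space:]]*:[[:space:]]*Connected'; then
    echo "already connected, skipping"
    return 0
  fi

  azcmagent connect \
    --service-principal-id "$ARC_APP_ID" \
    --service-principal-secret "$ARC_SECRET" \
    --tenant-id "$ARC_TENANT_ID" \
    --subscription-id "$ARC_SUBSCRIPTION_ID" \
    --resource-group "${ARC_RESOURCE_GROUP:-rg-arc-lab}" \
    --location "${ARC_LOCATION:-centralindia}" \
    --tags "env=lab,owner=vinod"
}

# Usage (as root/Administrator, with the ARC_* variables exported):
#   arc_onboard
```

The status check runs first so that a configuration-management tool can call the function on every pass without producing "already connected" errors.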

Step 7 — Validate from the server.

azcmagent show

Expected: Agent Status : Connected, with the resource name, resource group, region, and a machine identity object ID listed.

Step 8 — Validate from Azure.

az connectedmachine show \
  --name "$(hostname)" \
  --resource-group rg-arc-lab \
  --query "{name:name, status:status, os:osName, agent:agentVersion}" -o table

Expected: a row with status: Connected. Open Azure Arc → Machines in the portal — the server is listed in rg-arc-lab with your env=lab and owner=vinod tags.
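Once more than one server is enrolled, checking them one by one stops scaling. A hedged sketch of a fleet-level check follows; it filters on the same status field queried above, and the function name arc_not_connected is hypothetical.

```shell
# Print name and status for every Arc machine in a resource group whose
# status is anything other than "Connected".
# Uses the same connectedmachine extension and `status` field as Step 8;
# the function name is hypothetical.
arc_not_connected() {
  az connectedmachine list \
    --resource-group "$1" \
    --query "[?status!='Connected'].{name:name, status:status}" \
    -o tsv
}

# Usage: arc_not_connected rg-arc-lab
```

Empty output means every machine in the group currently reports Connected.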

Validation checklist — what each step proved:

| Step | What you did | What it proves |
|---|---|---|
| 1 | Registered providers | The subscription can host Arc resources |
| 3 | Created a least-privilege SPN | Onboarding works without a human login |
| 5 | azcmagent check | Egress is the real prerequisite, verified first |
| 6 | azcmagent connect with SPN | The unattended, at-scale auth path |
| 7–8 | show both sides | The resource exists and the agent agrees it’s connected |

Step 9 — Teardown (clean from both sides). Disconnect on the server, then delete the Azure resources:

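# On the server: remove the agent's connection (deletes the Arc resource too).
# Run without flags it prompts for an interactive login; that account needs
# delete rights on Arc machines, which the Onboarding role does not include.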
azcmagent disconnect

# From Azure: delete the resource group and the service principal
az group delete -n rg-arc-lab --yes --no-wait
az ad sp delete --id "<appId>"

Expected: azcmagent disconnect reports success and the machine leaves Arc → Machines; the resource group and SPN are removed. Optionally uninstall the agent itself (apt remove azcmagent / remove the MSI) if you’re done with the box.
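Because az group delete is called with --no-wait, it returns before the group is actually gone. If a script needs to know the teardown finished, a polling loop along these lines is one option. The function name and the default of 30 checks at 10-second intervals are arbitrary choices for illustration.

```shell
# Poll until the resource group is gone, since `--no-wait` returns immediately.
# `az group exists` prints "true" or "false".
# The function name and the default retry/delay values are illustrative.
wait_for_rg_deleted() {
  rg="$1"
  tries="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(az group exists -n "$rg")" = "false" ]; then
      echo "$rg deleted"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$rg still exists after $tries checks" >&2
  return 1
}

# Usage: wait_for_rg_deleted rg-arc-lab
```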

Cost note. This lab is free — onboarding, the agent and the core control plane carry no charge, and it touches no billable extensions (see Cost & sizing below).

Common mistakes & troubleshooting

These are the failures you’ll actually meet, with the exact command or portal path that confirms each. Scan the table, then read the detail for your row.

| # | Symptom | Root cause | Confirm (exact cmd / path) | Fix |
|---|---|---|---|---|
| 1 | connect fails; can’t reach an endpoint | Outbound 443 to an Arc endpoint blocked | azcmagent check --location <region> shows unreachable | Allow-list the endpoint / set proxy |
| 2 | connect fails with 403 / authorization | SPN lacks Onboarding role at that scope | az role assignment list --assignee <appId> --all -o table | Assign Azure Connected Machine Onboarding on the RG |
| 3 | Provider-not-registered error | Resource providers not registered | az provider show -n Microsoft.HybridCompute --query registrationState | az provider register --namespace ... --wait |
| 4 | TLS / certificate error to login.microsoftonline.com | System clock skew | Check server time vs real time | Sync NTP / time service |
| 5 | Auth fails on the at-scale run | SPN secret expired | az ad sp credential list --id <appId> | Rotate the secret; update the script’s source |
| 6 | Two servers collide on one Arc name | Duplicate hostnames (cloned VMs) | az connectedmachine list -g <rg> -o table (name clash / overwrite) | Pass --resource-name <unique> |
| 7 | connect says already connected | Agent already onboarded | azcmagent show reports Connected | azcmagent disconnect first, or skip (make script idempotent) |
| 8 | Status Disconnected after a while | Agent can’t reach Azure (egress dropped / service stopped) | azcmagent show; check the agent service is running | Restart the agent service; re-check egress |
| 9 | connect via proxy fails | Proxy not configured on the agent | azcmagent config list | azcmagent config set proxy.url http://proxy:8080 |
| 10 | “Insufficient privileges” creating the SPN | Your user can’t create app registrations or assign roles | az ad sp create-for-rbac errors | Get an admin to create the SPN, or gain the rights |

The three that cause the most wasted time, in detail:

Blocked egress (#1) is the number-one failure. The agent often authenticates fine, then stalls because a required endpoint, commonly *.his.arc.azure.com or *.guestconfiguration.azure.com, is firewalled. Run azcmagent check --location <region> to see exactly which endpoint is unreachable, then allow-list it on outbound 443 (or route the agent through a proxy with azcmagent config set proxy.url ...). Never open inbound ports: Arc is outbound-only.

A 403 at connect (#2) is almost always the role, not the network. The agent authenticated, but ARM refused to create the resource because the identity lacks Azure Connected Machine Onboarding at the target scope. Confirm with az role assignment list --assignee <appId> --all -o table; the usual cause is the role assigned at the wrong scope (a different RG or subscription). Assign it on the correct resource group and retry.
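To turn that check into a yes/no answer a script can act on, you can ask for assignments at the exact target scope and count the ones carrying the Onboarding role. A minimal sketch follows; the function name is hypothetical, and --include-inherited is added so that an assignment made higher up (for example on the subscription) also counts.

```shell
# Succeed only if the identity holds "Azure Connected Machine Onboarding"
# at the given scope, directly or inherited from a parent scope.
# The function name is hypothetical.
has_onboarding_role() {
  app_id="$1"
  scope="$2"
  count=$(az role assignment list \
    --assignee "$app_id" \
    --scope "$scope" \
    --include-inherited \
    --query "length([?roleDefinitionName=='Azure Connected Machine Onboarding'])" \
    -o tsv)
  [ "${count:-0}" -gt 0 ]
}

# Usage:
#   has_onboarding_role "<appId>" \
#     "/subscriptions/<your-subscription-id>/resourceGroups/rg-arc-lab" \
#     || echo "role missing at this scope"
```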

Clock skew (#4) breaks TLS to Entra. A server whose clock has drifted by more than a few minutes fails certificate validation against login.microsoftonline.com — so the connect dies before it even reaches ARM. This disproportionately hits long-uptime and cloned VMs; sync NTP / the Windows Time service and retry. The remaining rows (expired SPN secret, duplicate hostnames, post-onboarding Disconnected) follow the same shape — the table’s confirm-command tells you which.
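"Check server time vs real time" can be scripted by comparing the local clock with the Date header that Entra's login endpoint returns. This is a rough sketch, assuming curl and GNU date (date -d) are present; the function names and the 300-second default limit are illustrative choices, not documented thresholds.

```shell
# Absolute difference between two epoch timestamps, in seconds.
abs_skew() {
  d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then d=$(( -d )); fi
  echo "$d"
}

# Compare the local clock with the HTTP Date header from Entra ID.
# ASSUMPTIONS: curl and GNU date are available; the 300 s default limit is
# an illustrative value, not a documented tolerance.
clock_skew_check() {
  limit="${1:-300}"
  # -k skips certificate validation on purpose: a skewed clock may be the very
  # thing breaking it, and only the Date header is read here.
  remote=$(curl -skI https://login.microsoftonline.com |
             tr -d '\r' | sed -n 's/^[Dd]ate: //p')
  remote_epoch=$(date -u -d "$remote" +%s 2>/dev/null)
  if [ -z "$remote" ] || [ -z "$remote_epoch" ]; then
    echo "could not read a Date header" >&2
    return 2
  fi
  skew=$(abs_skew "$(date -u +%s)" "$remote_epoch")
  echo "clock skew: ${skew}s (limit ${limit}s)"
  [ "$skew" -le "$limit" ]
}

# Usage: clock_skew_check || echo "sync NTP / the time service, then retry"
```

The check needs outbound 443 to login.microsoftonline.com, so run it only after egress has been confirmed.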

Best practices

Security notes

Cost & sizing

The good news for a Basic guide: onboarding itself is free, and so is the core Arc control plane. Creating the HybridCompute/machines resource, tagging it, granting RBAC, and evaluating Azure Policy against it carry no charge. The agent is free software. So a fleet of 10 or 10,000 servers can be Arc-enabled at zero control-plane cost.

Costs appear only when you light up billable add-ons on top of onboarded machines — and those are choices you make after onboarding, not part of it:

| Cost driver | What it is | Roughly when it bills | Notes |
|---|---|---|---|
| Onboarding + control plane | Create resource, tags, RBAC, policy eval | Free | The subject of this article |
| Defender for Servers (via Arc) | Defender for Cloud plan on the machine | Per server/hour when enabled | Optional security posture/EDR |
| Azure Update Manager | Managed patch orchestration | Per managed server (where applicable) | Optional patching at scale |
| Log Analytics ingestion | Logs/metrics sent to a workspace | Per GB ingested + retention | Driven by the monitoring extension |
| Extensions’ own egress | Steady-state data the agent/extensions send | Tiny network footprint | Negligible for the agent alone; grows with monitoring |
| Arc-enabled “guest” features | e.g. SQL Server on Arc, Kubernetes on Arc | Vary by feature | Out of scope here; priced separately |

There’s nothing to “size” for onboarding — the agent is light (modest CPU/RAM/disk), identical on a 2-vCPU box or a 64-core server. The decisions that cost money are downstream: how many machines you put Defender for Servers on, how much you ingest into Log Analytics (use data collection rules to send only what you need), and whether you adopt Update Manager. The sensible pattern is to onboard everything (free), then enable billable plans selectively where the need justifies it — and tag per Azure Tagging Strategy 101: A Naming and Tag Schema for Cost Allocation so that spend is attributable.

Interview & exam questions

1. What does Azure Arc actually do for a non-Azure server? It projects the server into Azure’s control plane as a Microsoft.HybridCompute/machines ARM resource without moving the workload, so the machine gains a resource ID and can be tagged, governed with Azure Policy, granted RBAC, and onboarded to services like Defender for Cloud and Update Manager — exactly like a native Azure VM.

2. What is the Connected Machine agent and what are its responsibilities? The azcmagent-based agent installed on each server. It registers the machine (creating the ARM resource), maintains a system-assigned managed identity for the server, and serves as the landing point for Azure extensions. It is not a remote-shell or remote-desktop tool.

3. Which RBAC role does the onboarding service principal need, and why not a broader one? Azure Connected Machine Onboarding: it can create and read Arc machines and nothing else. A broader role (e.g. Azure Connected Machine Resource Administrator, which can delete machines and run extensions) would give a leaked onboarding secret far more reach, violating least privilege.

4. Why use a service principal for at-scale onboarding instead of interactive login? Interactive (device-code) auth requires a human to approve each enrolment in a browser, which is impossible across hundreds of servers. A service principal authenticates non-interactively (--service-principal-id/-secret/--tenant-id), so the install runs fully unattended from configuration management, golden images, or startup scripts.

5. What network connectivity does onboarding require? Outbound TCP 443 from the server to a defined set of Azure endpoints (Entra ID, ARM, the Hybrid Identity Service, guest configuration, etc.). Azure never connects inbound, so no inbound ports are opened. azcmagent check validates reachability before you connect.

6. A connect fails with a 403 even though connectivity is fine. What’s wrong? The identity (your login or the SPN) lacks the Azure Connected Machine Onboarding role at the target scope, so ARM refuses to create the resource. Confirm with az role assignment list --assignee <id> --all and assign the role on the correct resource group/subscription.

7. Onboarding fails with a TLS/certificate error to login.microsoftonline.com. Likely cause? The server’s system clock is skewed beyond the tolerance for certificate validation. Sync NTP/the Windows Time service and retry. This commonly hits cloned or long-uptime VMs.

8. You cloned 20 VMs from one image and only some appear in Arc. Why? They share a hostname, and Arc names the resource after the hostname by default, so the clones collide/overwrite. Fix by ensuring unique hostnames or passing an explicit --resource-name.

9. After onboarding works, a machine shows Disconnected. How do you diagnose it? Check azcmagent show and the agent service on the server, and confirm steady-state egress still reaches Azure (azcmagent check). Either the agent service stopped or a firewall change blocked outbound 443; restart the service and/or restore egress.

10. Does Arc onboarding cost money? No — onboarding and the core control plane (resource creation, tags, RBAC, Policy evaluation) are free. Costs come only from optional, separately-billed add-ons enabled afterward, such as Defender for Servers, Update Manager managed updates, and Log Analytics ingestion.

These map to AZ-800/AZ-801 (Windows Server Hybrid Administrator), which cover managing servers and workloads in a hybrid environment and Azure Arc onboarding and management, and to AZ-104 (Administrator) for the RBAC, resource-group and tagging fundamentals. The identity/service-principal angle touches AZ-500 (Security Engineer).

| Question theme | Primary cert | Objective area |
|---|---|---|
| What Arc is, agent responsibilities | AZ-800/801 | Manage hybrid servers with Arc |
| Onboarding role, service principal | AZ-800/801 / AZ-500 | Onboard and secure Arc machines |
| RBAC, resource groups, tags | AZ-104 | Manage identities, governance, resources |
| Network egress, private link | AZ-800/801 | Hybrid connectivity |
| Policy/monitoring on Arc machines | AZ-104 / AZ-800 | Govern and monitor at scale |

Quick check

  1. Does onboarding a server to Azure Arc move or migrate the workload into Azure? What does it actually create?
  2. Which built-in RBAC role should the onboarding service principal hold, and why not a broader one?
  3. Your at-scale azcmagent connect fails on every box this morning after working yesterday. Name the two most likely causes.
  4. What command do you run on the server before connect to catch egress problems, and what does it test?
  5. Twenty cloned VMs were onboarded but only a few show up in Arc. What’s the cause and the fix?

Answers

  1. No — it does not move the workload. It creates a Microsoft.HybridCompute/machines ARM resource that projects the server into Azure’s control plane (giving it a resource ID, tags, RBAC, and policy reach). The server keeps running exactly where it is.
  2. Azure Connected Machine Onboarding: it can only create and read Arc machines. A broader role (e.g. Azure Connected Machine Resource Administrator) would let a leaked onboarding secret delete machines or run extensions, breaking least privilege.
  3. Either the service principal secret expired (confirm with az ad sp credential list --id <appId>) or a firewall change blocked outbound 443 to an Arc endpoint (confirm with azcmagent check). Rotate the secret or restore egress respectively.
  4. azcmagent check --location <region> — it tests reachability of each required Arc endpoint over outbound 443 and prints reachable/unreachable, so you catch a blocked endpoint before the connect fails confusingly.
  5. They share the same hostname (cloned from one image), and Arc names the resource after the hostname by default, so they collide/overwrite. Fix by ensuring unique hostnames or passing an explicit --resource-name per machine.

Glossary

Next steps

You can now project any server into Azure and enrol a whole fleet unattended. Build outward:


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
