Ansible for SAP, In Depth — HANA System Replication, NetWeaver, Kernel Patching and Landscape Automation

SAP is the production landscape that scares most automation teams. The runbooks read like vendor documentation written in 2002, the OS prerequisites are five pages long, and a single mis-tuned kernel parameter manifests as a six-hour HANA recovery. Ansible, used carefully, makes it tractable: every SAP Note becomes a role, every SID a host group, every transport a job template, every patch wave a workflow.

This lesson is the specialist guide to automating the SAP basis function with Ansible: preparing OS hosts to SAP standards, installing HANA scale-up and scale-out, configuring HANA system replication (HSR) with Pacemaker, deploying NetWeaver ABAP/Java instances, running kernel and SPS patches, importing transports, and running the whole landscape from AAP without becoming the bottleneck for every basis change request.

We will be opinionated. The Red Hat-supported sap_install and sap_hana_install collections are the path that keeps you out of trouble; we will use them. The Linux Pacemaker stack is the cluster that SAP itself documents; we will use it. The vendor-specific stacks (HP Serviceguard, Veritas) exist but do not integrate with Ansible cleanly; we will not cover them.

Position in the curriculum. Tier 1–4 fluency required, plus the Tier 5 compliance, DR, and migrations lessons. SAP environments are usually all three combined: regulated (SOX/PCI), high-availability/DR-critical, and constantly migrating between hardware refreshes and HANA SPS upgrades.

What “SAP automation” really covers

SAP basis is a small team supporting a sprawl of systems. The work splits into four buckets:

OS preparation per SAP Notes: kernel parameters, swap, transparent huge pages, NUMA topology, filesystem layouts, NTP/chrony, SELinux rules, network MTUs. Driven by SAP Notes (e.g., 2002167 for RHEL 7, 2772999 for RHEL 8, 3108316 for RHEL 9). Each Note runs to dozens of line items.
Database installation and patching: HANA installs, SPS upgrades, kernel patching, parameter tuning, HSR setup, backup/recovery configuration.
Application server installation and patching: NetWeaver ABAP and Java stacks, kernel patches, transport imports, profile parameter management.
Landscape orchestration: refresh of QA from production, system copy automation, transport imports across the landscape (DEV → QA → PRD), service start/stop coordination during patching.

Ansible handles all four; the key is to use the SAP-supported collections rather than reinventing the wheel. Four collections matter:

redhat.sap_install — installs and configures SAP NetWeaver (ABAP + Java).
redhat.sap_hana_install — installs SAP HANA scale-up and scale-out.
community.sap_install — community-maintained variants and helpers.
redhat.sap_management — operations: start/stop, kernel patching, HSR setup, RHEL HA cluster.

These collections wrap sapinst, hdblcm, pcs, and the various SAP CLI tools, and bake in the conditionals from the SAP Notes. The collections are kept current with SAP releases by Red Hat — using them is how you avoid building a 5,000-line in-house “sap_role” that breaks every quarter.

A representative SAP landscape

For concreteness, the rest of this lesson assumes a typical mid-size production landscape:

Three-tier HANA scale-up landscape: DEV (single-host HANA + ASCS/PAS/AAS NetWeaver), QA (mirror of PRD with 1/4 sizing), PRD (HANA HSR with two nodes + Pacemaker, ASCS/ERS HA cluster, four PAS+AAS dialog instances).
SID conventions: one three-character SID per system (e.g., S4P = production S/4HANA, S4Q = QA, S4D = DEV).
OS: RHEL 9 with the SAP Solutions and High Availability repos enabled (rhel-9-for-x86_64-sap-solutions-rpms, rhel-9-for-x86_64-highavailability-rpms).
Storage: NetApp ONTAP for /hana/data, /hana/log, /hana/shared; XFS on top.
Network: separate NICs for client traffic, HSR replication, internal NetWeaver, and admin.
Identity: SAP service users (<sid>adm, sapadm, sapsys) created consistently across hosts.
DR: HANA HSR cross-site, Pacemaker takeover, virtual IP failover.
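The SID and `<sid>adm` conventions above are easy to check before an inventory is ever used. The following is a minimal Python sketch, not part of any SAP collection: the function names are hypothetical, and the reserved-SID set is deliberately partial (the full list is in the SAP installation guides).

```python
import re

# Partial, illustrative list of reserved SIDs; consult the SAP installation
# guide for the complete set.
RESERVED_SIDS = {"SAP", "SYS", "SID", "SQL", "TMP", "USR"}

# Three characters: an uppercase letter followed by two letters or digits.
SID_PATTERN = re.compile(r"[A-Z][A-Z0-9]{2}")


def validate_sid(sid: str) -> str:
    """Return the SID unchanged if it is well formed, else raise ValueError."""
    if not SID_PATTERN.fullmatch(sid):
        raise ValueError(
            f"SID {sid!r} must be an uppercase letter followed by two letters or digits"
        )
    if sid in RESERVED_SIDS:
        raise ValueError(f"SID {sid!r} is reserved")
    return sid


def sidadm_user(sid: str) -> str:
    """Derive the <sid>adm OS user name, e.g. S4P -> s4padm."""
    return f"{validate_sid(sid).lower()}adm"


if __name__ == "__main__":
    for sid in ("S4P", "S4Q", "S4D"):
        print(sid, "->", sidadm_user(sid))
```

A check like this could run as a pre-flight step in CI against `sap_landscape.yml`, so a malformed SID is caught before any installer starts.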

This is the shape of most enterprise SAP landscapes; the Ansible patterns scale to scale-out HANA and to S/4HANA Cloud private edition with minor adjustments.

The SAP repository layout

sap-automation/
├── ansible.cfg
├── collections/requirements.yml      # redhat.sap_install,
│                                     # redhat.sap_hana_install,
│                                     # redhat.sap_management,
│                                     # community.sap_install,
│                                     # ansible.posix, community.general
├── inventory/
│   ├── prd/                           # production
│   │   └── hosts.yml                   #   hana_primary, hana_secondary, ascs, ers, pas, aas
│   ├── qa/                            # quality assurance
│   └── dev/                           # development
├── group_vars/
│   ├── all/
│   │   ├── sap_landscape.yml           # SID, instance numbers, DB connect strings
│   │   └── vault.yml                   # sap_installer_password, hana_master_pw, sapsys gid
│   ├── hana/
│   │   ├── sap_hana_install.yml
│   │   └── hana_hsr.yml
│   ├── netweaver/
│   │   ├── sap_install.yml
│   │   └── ascs_ers_cluster.yml
│   └── all_sap/
│       └── os_prep.yml                 # SAP Notes-driven kernel/sysctl/limits/THP
├── playbooks/
│   ├── 00-os-prep.yml
│   ├── 10-hana-install-primary.yml
│   ├── 11-hana-install-secondary.yml
│   ├── 12-hana-hsr-enable.yml
│   ├── 13-hana-pacemaker.yml
│   ├── 20-netweaver-ascs-install.yml
│   ├── 21-netweaver-ers-install.yml
│   ├── 22-netweaver-pas-install.yml
│   ├── 23-netweaver-aas-install.yml
│   ├── 30-kernel-patch.yml
│   ├── 31-hana-sps-patch.yml
│   ├── 40-system-refresh-prd-to-qa.yml
│   ├── 50-transport-import.yml
│   └── 99-decommission.yml
└── roles/
    ├── sap_os_prep/                    # wraps SAP Notes per RHEL major version
    ├── sap_storage_layout/
    ├── sap_users_groups/
    ├── sap_install_media_stage/
    ├── sap_hana_install_wrapper/       # wraps redhat.sap_hana_install with our defaults
    ├── sap_hana_hsr/
    ├── sap_pacemaker_cluster/
    ├── sap_netweaver_install_wrapper/
    ├── sap_kernel_patch/
    ├── sap_transport_import/
    ├── sap_validate/
    └── sap_evidence/

The split is deliberate: small playbooks per phase (so a rerun is cheap), wrapper roles around the upstream collections (so we can layer in our defaults), and shared roles for OS prep, storage and users (so DEV/QA/PRD are built from the same code).

OS preparation per SAP Notes

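Every SAP installation begins with OS preparation. The relevant SAP Notes for RHEL 9 are 3108316 (RHEL 9 installation and configuration), 3108302 (recommended OS settings for HANA on RHEL 9), 3024346 (Linux kernel settings for NetApp NFS), 2382421 (network configuration on HANA and OS level), and 1771258 (Linux user and system resource limits). Each Note is a list of conditionals; together they fill several pages.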

Rather than transcribe Notes manually, use redhat.sap_install.sap_general_preconfigure (and its sibling roles) which encode the Notes for you:

# playbooks/00-os-prep.yml
---
- name: SAP OS preparation (per SAP Notes)
  hosts: all_sap
  become: true
  collections:
    - redhat.sap_install
  roles:
    - role: sap_general_preconfigure
      vars:
        sap_general_preconfigure_modify_etc_hosts: false   # we manage hosts via dnsmasq/cloud DNS
        sap_general_preconfigure_kernel_parameters_2382421: true
        sap_general_preconfigure_min_swap_space: 20480     # MB

    - role: sap_hana_preconfigure
      when: "'hana' in group_names"
      vars:
        sap_hana_preconfigure_kernel_parameters_NetApp: true
        sap_hana_preconfigure_thp: never
        sap_hana_preconfigure_numa_balancing: 0

    - role: sap_netweaver_preconfigure
      when: "'netweaver' in group_names"

Under the hood these roles set:

kernel.shmmax, kernel.shmall to large values.
vm.swappiness=10, vm.dirty_ratio=10, vm.dirty_background_ratio=3.
transparent_hugepage=never (HANA explicitly hates THP).
tuned-adm profile sap-hana (or sap-netweaver).
numa balancing disabled on HANA hosts.
File limits in /etc/security/limits.d/99-sap.conf for <sid>adm.
Selected SELinux booleans per SAP Note 2456406.

The tuned profiles are the most opinionated piece; they tune the entire kernel for SAP workloads. Run tuned-adm active after to verify.
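Beyond `tuned-adm active`, the THP setting is worth verifying directly, because the sysfs file lists every possible mode and marks the active one with brackets. A small Python sketch of that parsing follows; the function name is hypothetical and the sysfs path is the standard Linux location.

```python
from pathlib import Path

THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(raw: str) -> str:
    """Extract the bracketed (active) mode from the sysfs THP string.

    The kernel prints all modes and brackets the active one,
    for example: "always madvise [never]".
    """
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {raw!r}")


if __name__ == "__main__":
    if THP_PATH.exists():
        print("active THP mode:", active_thp_mode(THP_PATH.read_text()))
    else:
        print("THP sysfs entry not present on this host")
```

The same function works on output collected remotely (for example, a registered `slurp` or `command` result), so the check does not have to run on the HANA host itself.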

Storage layout

HANA has strict storage layout requirements. Every basis team has a story about a single mis-laid filesystem that took down a quarterly close. Use a dedicated role and lock the values down:

# roles/sap_storage_layout/tasks/main.yml
---
- name: Create HANA volume groups (LVM)
  community.general.lvg:
    vg: "vg_hana_{{ item.name }}"
    pvs: "{{ item.pvs }}"
  loop:
    - { name: "data",   pvs: "/dev/disk/by-id/{{ data_disk }}" }
    - { name: "log",    pvs: "/dev/disk/by-id/{{ log_disk }}" }
    - { name: "shared", pvs: "/dev/disk/by-id/{{ shared_disk }}" }

- name: Create HANA logical volumes
  community.general.lvol:
    vg: "vg_hana_{{ item.vg }}"
    lv: "lv_{{ item.name }}"
    size: "{{ item.size }}"
    state: present
  loop:
    - { vg: data,   name: data,   size: "{{ hana_data_size }}" }
    - { vg: log,    name: log,    size: "{{ hana_log_size }}" }
    - { vg: shared, name: shared, size: "{{ hana_shared_size }}" }

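- name: Create HANA filesystems (XFS, 4K blocks)
  community.general.filesystem:
    fstype: xfs
    dev: "/dev/mapper/vg_hana_{{ item.vg }}-lv_{{ item.name }}"
    opts: "-K -b size=4096 -d agcount=64"   # no -f: never force over an existing filesystem
  loop:
    - { vg: data,   name: data }
    - { vg: log,    name: log }
    - { vg: shared, name: shared }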

- name: Mount HANA filesystems
  ansible.posix.mount:
    src: "/dev/mapper/vg_hana_{{ item.vg }}-lv_{{ item.name }}"
    path: "{{ item.path }}"
    fstype: xfs
    opts: "noatime,inode64"
    state: mounted
  loop:
    - { vg: data,   name: data,   path: "/hana/data/{{ sid }}" }
    - { vg: log,    name: log,    path: "/hana/log/{{ sid }}" }
    - { vg: shared, name: shared, path: "/hana/shared" }

The mount options matter: noatime for performance, inode64 for large filesystems (already the XFS default on current kernels, stated here for clarity). Older runbooks add nobarrier for storage with a battery-backed write cache; XFS dropped that option in kernel 4.19, so on RHEL 9 a mount that specifies it fails. Read SAP Note 2009879 (SAP HANA guidelines for RHEL) before deviating.

For cloud SAP (AWS X1/X2, Azure M-series, GCP m2/m3), the corresponding native disk products replace the LVM steps but the mount options stay the same.
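Mount options drift quietly, so it helps to audit them from `/proc/mounts` rather than trusting `/etc/fstab`. The sketch below is an assumption-laden illustration: the function name and the result keys are invented here, and the only policy it encodes is the one described above (noatime and inode64 expected, nobarrier flagged for review).

```python
REQUIRED_OPTS = {"noatime", "inode64"}


def check_mount_line(line: str, required: set = REQUIRED_OPTS) -> dict:
    """Parse one /proc/mounts line and report missing or questionable options."""
    _device, mountpoint, fstype, opts = line.split()[:4]
    present = set(opts.split(","))
    return {
        "mountpoint": mountpoint,
        "fstype": fstype,
        "missing": sorted(required - present),
        # nobarrier is only defensible on storage with a protected write cache
        "needs_review": "nobarrier" in present,
    }


if __name__ == "__main__":
    with open("/proc/mounts") as mounts:
        for entry in mounts:
            fields = entry.split()
            if len(fields) >= 4 and fields[1].startswith("/hana/"):
                print(check_mount_line(entry))
```

Run against a host, this prints one dictionary per `/hana/...` mount; an empty `missing` list and `needs_review: False` is the expected clean result.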

HANA scale-up install

redhat.sap_hana_install wraps hdblcm with the exact answer-file format SAP expects. The role accepts a structured set of vars; you do not edit hdblcm arguments directly.

# playbooks/10-hana-install-primary.yml
---
- name: Install HANA on primary node
  hosts: hana_primary
  become: true
  collections:
    - redhat.sap_hana_install
  vars:
    sap_hana_install_software_directory: /sapmedia/HANA_2_SPS07
    sap_hana_install_sid: "{{ sid }}"
    sap_hana_install_instance_number: "00"
    sap_hana_install_master_password: "{{ vault_hana_master_password }}"
    sap_hana_install_use_master_password_for_users: true
    sap_hana_install_system_usage: production
    sap_hana_install_apply_license: true
    sap_hana_install_license_path: /sapmedia/license/license-{{ sid }}.txt
    sap_hana_install_components:
      - server
      - client
      - studio
      - xs
  roles:
    - role: sap_install_media_stage      # ensures HANA media is on host (NFS/scratch)
    - role: sap_hana_install_wrapper      # invokes redhat.sap_hana_install role
    - role: sap_hana_post_install         # license, audit, parameters

The most frequent install bug is media that is not fully extracted; sap_install_media_stage should verify the SAR/EXE checksums and unpack to a known location before HANA install begins.

The post-install role applies parameters (global.ini, indexserver.ini) using redhat.sap_management.sap_hana_set_parameters. Common defaults set:

[persistence] log_mode = normal
[multidb] mode = multidb (for MDC)
[memorymanager] global_allocation_limit (per SAP sizing report)
[trace] default = info

HANA System Replication (HSR)

HSR is the critical-path DR feature of HANA. It replicates redo logs from primary to secondary in three modes:

Sync — primary commits only after secondary acknowledges. RPO=0, performance hit.
SyncMem — secondary acknowledges in memory, no fsync. RPO ≈ 0, less performance hit.
Async — primary does not wait for the secondary; it commits once the log is written locally and sent. RPO seconds, no performance hit.

For two-DC same-region HSR, use SyncMem; for cross-region, use Async with a clear understanding that some transactions will be lost on a forced failover.
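The rule of thumb above can be written down as a tiny decision function, which is handy when the replication mode is derived per landscape instead of hard-coded. This is a sketch only: the `Topology` enum and the `require_zero_rpo` flag are assumptions introduced for illustration; the returned strings are the values `hdbnsutil --replicationMode` accepts.

```python
from enum import Enum


class Topology(Enum):
    SAME_REGION = "same_region"
    CROSS_REGION = "cross_region"


def choose_replication_mode(topology: Topology, require_zero_rpo: bool = False) -> str:
    """Pick an HSR replication mode following the guidance in this lesson."""
    if topology is Topology.CROSS_REGION:
        if require_zero_rpo:
            # Async can lose the most recent transactions on a forced takeover.
            raise ValueError("cross-region async replication cannot guarantee RPO=0")
        return "async"
    # Same region: sync for a strict RPO=0, otherwise syncmem.
    return "sync" if require_zero_rpo else "syncmem"


if __name__ == "__main__":
    print(choose_replication_mode(Topology.SAME_REGION))
    print(choose_replication_mode(Topology.CROSS_REGION))
```

Raising on the impossible combination, rather than silently returning async, keeps a sizing mistake from turning into a surprise during the first forced takeover.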

Ansible automates HSR setup once HANA is installed on both nodes:

# playbooks/12-hana-hsr-enable.yml
---
- name: Enable HSR primary
  hosts: hana_primary
  become: true
  tasks:
    - name: Enable system replication on primary
      ansible.builtin.command:
        cmd: |
          su - {{ sid_lc }}adm -c "
            hdbnsutil -sr_enable --name=DC1
          "
      register: enable
      changed_when: "'successfully enabled' in enable.stdout"

- name: Enable HSR secondary (after key copy)
  hosts: hana_secondary
  become: true
  tasks:
    - name: Stop HANA on secondary (must be down for register)
      ansible.builtin.command:
        cmd: su - {{ sid_lc }}adm -c "HDB stop"

    - name: Copy SSFS keys from primary to secondary
      ansible.posix.synchronize:
        src: "/usr/sap/{{ sid }}/SYS/global/security/rsecssfs/"
        dest: "/usr/sap/{{ sid }}/SYS/global/security/rsecssfs/"
      delegate_to: "{{ groups['hana_primary'][0] }}"

    - name: Register secondary with primary
      ansible.builtin.command:
        # Folded scalar (>-): the su -c payload must reach the shell as ONE line,
        # otherwise every option is executed as a separate command.
        cmd: >-
          su - {{ sid_lc }}adm -c
          "hdbnsutil -sr_register
          --remoteHost={{ hostvars[groups['hana_primary'][0]].ansible_host }}
          --remoteInstance={{ instance_number }}
          --replicationMode=syncmem
          --operationMode=logreplay
          --name=DC2"
      register: register

    - name: Start HANA on secondary
      ansible.builtin.command:
        cmd: su - {{ sid_lc }}adm -c "HDB start"

- name: Verify replication is in sync
  hosts: hana_primary
  become: true
  tasks:
    - name: Query HSR state
      ansible.builtin.command:
        cmd: |
          su - {{ sid_lc }}adm -c "
            python /usr/sap/{{ sid }}/HDB{{ instance_number }}/exe/python_support/systemReplicationStatus.py
          "
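      register: hsr_state
      retries: 30
      delay: 10
      # systemReplicationStatus.py exits with 15 when replication is ACTIVE,
      # so a non-zero return code is expected here.
      until: hsr_state.rc == 15
      failed_when: hsr_state.rc != 15
      changed_when: false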

The systemReplicationStatus.py query is the canonical “is HSR healthy?” check; the task retries until the status shows the secondary as ACTIVE, which the script signals with exit code 15. Wire this into a periodic redhat.sap_management.sap_hana_check_hsr_status job in AAP that runs every 5 minutes and pages on degradation.
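For the periodic job, the script's exit code is easier to act on than its text output. The sketch below maps the documented return codes of systemReplicationStatus.py to a status and a paging decision. The paging policy (page on anything that is not active or converging) is an assumption for illustration, not something the script prescribes.

```python
# Return codes of systemReplicationStatus.py
HSR_STATUS = {
    10: "NoHSR",
    11: "Error",
    12: "Unknown",
    13: "Initializing",
    14: "Syncing",
    15: "Active",
}

# Assumed policy: these states are healthy or on their way to healthy.
NON_PAGING_CODES = {13, 14, 15}


def classify_hsr(rc: int) -> tuple:
    """Return (status, should_page) for a systemReplicationStatus.py exit code."""
    status = HSR_STATUS.get(rc, "Unrecognised")
    return status, rc not in NON_PAGING_CODES


if __name__ == "__main__":
    for code in (15, 14, 11, 0):
        print(code, classify_hsr(code))
```

A stricter variant might also page when the state stays in Syncing for longer than a threshold; that needs a timestamp per observation, which this sketch leaves out.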

Pacemaker cluster for HANA HSR

The official Linux HA cluster for HANA on RHEL is Pacemaker with the SAPHana and SAPHanaTopology resource agents. redhat.sap_management.sap_ha_pacemaker_cluster wraps the entire setup:

# playbooks/13-hana-pacemaker.yml
---
- name: HANA Pacemaker HA
  hosts: hana_cluster
  become: true
  collections:
    - redhat.sap_management
  vars:
    sap_ha_cluster_node_list:
      - "{{ groups['hana_primary'][0] }}"
      - "{{ groups['hana_secondary'][0] }}"
    sap_ha_cluster_authkey: "{{ vault_pcsd_password }}"
    sap_ha_cluster_hacluster_password: "{{ vault_hacluster_password }}"
    sap_ha_cluster_resource_stonith: fence_aws    # or fence_vmware_rest, fence_ipmilan
    sap_ha_cluster_resource_vip: "{{ hana_virtual_ip }}"
    sap_ha_cluster_resource_vip_secondary: "{{ hana_secondary_vip }}"
    sap_ha_cluster_sid: "{{ sid }}"
    sap_ha_cluster_instance_number: "{{ instance_number }}"
  roles:
    - sap_ha_install_pacemaker
    - sap_ha_pacemaker_cluster
    - sap_ha_install_hana_hsr_angi    # ANGI = the next-generation interface resource agents

The cluster:

Runs SAPHanaTopology on every node, which monitors HANA and reports state to Pacemaker.
Runs the SAPHana resource as a promotable clone (older documentation says master/slave); the promoted instance is the HANA primary, the unpromoted one is the secondary.
Floats a virtual IP on the primary node — clients connect to the VIP, not to a hostname per node.
Includes a STONITH agent (fence_aws for EC2, fence_vmware_rest for vSphere, fence_ipmilan for bare-metal). With fencing enabled but no STONITH device configured, Pacemaker will not start any resources; do not disable it “for testing.”
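Because a missing fence agent is the one cluster mistake that must never reach production, it is worth guarding in a pre-flight check. A minimal sketch follows: the platform keys and function names are hypothetical, while the agent names and the `sap_ha_cluster_resource_stonith` variable are the ones used in the playbook above.

```python
# Platform keys are illustrative; agent names are the ones listed in this lesson.
FENCE_AGENTS = {
    "ec2": "fence_aws",
    "vsphere": "fence_vmware_rest",
    "bare_metal": "fence_ipmilan",
}


def fence_agent_for(platform: str) -> str:
    """Return the fence agent for a platform, refusing unknown platforms."""
    if platform not in FENCE_AGENTS:
        raise ValueError(f"no fence agent known for platform {platform!r}")
    return FENCE_AGENTS[platform]


def check_cluster_vars(cluster_vars: dict) -> None:
    """Refuse cluster variables that would leave the cluster without fencing."""
    agent = cluster_vars.get("sap_ha_cluster_resource_stonith")
    if not agent:
        raise ValueError("sap_ha_cluster_resource_stonith is unset: no STONITH, no cluster")
    if agent not in FENCE_AGENTS.values():
        raise ValueError(f"unexpected fence agent {agent!r}")


if __name__ == "__main__":
    check_cluster_vars({"sap_ha_cluster_resource_stonith": fence_agent_for("ec2")})
    print("cluster variables include a known fence agent")
```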

The cluster takeover behaviour you actually care about:

Primary node failure — Pacemaker detects, fences the dead node, promotes secondary, moves VIP. Application reconnects within ~30s.
HSR replication failure — Pacemaker does NOT take over; it raises a flag. Manual intervention (re-establish HSR) required.
Network partition — STONITH ensures only one node remains, preventing split brain.
Manual operator-driven takeover — pcs resource move triggers a controlled failover; useful for kernel patching.

NetWeaver ASCS/ERS HA

NetWeaver also needs HA. The ASCS instance (ABAP SAP Central Services) and the ERS instance (Enqueue Replication Server) form a clustered pair. They share NFS-mounted directories (/sapmnt, /usr/sap/<SID>/ASCS<NN>).

# playbooks/20-netweaver-ascs-install.yml
- hosts: ascs
  become: true
  collections:
    - redhat.sap_install
  vars:
    sap_swpm_inifile_list:
      - PRD-NW-ASCS-INI
    sap_swpm_template_inifile: ascs.params.j2
  roles:
    - sap_install_media_stage
    - sap_install                 # wraps SWPM (sapinst)

For the cluster:

# playbooks/21-netweaver-cluster.yml
- hosts: ascs_ers_cluster
  become: true
  collections:
    - redhat.sap_management
  vars:
    sap_ha_cluster_resource_stonith: fence_vmware_rest
    sap_ha_cluster_resource_vip: "{{ ascs_vip }}"
    sap_ha_cluster_resource_vip_ers: "{{ ers_vip }}"
    sap_ha_cluster_sid: "{{ sid }}"
    sap_ha_cluster_ascs_instance_number: "00"
    sap_ha_cluster_ers_instance_number: "10"
  roles:
    - sap_ha_install_pacemaker
    - sap_ha_pacemaker_cluster_nw

The Pacemaker resources for NetWeaver:

SAPInstance for ASCS, with START_PROFILE pointing at the central services profile.
SAPInstance for ERS, with the enqueue replicator profile.
A constraint that ASCS and ERS prefer different nodes (a colocation constraint with a negative score such as -5000, so they can still share the surviving node after a failure).
A virtual IP for ASCS and one for ERS.
An NFS shared filesystem resource (or NFS-handled-externally if using EFS/Azure Files).

Without ERS replication, an ASCS failover loses the enqueue lock table, so every transaction that held a lock has to be redone; users see this as “wait, I need to redo my entry” errors. ERS keeps the lock table mirrored.

Dialog instances (PAS, AAS)

The PAS (Primary Application Server) and AAS (Additional Application Server) are not clustered; they are scaled out behind the SAP Web Dispatcher or the load balancer. Ansible installs each instance idempotently:

# playbooks/22-netweaver-pas-install.yml
- hosts: pas
  become: true
  collections:
    - redhat.sap_install
  vars:
    sap_swpm_inifile_list:
      - PRD-NW-PAS-INI
  roles:
    - sap_install_media_stage
    - sap_install

# playbooks/23-netweaver-aas-install.yml
- hosts: aas
  become: true
  collections:
    - redhat.sap_install
  vars:
    sap_swpm_inifile_list:
      - PRD-NW-AAS-INI
  roles:
    - sap_install_media_stage
    - sap_install

The point is repeatability: when you scale out a fifth dialog instance during peak season, the playbook is the same one that built the first four. No “manual install with screenshots” runbook.

Kernel patching (SAP kernel, not Linux kernel)

The SAP kernel ships separately from the application stack. Patching is a quarterly-or-better activity. The pattern with Ansible:

# playbooks/30-kernel-patch.yml
- name: SAP kernel patch
  hosts: sap_landscape
  become: true
  serial: 1                 # one host at a time, controlled
  collections:
    - redhat.sap_management
  tasks:
    - import_role:
        name: sap_kernel_patch
      vars:
        sap_kernel_patch_target_kernel: "7.94"  # quoted so YAML keeps the version a string
        sap_kernel_patch_media_dir: /sapmedia/kernel/7.94
        sap_kernel_patch_pre_check: true
        sap_kernel_patch_backup_old: true
        sap_kernel_patch_post_validate: true

The sap_kernel_patch role:

Stops the instance gracefully (stopsap or systemd target).
Backs up /sapmnt/<SID>/exe and /usr/sap/<SID>/SYS/exe.
Extracts the new kernel SAR files via SAPCAR.
Updates symlinks.
Runs saproot.sh.
Starts the instance.
Validates with disp+work -V and a basic transaction (SE38 test program via SAP RFC if RFC creds available).

Patching is serial: 1 because you patch one instance at a time across the landscape — never simultaneously across an HA pair. For ASCS/ERS, you patch the ERS first (drains locks), failover, patch the ASCS, fail back. For PAS/AAS, you patch one dialog at a time so the load balancer drains and re-adds it cleanly.
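The ordering described above is simple enough to generate rather than remember. The following sketch builds the step list for one landscape; the `Step` type, the action names, and the function name are all hypothetical, and the plan only encodes the sequence stated in this section.

```python
from typing import NamedTuple


class Step(NamedTuple):
    action: str
    target: str


def patch_plan(ers_host: str, ascs_host: str, dialog_hosts: list) -> list:
    """Order kernel-patch steps so an HA pair is never patched on both sides at once."""
    plan = [
        Step("patch", ers_host),      # ERS first
        Step("failover", "ASCS"),     # move ASCS away from its current node
        Step("patch", ascs_host),     # patch the node ASCS just left
        Step("failback", "ASCS"),
    ]
    for host in dialog_hosts:         # dialog instances strictly one at a time
        plan += [Step("drain", host), Step("patch", host), Step("readd", host)]
    return plan


if __name__ == "__main__":
    for step in patch_plan("ers01", "ascs01", ["pas01", "aas01"]):
        print(f"{step.action:9} {step.target}")
```

In AAP each `Step` would map to a workflow node, which gives an approval point between the ASCS failover and the first dialog instance.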

Transport imports

SAP changes (development objects, configuration, customising) move across the landscape as transports: dev-side tp exports, QA-side imports, then PRD imports. Ansible can drive imports cleanly:

# playbooks/50-transport-import.yml
- hosts: ascs              # tp runs on the central services host
  become: true             # required, otherwise become_user has no effect
  become_user: "{{ sid_lc }}adm"
  collections:
    - community.sap_install
  vars:
    transport_request: "S4DK900153"      # requests are named after the source (DEV) system
    transport_target_system: PRD
  tasks:
    - name: Add request to import queue
      ansible.builtin.command:
        cmd: >-
          tp addtobuffer {{ transport_request }} {{ transport_target_system }}
          pf=/usr/sap/trans/bin/TP_DOMAIN_PRD.PFL
      register: add_buffer

    # U126 = unconditional modes 1 (import again), 2 (overwrite originals) and
    # 6 (overwrite objects in unconfirmed repairs); drop it if you do not need them.
    - name: Import the request
      ansible.builtin.command:
        cmd: >-
          tp import {{ transport_request }} {{ transport_target_system }}
          client=100
          U126
          pf=/usr/sap/trans/bin/TP_DOMAIN_PRD.PFL
      register: tp_import
      failed_when: tp_import.rc not in [0, 4]   # 0 = clean, 4 = warnings

    - name: Persist tp logs
      ansible.builtin.fetch:
        src: "/usr/sap/trans/log/{{ transport_request }}.{{ transport_target_system }}"
        dest: "./tp-logs/"
        flat: true

The interesting AAP-level orchestration is the change ticket gate: a survey job template asks for a CHG ticket number, calls ServiceNow to verify the ticket is in “Implement” state, and then dispatches the transport import. The ITSM lesson covers this pattern in detail.
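The return-code handling in the playbook generalises into a small classifier that a workflow can branch on. This is a sketch under the usual tp convention (0 clean, 4 warnings, 8 errors, 12 and above fatal); the category names and the `next_action` policy are assumptions for illustration.

```python
def classify_tp_rc(rc: int) -> str:
    """Map a tp return code to a coarse severity."""
    if rc == 0:
        return "ok"
    if rc <= 4:
        return "warning"
    if rc <= 8:
        return "error"
    return "fatal"


def next_action(rc: int) -> str:
    """Decide what the workflow does next; errors are never retried automatically."""
    severity = classify_tp_rc(rc)
    if severity in ("ok", "warning"):
        return "continue"
    # An automatic retry could mask a real object or syntax error.
    return "stop_and_page"


if __name__ == "__main__":
    for code in (0, 4, 8, 12):
        print(code, classify_tp_rc(code), next_action(code))
```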

System refresh (PRD → QA)

System refresh is the periodic operation of “make QA look like PRD again.” It is high-risk and historically very manual. Ansible automates the whole flow:

Pre-refresh export of QA-only data (test users, test customising) that you want to preserve.
HANA backup on PRD → restore on QA via hdbsql and recoverSys.py.
Post-refresh import of QA-only data.
Customising adjustments for QA (e.g., disable email outputs, redirect printers, mask sensitive tables).
Smoke test with predefined transactions.

The Ansible role for this is large but the structure is simple: it is just a long playbook with many tasks, each guarded by tags so a partial re-run is feasible. The single most important rule: the refresh must be idempotent on retry, because the first attempt almost always finds something the team forgot.
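The playbook gets its restartability from tags and idempotent tasks. A different mechanism that shows the same idea in isolation is a checkpoint file: each completed step is recorded, and a retry skips whatever is already done. The sketch below illustrates that technique only; the function name and the state-file format are assumptions, not how the role is implemented.

```python
import json
import tempfile
from pathlib import Path


def run_steps(steps: list, state_file: Path) -> list:
    """Run (name, callable) steps in order, skipping those already recorded as done.

    Returns the names executed in this run. If a step raises, earlier steps
    stay recorded, so the next call resumes at the failed step.
    """
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    executed = []
    for name, action in steps:
        if name in done:
            continue
        action()
        done.add(name)
        state_file.write_text(json.dumps(sorted(done)))
        executed.append(name)
    return executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "refresh-state.json"
        steps = [
            ("export_qa_data", lambda: print("exporting QA-only data")),
            ("restore_backup", lambda: print("restoring PRD backup on QA")),
            ("import_qa_data", lambda: print("re-importing QA-only data")),
        ]
        print(run_steps(steps, state))
        print(run_steps(steps, state))  # second run finds nothing left to do
```

The checkpoint only records that a step finished; each step still has to be safe to re-run from the start if it failed halfway, which is the harder part of the rule above.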

Validation: synthetic transactions, not just service checks

After every patch, install, or HSR operation, run a synthetic SAP transaction. The community.sap_install.sap_rfc_call module wraps PyRFC to call any RFC-enabled function module:

- name: Smoke test — RFC ping
  community.sap_install.sap_rfc_call:
    sap_host: "{{ ascs_vip }}"
    sap_sysnr: "00"
    sap_client: "100"
    sap_user: "{{ vault_smoke_test_user }}"
    sap_password: "{{ vault_smoke_test_password }}"
    function: "STFC_CONNECTION"
    parameters:
      REQUTEXT: "ansible-{{ ansible_date_time.epoch }}"
  register: rfc
  no_log: true

- name: Assert RFC works
  ansible.builtin.assert:
    that: rfc.return_value.ECHOTEXT == 'ansible-' ~ ansible_date_time.epoch   # no nested {{ }} inside a condition

STFC_CONNECTION is the canonical “round-trip” RFC; it proves the entire stack — from VIP to dispatcher to work process — is functional. After kernel patches, run this. After HSR takeover, run this. After every transport import, run this.

SAP-on-Cloud specifics

Ansible is the same; cloud-specific roles change. AWS, Azure, and GCP each have a “SAP on cloud” reference architecture that Ansible automates:

AWS: X1/X2/u-* instances for HANA; EFS or FSx ONTAP for /sapmnt; AWS Backup for HANA backup; Route 53 for VIP failover (often aws cli from a Pacemaker resource agent).
Azure: M-series VMs; ANF (Azure NetApp Files) for /hana/data and /sapmnt; Azure Site Recovery for DR; Azure Load Balancer with floating IPs.
GCP: m2/m3 instances; Filestore for /sapmnt; GCS for backups; internal TCP load balancer for VIPs.

Ansible roles wrap the cloud-specific provisioning steps; the SAP-specific roles (HANA install, NetWeaver, kernel patch) are unchanged. This is the pattern that makes Ansible a good fit: cloud-specific bottom layer + cloud-agnostic SAP layer on top.

For air-gapped SAP (defence sector, highly regulated banking core), SAP runs on private infrastructure and the air-gap discipline from the previous lesson applies: HANA media, kernel SAR files, and SPS bundles are all imported via signed bundles.

Anti-patterns that destroy SAP automation

Hand-edited global.ini after Ansible runs. Drift accumulates; the next Ansible run might silently revert your one-off fix. Use redhat.sap_management.sap_hana_set_parameters always.
Skipping sapinit and sapcontrol checks before reboot. A reboot during HANA recovery is hours of pain.
Using serial: 100% for kernel patches. All instances down at once.
No fence agent in Pacemaker. STONITH is mandatory; without it the cluster cannot guarantee single-master.
Manual SSFS key copies for HSR. Use the role; key drift means your secondary cannot register.
Treating <sid>adm as just-another-user. It has very specific limits, profile, environment. Always create via the SAP role.
Mixing kernel patch with SPS upgrade in one window. Each is enough work; combining them quadruples your debug surface.
Untested HSR takeover. A clean install with HSR doesn’t mean takeover works. Test it quarterly.
Backups on the same storage as data. A NetApp aggregate failure takes both. Use a separate aggregate or replicate to a different site.

Frequently asked questions

1. Can I install HANA without using redhat.sap_hana_install? Technically yes, but you will reinvent the answer-file generation, license application, and post-install parameter setting. The collection encodes thousands of lines of basis knowledge; not using it is a guarantee of bugs.

2. What’s the right HANA replication mode for my landscape? Same DC: Sync (RPO=0, performance hit) or SyncMem (RPO≈0). Cross-region: Async (some loss possible on forced takeover). Hybrid (HSR as DR): Async with a 5-min lag alarm and a tested forced-takeover playbook.

3. How do I patch the Linux kernel under HANA? Run redhat.sap_management.sap_hana_set_takeover to perform a controlled takeover to the secondary, patch the Linux kernel on the old primary, reboot it, re-register it to re-establish HSR, then fail back. This routine is rarely exercised, so it benefits the most from rehearsal.

4. Can Ansible handle SAP transport routes (TMS configuration)? Partly. Transport routes are configured once in TMS; routine transport imports are the recurring work, and they automate cleanly via tp CLI calls in playbooks.

5. What about SAP S/4HANA Cloud (private edition)? Same patterns. The customer still owns the infrastructure layer; Ansible automates HANA install, NetWeaver install, kernel patches, transports. SAP only operates the “managed services” wrapper above your stack.

6. How do I integrate Ansible with Solution Manager (SolMan)? SolMan can import job execution history via RFC; you pump Ansible job results back via sap_rfc_call. Most basis teams treat AAP as the orchestrator and SolMan as the change repository; SolMan owns the change/release record, AAP owns the execution.

7. What’s the right failure mode for a transport import that gets RC=8? Stop the workflow, page the basis lead. RC=8 means object-related issues that need human inspection. Do NOT auto-retry; you may mask a real syntax error.

8. How often should HANA be backed up in production? A daily full backup plus a log backup every 15 minutes is a common baseline. Use HANA’s native backup with a backint-compatible target (Veeam, Commvault, or AWS Backint Agent for SAP HANA). Ansible owns the backup-job creation and rotation policy; the actual data movement is the backint plug-in’s job.

9. Can I use Ansible to drive SAP MaxDB or ASE? Yes, with community.sap_install having modules for both. They are less commonly needed than HANA but follow the same install/patch/configure pattern.

10. What’s the single most underrated SAP automation practice? The per-SAP-Note role. When SAP issues a new Note that affects your platform (e.g., a kernel-tuning Note for memory leaks), encode it as a small Ansible role with a clear when guard, run it on the test landscape, and add it to the OS-prep workflow. Six months later, when an auditor asks “is Note 2382421 applied?”, the answer is “yes, and here is the ledger entry showing every host has run that role.”
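The auditor question in item 10 reduces to a set difference over the evidence ledger. The sketch below assumes a ledger of dictionaries with `host`, `note`, and `status` keys; that schema and the function name are hypothetical, so adapt them to whatever the sap_evidence role actually records.

```python
def hosts_missing_note(ledger: list, hosts: list, note: str) -> list:
    """Return the hosts with no successful ledger entry for the given SAP Note."""
    applied = {
        entry["host"]
        for entry in ledger
        if entry["note"] == note and entry["status"] == "ok"
    }
    return sorted(set(hosts) - applied)


if __name__ == "__main__":
    # Illustrative data only.
    ledger = [
        {"host": "hana01", "note": "2382421", "status": "ok"},
        {"host": "hana02", "note": "2382421", "status": "failed"},
        {"host": "pas01", "note": "2382421", "status": "ok"},
    ]
    print(hosts_missing_note(ledger, ["hana01", "hana02", "pas01"], "2382421"))
```

An empty list is the evidence the auditor is asking for; anything else is the worklist for the next run of the per-Note role.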

Hands-on lab — first SAP-aware Ansible play

A full HANA install needs SAP licensed media. The following lab uses publicly available SAP-related tooling to get hands-on without a license: setting up the OS prerequisites a HANA install would expect.

Prerequisites: RHEL 8/9 VM with at least 4GB RAM, ansible-core ≥ 2.16.

mkdir -p sap-lab/{playbooks,roles}
cd sap-lab
ansible-galaxy collection install redhat.sap_install

# playbooks/os-prep.yml
- hosts: localhost
  become: true
  collections:
    - redhat.sap_install
  roles:
    - role: sap_general_preconfigure
      vars:
        sap_general_preconfigure_modify_etc_hosts: false
    - role: sap_hana_preconfigure
      vars:
        sap_hana_preconfigure_thp: never

ansible-playbook playbooks/os-prep.yml -K
sysctl kernel.shmmax     # huge value
cat /sys/kernel/mm/transparent_hugepage/enabled   # [never]
tuned-adm active         # sap-hana
ulimit -n -H -S

Now read what the role did:

cat /etc/security/limits.d/99-sap.conf
cat /etc/sysctl.d/sap.conf
ansible-galaxy collection list redhat.sap_install

You have just executed a non-trivial fraction of the OS work that goes into every HANA install, and seen the artefacts the SAP basis role leaves behind. Extend the lab by writing a roles/sap_storage_layout that creates the /hana/{data,log,shared} mountpoints (without real disks, use tmpfs); rerun and confirm idempotency.
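To compare what the role wrote with what the kernel is actually running, a short script can parse the sysctl drop-in and read the live values from `/proc/sys`. This is a sketch: the function names are hypothetical, it assumes the drop-in lives at `/etc/sysctl.d/sap.conf` as in the lab, and it uses the plain dot-to-slash key mapping, which does not cover keys whose components themselves contain dots.

```python
from pathlib import Path

SAP_SYSCTL = Path("/etc/sysctl.d/sap.conf")


def parse_sysctl_conf(text: str) -> dict:
    """Parse sysctl.d content into {key: value}, ignoring comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Collapse runs of whitespace so "1250  256000" compares cleanly.
            settings[key.strip()] = " ".join(value.split())
    return settings


def live_value(key: str):
    """Read the running value of a sysctl key from /proc/sys, or None if absent."""
    path = Path("/proc/sys") / key.replace(".", "/")
    return " ".join(path.read_text().split()) if path.exists() else None


if __name__ == "__main__":
    if SAP_SYSCTL.exists():
        for key, wanted in parse_sysctl_conf(SAP_SYSCTL.read_text()).items():
            actual = live_value(key)
            flag = "ok" if actual == wanted else "DRIFT"
            print(f"{flag:5} {key}: file={wanted} live={actual}")
    else:
        print(f"{SAP_SYSCTL} not found; run the os-prep playbook first")
```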

Glossary

SID — System ID, three-character alphanumeric code identifying an SAP system (e.g., S4P).
HANA — SAP’s in-memory column store database.
HSR — HANA System Replication.
NetWeaver — SAP application server stack (ABAP and Java).
ASCS — ABAP SAP Central Services (message + enqueue).
ERS — Enqueue Replication Server (mirror of enqueue table).
PAS — Primary Application Server (dialog instance).
AAS — Additional Application Server (dialog instance, scale-out).
SAP Note — vendor-issued documentation/configuration update.
SPS — Support Package Stack (HANA feature and maintenance bundle).
Pacemaker — Linux HA cluster manager.
STONITH — Shoot The Other Node In The Head; fencing.
<sid>adm — OS user that owns SAP processes.
tp — Transport tool used to import changes.
SolMan — SAP Solution Manager, change/release management product.

Certification mapping

EX374 — Workflow orchestration, RBAC, surveys (used heavily for transport imports).
C_HANATEC, C_S4TM — SAP technology consultant certifications; this lesson maps to the basis tasks in those exams.
AWS Certified: SAP on AWS Specialty — direct alignment.

Next steps

You now have an opinionated, Ansible-driven view of the SAP basis function. The remaining specialist lessons cover:

Edge and IoT fleet management — small, intermittent, distributed targets that need a different connection and policy model than a datacentre fleet.
ITSM and ChatOps — wiring AAP into ServiceNow/Jira/Slack so SAP transport imports and other regulated changes flow through controlled approval gates.
Database migrations (online cutovers, blue-green) — the database-specific patterns that complement OS migrations.
Observability — capturing RPO/RTO, patch coverage, HSR lag, and the rest of your SAP-relevant metrics into Prometheus/Grafana for a single pane of glass.

If you only take one habit from this lesson: always go through the SAP-supported collections. They are not perfect, but they encode hundreds of person-years of basis knowledge, and the alternative is a fork of hdblcm invocations that you will maintain forever.
