Modern Linux Networking: Bonding, VLANs, and Firewalls with nftables and firewalld

Linux networking on a modern distro is a layered stack: a configuration manager that renders state, a kernel that does the actual bonding and VLAN tagging, and nftables underneath whatever firewall front end you run. This article builds a production host from the wire up: two NICs bonded for redundancy, tagged VLAN sub-interfaces, a bridge for VMs, source-based policy routing across two gateways, and a firewall layer done first in firewalld and then in raw nftables.

Scope: examples target current RHEL-family distributions (RHEL 9/10, Rocky, Alma) using NetworkManager and firewalld, and current Ubuntu Server (22.04/24.04) using Netplan. On both, the firewall tooling is backed by nftables. Where syntax diverges I show both. The example host has two NICs, enp1s0 and enp2s0, bonded into bond0, carrying VLAN 10 (servers) and VLAN 20 (storage).

1. Picking your stack: NetworkManager vs. systemd-networkd vs. Netplan

These are not interchangeable, and mixing them is the single most common cause of “my config disappeared after reboot.” Pick one renderer per host and disable the others.

Tool	Renders to	Best fit	Notes
NetworkManager	kernel via NM daemon	RHEL family default, desktops, laptops, anything with roaming or VPN	Keyfiles in `/etc/NetworkManager/system-connections/`; `nmcli` is the API
systemd-networkd	kernel via networkd	Minimal/immutable servers, containers, no NM	Declarative `.network`/`.netdev` units in `/etc/systemd/network/`
Netplan	NetworkManager or systemd-networkd	Ubuntu default	YAML in `/etc/netplan/` that generates config for a chosen `renderer`

The mental model that matters: Netplan is a front end, not a backend. On Ubuntu Server it renders to systemd-networkd; on Desktop, to NetworkManager. Those two daemons are the real backends. Decide which one owns your interfaces and commit:

# RHEL family: NetworkManager owns everything
systemctl enable --now NetworkManager
systemctl disable --now systemd-networkd 2>/dev/null || true

# Minimal Ubuntu/Debian server choosing networkd directly
systemctl enable --now systemd-networkd systemd-resolved

Do not edit rendered files by hand. Netplan regenerates /run/systemd/network/ on netplan apply and will clobber your changes. NetworkManager keyfiles are the source of truth on RHEL; ifcfg-* scripts are deprecated and removed in RHEL 10.
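
To catch two managers fighting before it costs you a reboot, you can ask systemd which ones are running. This is a minimal sketch, assuming a systemd host; the function names active_net_managers and single_manager_check are made up for illustration.

```shell
#!/bin/sh
# Print every network manager unit that systemd reports as active.
# More than one line of output means two daemons may be competing
# for the same interfaces.
active_net_managers() {
    for unit in NetworkManager systemd-networkd; do
        if systemctl is-active --quiet "$unit" 2>/dev/null; then
            printf '%s\n' "$unit"
        fi
    done
}

# Return 0 when at most one manager is active, 1 otherwise.
single_manager_check() {
    count=$(active_net_managers | wc -l | tr -d ' ')
    if [ "$count" -gt 1 ]; then
        echo "WARNING: $count network managers active" >&2
        return 1
    fi
    return 0
}
```

Run single_manager_check from a provisioning or health-check script; a non-zero exit is the cue to disable one of the daemons.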

2. Interface bonding and choosing the right mode

Bonding aggregates NICs for redundancy or throughput. The mode choice is a function of what your switch supports, not preference.

Mode	Name	Switch requirement	Use when
`active-backup` (1)	failover	none	Two access ports, possibly different switches; the safe default
`802.3ad` (4)	LACP	switch LAG/port-channel configured	One switch (or MLAG pair) with matching LACP config
`balance-xor` (2)	static hash	static LAG on switch	LACP unavailable but switch supports static aggregation

If you are unsure what the switch is doing, use active-backup. It needs nothing from the network and survives a single link or switch failure. Use 802.3ad only when the switch ports are explicitly in a LACP port-channel; a mismatch here causes a flapping, half-working link that is miserable to debug.

NetworkManager (nmcli)

# Create the bond. miimon=100 polls link state every 100ms.
nmcli connection add type bond con-name bond0 ifname bond0 \
  bond.options "mode=active-backup,miimon=100,primary=enp1s0"

# Enslave both NICs
nmcli connection add type ethernet con-name bond0-p1 ifname enp1s0 master bond0
nmcli connection add type ethernet con-name bond0-p2 ifname enp2s0 master bond0

nmcli connection up bond0

For LACP, swap the options string to mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer3+4. The xmit_hash_policy controls how flows are distributed across members; layer3+4 hashes on IP and port, giving better spread for many connections between the same hosts.
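
One practical symptom of an LACP mismatch is that the members land in different aggregators instead of sharing one. The sketch below reads the bonding status file and compares the per-member Aggregator ID lines. The function name lacp_aggregators_match is hypothetical, and the parsing assumes the usual /proc/net/bonding layout.

```shell
#!/bin/sh
# Report whether every member of an 802.3ad bond joined the same aggregator.
# Usage: lacp_aggregators_match [FILE]   (default: /proc/net/bonding/bond0)
lacp_aggregators_match() {
    file=${1:-/proc/net/bonding/bond0}
    awk '
        /^Slave Interface:/ { in_member = 1; next }
        # Only count the Aggregator ID line that belongs to a member section,
        # not the one under "Active Aggregator Info".
        in_member && /Aggregator ID:/ { ids[$NF] = 1; in_member = 0 }
        END {
            n = 0
            for (id in ids) n++
            if (n == 0) { print "no aggregator info"; exit 2 }
            if (n == 1) { print "ok: one aggregator"; exit 0 }
            print "mismatch: " n " aggregators"
            exit 1
        }
    ' "$file"
}
```

A "mismatch" result is a hint to check the switch-side port-channel before touching anything on the host.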

Netplan (Ubuntu)

# /etc/netplan/01-bond.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    enp1s0: {}
    enp2s0: {}
  bonds:
    bond0:
      interfaces: [enp1s0, enp2s0]
      parameters:
        mode: active-backup
        mii-monitor-interval: 100
        primary: enp1s0

Apply and confirm the bond formed before layering anything on top:

chmod 600 /etc/netplan/*.yaml   # Netplan warns on world-readable files
netplan apply
cat /proc/net/bonding/bond0     # shows mode, active slave, per-link status
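
Reading that file by eye is fine on one host; for a fleet it helps to reduce it to a single line. The following is a rough sketch (bond_summary is a name invented here) that assumes the standard field labels in the bonding status file.

```shell
#!/bin/sh
# Summarise a bonding status file: mode, active member, members with link up.
# Usage: bond_summary [FILE]   (default: /proc/net/bonding/bond0)
bond_summary() {
    file=${1:-/proc/net/bonding/bond0}
    awk -F': ' '
        /^Bonding Mode:/           { mode = $2 }
        /^Currently Active Slave:/ { active = $2 }
        /^Slave Interface:/        { members++; in_member = 1; next }
        # The first "MII Status" after a member header is that member link.
        in_member && /^MII Status:/ { if ($2 == "up") up++; in_member = 0 }
        END {
            printf "mode=%s active=%s up=%d/%d\n", mode, active, up, members
        }
    ' "$file"
}
```

Anything other than all members up is worth fixing before you stack VLANs on the bond.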

3. Tagged VLAN sub-interfaces and bridges for virtualization hosts

The switch port carrying bond0 is a trunk. We split traffic into VLAN sub-interfaces, then put a bridge on the VLAN that VMs attach to. Order matters: VLAN rides on the bond, bridge rides on the VLAN.

NetworkManager

# VLAN 10 on top of the bond -> server network, gets an IP directly
nmcli connection add type vlan con-name vlan10 ifname bond0.10 \
  dev bond0 id 10 \
  ipv4.method manual ipv4.addresses 10.10.10.5/24 ipv4.gateway 10.10.10.1

# VLAN 20 -> a bridge for VMs (no IP on the VLAN itself)
nmcli connection add type vlan con-name vlan20 ifname bond0.20 dev bond0 id 20
nmcli connection add type bridge con-name br-vm ifname br-vm \
  bridge.stp no ipv4.method disabled ipv6.method disabled
nmcli connection modify vlan20 master br-vm
nmcli connection up vlan20 && nmcli connection up br-vm

A few decisions worth stating: the bridge gets no IP because the host does not live on the VM network, and STP is off because a single uplink bridge has no loop to prevent. Leave STP on only if multiple bridges could form a loop.

Netplan

# /etc/netplan/02-vlans.yaml
network:
  version: 2
  renderer: networkd
  vlans:
    bond0.10:
      id: 10
      link: bond0
      addresses: [10.10.10.5/24]
      routes:
        - to: default
          via: 10.10.10.1
    bond0.20:
      id: 20
      link: bond0
  bridges:
    br-vm:
      interfaces: [bond0.20]
      parameters:
        stp: false

Bridging only works if the kernel actually forwards frames between bridge ports. Libvirt/KVM handle this, but if you build the bridge manually for containers, check how bridge netfilter is configured. When the br_netfilter module is loaded, bridged frames are also passed through the host's IP firewall hooks. On a bridge that should not be filtered by the host firewall, set net.bridge.bridge-nf-call-iptables=0 and net.bridge.bridge-nf-call-ip6tables=0 via sysctl (the keys exist only while br_netfilter is loaded), or VM traffic can be silently dropped by host rules.
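
To see where a host stands, you can read the two toggles straight from /proc/sys. A small sketch follows; bridge_nf_status is a hypothetical helper name, and the optional root argument exists only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Report the bridge netfilter toggles. They only exist while the
# br_netfilter module is loaded, so a missing file is a valid state.
# Usage: bridge_nf_status [PROC_SYS_ROOT]   (default: /proc/sys)
bridge_nf_status() {
    root=${1:-/proc/sys}
    for key in bridge-nf-call-iptables bridge-nf-call-ip6tables; do
        path="$root/net/bridge/$key"
        if [ -r "$path" ]; then
            printf '%s=%s\n' "$key" "$(cat "$path")"
        else
            printf '%s=absent\n' "$key"
        fi
    done
}
```

A value of 1 means bridged frames traverse the host firewall; absent means the module is not loaded and they bypass it.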

4. Static addressing, multiple gateways, and source-based policy routing

The hard part is two gateways. The main routing table picks a route by destination only, so a host with two uplinks replies out of the wrong interface, and stateful firewalls (or RPF checks) drop the asymmetric flow. The fix is policy routing: a second routing table plus rules that select it by source address.

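Say VLAN 10 (10.10.10.5, gw 10.10.10.1) is primary and VLAN 20 (10.10.20.5, gw 10.10.20.1) is a storage path that must reply via its own gateway. This assumes the host itself holds 10.10.20.5 on VLAN 20. If bond0.20 is a port of br-vm as in Step 3, it cannot carry IP settings, so put the address, routes, and rules on the bridge connection instead.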

First, name a routing table:

echo "100 storage" >> /etc/iproute2/rt_tables

Then populate it and add the rule. With NetworkManager you can persist all of this on the connection itself:

# Default route for the storage table goes out VLAN 20's gateway
nmcli connection modify vlan20 \
  +ipv4.routes "0.0.0.0/0 10.10.20.1 table=100"

# Rule: traffic SOURCED from 10.10.20.5 uses table 100
nmcli connection modify vlan20 \
  +ipv4.routing-rules "priority 100 from 10.10.20.5 table 100"

nmcli connection up vlan20

The equivalent raw ip commands (useful for testing before persisting) make the mechanism explicit:

ip route add default via 10.10.20.1 dev bond0.20 table storage
ip rule add from 10.10.20.5 lookup storage priority 100
ip rule show          # confirm the rule sits above the main-table lookup
ip route show table storage
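
The "rule sits above main" check is easy to script. This sketch parses ip rule show output from stdin; rule_precedes_main is an invented name, and it assumes the usual "PRIORITY: from SRC lookup TABLE" line format.

```shell
#!/bin/sh
# Read `ip rule show` output on stdin and confirm that a rule for the given
# source address is evaluated before the main table lookup.
# Usage: ip rule show | rule_precedes_main 10.10.20.5
rule_precedes_main() {
    awk -v src="$1" '
        { prio = $1; sub(/:$/, "", prio) }
        $2 == "from" && $3 == src && !seen_rule { rule = prio + 0; seen_rule = 1 }
        / lookup main( |$)/ && !seen_main       { main = prio + 0; seen_main = 1 }
        END {
            if (seen_rule && seen_main && rule < main) { print "ok"; exit 0 }
            print "not ok"
            exit 1
        }
    '
}
```

Lower priority numbers are evaluated first, so the storage rule at 100 wins over the main lookup at its default 32766.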

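This is the canonical cure for “asymmetric routing” and reverse-path-filter drops. Verify your RPF mode with sysctl net.ipv4.conf.all.rp_filter and the per-interface value; the kernel applies the higher of the two. Mode 1 (strict) drops any inbound packet that arrives on an interface the kernel would not itself use to reach the sender, which is exactly what happens on a multi-gateway host without policy routing. If you cannot use policy routing everywhere, mode 2 (loose) is the pragmatic compromise.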
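
Because the effective mode depends on two sysctls, checking only one of them can mislead. A short sketch that combines them (effective_rp_filter is a hypothetical name; the optional root argument is there purely for testing):

```shell
#!/bin/sh
# Effective reverse-path filter mode for an interface: the kernel uses the
# higher of conf/all/rp_filter and conf/<iface>/rp_filter.
# Usage: effective_rp_filter IFACE [PROC_SYS_ROOT]   (default: /proc/sys)
effective_rp_filter() {
    iface=$1
    root=${2:-/proc/sys}
    all=$(cat "$root/net/ipv4/conf/all/rp_filter") || return 1
    own=$(cat "$root/net/ipv4/conf/$iface/rp_filter") || return 1
    if [ "$all" -gt "$own" ]; then mode=$all; else mode=$own; fi
    case $mode in
        0) echo "0 (off)" ;;
        1) echo "1 (strict)" ;;
        2) echo "2 (loose)" ;;
        *) echo "$mode (unknown)" ;;
    esac
}
```

For example, effective_rp_filter bond0.20 shows what the storage VLAN is really subject to.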

In Netplan, the same intent is expressed with routing-policy:

    bond0.20:
      id: 20
      link: bond0
      addresses: [10.10.20.5/24]
      routes:
        - to: default
          via: 10.10.20.1
          table: 100
      routing-policy:
        - from: 10.10.20.5
          table: 100
          priority: 100

5. firewalld zones, services, and rich rules for everyday hosts

For a normal server, drop to raw nftables only when you have a reason. firewalld gives you zones, named services, and runtime/permanent separation, and on modern distros it writes nftables underneath anyway.

The model: every interface or source belongs to a zone; each zone has a target (by default, unsolicited input is rejected) and an allow-list of services/ports. Assign interfaces deliberately.

# Put the server VLAN in the internal zone, storage in a trusted-but-scoped zone
nmcli connection modify vlan10 connection.zone internal
nmcli connection modify vlan20 connection.zone work
nmcli connection up vlan10 && nmcli connection up vlan20

# Open services on the internal zone (permanent), then reload
firewall-cmd --permanent --zone=internal --add-service=ssh
firewall-cmd --permanent --zone=internal --add-service=https
firewall-cmd --permanent --zone=internal --add-port=9100/tcp   # node_exporter
firewall-cmd --reload

Always test with a runtime change (no --permanent), confirm you did not lock yourself out, then commit. A clean pattern: apply runtime, verify SSH still works from a second session, then firewall-cmd --runtime-to-permanent. The --permanent followed by --reload flow above is fine for non-disruptive rules.

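Rich rules cover what plain services cannot, for example restricting a port to a source subnet, or logging and rate-limiting a service. A rate-limited accept only bites if the same service is not also opened unconditionally with --add-service in that zone: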

firewall-cmd --permanent --zone=internal --add-rich-rule='
  rule family="ipv4" source address="10.10.10.0/24"
  port port="9100" protocol="tcp" accept'

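# SSH limited to 5 new connections per minute, with rate-limited logging.
# A limit after "log" throttles log entries only; the limit after "accept" throttles connections.
firewall-cmd --permanent --zone=internal --add-rich-rule='
  rule service name="ssh" log prefix="SSH " level="info" limit value="5/m" accept limit value="5/m"'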

firewall-cmd --reload
firewall-cmd --zone=internal --list-all   # human-readable effective config

6. Dropping to raw nftables: tables, chains, sets, and NAT

When you need full control, for example on an edge box or for rules firewalld cannot express cleanly, write nftables directly. Disable firewalld first so the two do not fight over the ruleset:

systemctl disable --now firewalld
systemctl enable --now nftables

nftables has one unified syntax for IPv4/IPv6, atomic rule loading, and first-class sets (named address/port groups) and maps. A complete stateful firewall with NAT in /etc/nftables.conf:

#!/usr/sbin/nft -f
flush ruleset

table inet filter {
    set admin_nets {
        type ipv4_addr
        flags interval
        elements = { 10.10.10.0/24, 192.0.2.10 }
    }

    chain input {
        type filter hook input priority filter; policy drop;

        ct state established,related accept
        ct state invalid drop
        iif "lo" accept
        ip protocol icmp accept
        ip6 nexthdr ipv6-icmp accept

        # SSH only from admin networks, rate-limited
        tcp dport 22 ip saddr @admin_nets ct state new \
            limit rate 10/minute accept
        tcp dport { 80, 443 } accept
    }

    chain forward {
        type filter hook forward priority filter; policy drop;
        ct state established,related accept
        iifname "br-vm" accept            # let VMs out
    }

    chain output {
        type filter hook output priority filter; policy accept;
    }
}

# Masquerade VM traffic leaving the uplink
table inet nat {
    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        oifname "bond0.10" ip saddr 10.10.20.0/24 masquerade
    }
}

Two design points that trip people up. The inet family handles IPv4 and IPv6 in one table, so you do not maintain parallel ip and ip6 rulesets. And the base chain types, hooks, and priorities are not decorative: priority filter (0) and priority srcnat (100) place your chains correctly relative to the kernel’s other netfilter hooks. NAT statements are valid only in a chain of type nat, attached to a NAT-capable hook (prerouting or postrouting for the cases here) with the right priority; put masquerade in a type filter chain and the ruleset will not load.

Load and verify atomically:

nft -c -f /etc/nftables.conf   # -c = check syntax, change nothing
nft -f /etc/nftables.conf      # atomic apply: all-or-nothing
nft list ruleset               # show the live, effective ruleset
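
Those three commands combine naturally into one guarded step. The following is a sketch, not a hardened tool: nft_checked_apply and the default backup path are assumptions made for illustration, and only the nft -c, nft -f, and nft list ruleset invocations come from the commands above.

```shell
#!/bin/sh
# Validate a ruleset file, snapshot the live ruleset, then apply.
# Usage: nft_checked_apply /etc/nftables.conf [BACKUP_FILE]
nft_checked_apply() {
    conf=$1
    backup=${2:-/var/tmp/nft-ruleset.backup}
    if ! nft -c -f "$conf"; then
        echo "syntax check failed, nothing applied" >&2
        return 1
    fi
    # Prefix the snapshot with "flush ruleset" so that restoring it with
    # `nft -f "$backup"` replaces the live rules instead of adding to them.
    { echo "flush ruleset"; nft list ruleset; } > "$backup" || return 1
    nft -f "$conf"
}
```

If the new ruleset misbehaves, nft -f on the backup file puts the previous state back in one atomic load.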

NAT also requires forwarding turned on, which is separate from any rule:

sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv6.conf.all.forwarding=1
printf '%s\n' "net.ipv4.ip_forward = 1" "net.ipv6.conf.all.forwarding = 1" \
  > /etc/sysctl.d/99-forward.conf   # persist both, not just IPv4

7. Persisting and templating rules, plus connection-tracking pitfalls

Persistence differs by front end. With firewalld, --permanent writes XML to /etc/firewalld/; runtime changes live in kernel memory only and vanish on reload. With raw nftables, nftables.service loads a ruleset file at boot, so your file is the persistence mechanism: /etc/nftables.conf on Debian and Ubuntu, /etc/sysconfig/nftables.conf on the RHEL family (include your file from there). Confirm the unit is enabled, or your carefully built ruleset evaporates on reboot.

For fleets, template the ruleset. Sets and includes keep it maintainable:

# /etc/nftables.conf
include "/etc/nftables.d/*.nft"

Render /etc/nftables.d/admin_nets.nft per host or per environment from Ansible/Jinja, keeping the chain logic identical everywhere and varying only the sets. This is the same separation firewalld gives with zones, but versioned in Git.
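
If you do not have Ansible on hand, the same idea fits in a few lines of shell. This sketch renders the per-host file as an nftables define; the function name render_admin_nets and the variable name ADMIN_NETS are illustrative choices, not something the ruleset above already uses.

```shell
#!/bin/sh
# Render a per-host nftables variable file from a list of addresses.
# Usage: render_admin_nets 10.10.10.0/24 192.0.2.10 > /etc/nftables.d/admin_nets.nft
render_admin_nets() {
    if [ "$#" -eq 0 ]; then
        echo "render_admin_nets: need at least one address" >&2
        return 1
    fi
    elements=$1
    shift
    for addr in "$@"; do
        elements="$elements, $addr"
    done
    printf 'define ADMIN_NETS = { %s }\n' "$elements"
}
```

The shared chain logic could then match on ip saddr $ADMIN_NETS, provided the file holding the define is included before the rules that reference it.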

Connection tracking is where correctness lives, and where the subtle bugs hide:

Order the conntrack rule first. ct state established,related accept must precede your new-state rules. Putting it last means every packet of an established flow walks the whole chain; worse, an early drop can break return traffic.
ct state invalid drop early, but understand it. It catches out-of-window and malformed packets. On asymmetric paths (see Step 4) legitimate packets can be marked invalid if the firewall never saw the SYN — another reason to fix routing first.
conntrack has a table limit. A busy box can exhaust nf_conntrack_max; you will see nf_conntrack: table full, dropping packet in the kernel log. Size it for the workload via sysctl net.netfilter.nf_conntrack_max.
Bridge filtering surprises. As in Step 3, br_netfilter can route bridged frames through the host firewall. Either account for it in forward rules or disable bridge-nf-call-*.
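
For the table-limit pitfall in particular, a usage check is cheap to script. A minimal sketch, assuming the nf_conntrack module is loaded so the counters exist; conntrack_usage is a hypothetical name and the 80% threshold is arbitrary.

```shell
#!/bin/sh
# Print conntrack table usage as a percentage of nf_conntrack_max.
# Usage: conntrack_usage [PROC_SYS_ROOT]   (default: /proc/sys)
conntrack_usage() {
    dir=${1:-/proc/sys}/net/netfilter
    count=$(cat "$dir/nf_conntrack_count") || return 1
    max=$(cat "$dir/nf_conntrack_max") || return 1
    pct=$((count * 100 / max))
    echo "conntrack: $count/$max (${pct}%)"
    # 80 is an arbitrary warning threshold chosen for this sketch.
    [ "$pct" -lt 80 ]
}
```

The exit status is non-zero at or above the threshold, which makes it easy to hook into a cron job or a monitoring check.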

Verify

Prove each layer independently, bottom up. A failure at layer N invalidates everything above it.

# L1 bond: mode correct, both members up, one active
cat /proc/net/bonding/bond0

# L2 VLANs and bridge present and up
ip -d link show bond0.10        # -d shows vlan id, protocol
ip link show master br-vm       # ports enslaved to the bridge

# L3 addressing and the policy-routing rule
ip addr show
ip rule show                    # storage rule above main lookup
ip route get 8.8.8.8 from 10.10.20.5   # which gateway/iface for storage source?

# Live sockets and listeners
ss -tlnp                        # listening TCP sockets + owning process

# Packet-level proof on the wire
tcpdump -ni bond0.20 host 10.10.20.5

# Firewall: effective ruleset, then watch it counting
nft list ruleset                # raw nftables
firewall-cmd --list-all-zones   # firewalld
nft monitor trace               # live packet trace; needs a rule that sets "meta nftrace set 1"

ip route get <dst> from <src> is the fastest sanity check for policy routing: it tells you exactly which table, gateway, and interface the kernel will use for a given source. If it picks the wrong gateway, your ip rule priority or table is wrong, full stop.
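
When you run that check across many hosts, pulling out just the three interesting fields keeps the output comparable. A small sketch follows; route_summary is an invented name, and it prints "-" for any field that ip does not show.

```shell
#!/bin/sh
# Extract gateway, interface, and table from `ip route get` output on stdin.
# Usage: ip route get 8.8.8.8 from 10.10.20.5 | route_summary
route_summary() {
    awk '
        NR == 1 {
            via = "-"; dev = "-"; table = "-"
            # Each keyword is followed by its value as the next field.
            for (i = 1; i < NF; i++) {
                if ($i == "via")   via = $(i + 1)
                if ($i == "dev")   dev = $(i + 1)
                if ($i == "table") table = $(i + 1)
            }
            printf "via=%s dev=%s table=%s\n", via, dev, table
        }
    '
}
```

For the storage source address in this article, the healthy answer is the 10.10.20.1 gateway on bond0.20.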

Enterprise scenario

A payments team ran KVM hypervisors with bond0 in 802.3ad (LACP) against an MLAG switch pair, VLAN 10 for the host and VLAN 20 for an NFS storage backend on a separate gateway. After a hypervisor reboot, NFS mounts hung for ~30s before recovering, and dmesg was full of nf_conntrack: table full, dropping packet. The trigger: all VMs re-establishing connections at once blew past the default nf_conntrack_max, and storage replies sourced from 10.10.20.5 were being marked invalid and dropped because the default route sent them out VLAN 10 — classic asymmetric routing the firewall couldn’t track.

The fix was two-part. First, pin storage replies to their own gateway with policy routing so conntrack always saw a symmetric flow:

nmcli connection modify vlan20 \
  +ipv4.routes "0.0.0.0/0 10.10.20.1 table=100" \
  +ipv4.routing-rules "priority 100 from 10.10.20.5 table 100"
nmcli connection up vlan20

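Second, size conntrack for the real boot-storm peak and persist it (these keys exist only once the nf_conntrack module is loaded, so make sure it is loaded before the sysctl settings are applied at boot):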

echo "net.netfilter.nf_conntrack_max = 524288" > /etc/sysctl.d/99-conntrack.conf
echo "net.netfilter.nf_conntrack_buckets = 131072" >> /etc/sysctl.d/99-conntrack.conf
sysctl --system

The subtle part: the table was never actually “too small” for steady state — it overflowed only during the synchronized reconnect. Watching conntrack -C against nf_conntrack_max during a controlled drain-and-reboot, not at idle, was what made the headroom obvious. Both changes went into Ansible so every hypervisor in the fleet got identical routing rules and conntrack sizing.

Pitfalls and next steps

The recurring failure modes are predictable: two config managers fighting over the same interface, a NAT rule written into a chain of the wrong type or hook, an LACP bond half-negotiated against a switch never configured for it, and asymmetric routing that a strict rp_filter turns into silent drops. Most of these stay invisible until you check the layer directly with ip, nft list ruleset, and tcpdump, which is why the Verify section moves bottom-up.

From here, fold the bond/VLAN config into provisioning (Netplan files and NM keyfiles render cleanly from Jinja), move the nftables ruleset into a versioned nftables.d/ with per-environment sets, and wire nft counters or node_exporter into your metrics so a flapping link or a filling conntrack table shows up on a dashboard before it pages you.