nftables

Published: 02-05-2017

Updated: 14-04-2018

By: Maxime de Roucy

tags: firewall network nftables

I recently switched from iptables to nftables (I have a very simple/personal firewall).

General notes

priority

It’s not mandatory to use the priorities listed in the wiki. You can choose any integer the system accepts.

The lower the number, the sooner the chain is applied in the packet flow (within the hook).
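The ordering rule can be pictured with a toy model. This is not nftables code: the `Chain` tuple and the `evaluation_order` function are hypothetical names, and keeping declaration order for equal priorities mirrors what the tests later in this post show rather than a documented guarantee.

```python
from typing import NamedTuple


class Chain(NamedTuple):
    name: str
    hook: str
    priority: int


def evaluation_order(chains, hook):
    """Return the chains attached to `hook`, lowest priority first."""
    attached = [c for c in chains if c.hook == hook]
    # sorted() is stable: chains sharing a priority keep their declaration order.
    return sorted(attached, key=lambda c: c.priority)


# Hypothetical chains; the priorities are arbitrary integers.
chains = [
    Chain("late", "input", 10),
    Chain("default", "input", 0),
    Chain("early", "input", -150),
    Chain("other", "output", -300),
]
order = [c.name for c in evaluation_order(chains, "input")]
print(order)  # ['early', 'default', 'late']
```

Only chains attached to the same hook are compared with each other; the "other" chain sits on the output hook, so its priority has no effect on the input ordering.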

route chain type

The “route” chain type shouldn’t (can’t?) be used in any hook other than “output”. When this chain modifies a packet (e.g. with a payload statement), the packet goes through a re-routing process between the “output” and “postrouting” hooks.

route, which is used to reroute packets if any relevant IP header field or the packet mark is modified. If you are familiar with iptables, this chain type provides equivalent semantics to the mangle table but only for the output hook (for other hooks use type filter instead). This is supported by the ip and ip6 table families.

The “route” chain type is the equivalent of the iptables “mangle” table, but only for the output hook.

The iptables mangle table only triggers the reroute semantics in the output chain, ie. in other chains, mangle chains are behaving just like filter chains. (source)

The networking stack does the rerouting in the route table. There is a check which prevents calling ip_reroute_me_harder when it was not modified in route output chain. (source)

If the packet is modified by a nat statement (dnat, snat, masquerade), the re-routing process is also triggered.

A payload statement placed in a nat or a filter chain doesn’t trigger the re-routing process.
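The rules in this section can be condensed into a small decision function. It only restates the notes above and is not kernel logic; the function name `triggers_reroute` and the statement labels ("payload", "mark") are shorthand I made up for this sketch.

```python
# Statement labels are illustrative shorthand, not nft keywords to be parsed.
NAT_STATEMENTS = {"dnat", "snat", "masquerade"}
MODIFYING_STATEMENTS = {"payload", "mark"}


def triggers_reroute(chain_type, hook, statement):
    """Tell whether, per the notes above, a statement triggers re-routing."""
    if statement in NAT_STATEMENTS:
        # A nat statement triggers re-routing.
        return True
    if statement in MODIFYING_STATEMENTS:
        # Header or mark changes only count in a route chain on the output hook.
        return chain_type == "route" and hook == "output"
    return False


print(triggers_reroute("route", "output", "payload"))   # True
print(triggers_reroute("filter", "output", "payload"))  # False
print(triggers_reroute("nat", "prerouting", "dnat"))    # True
```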

errors

Servname not found in nft services list

 /etc/nftables.conf:37:27-31: Error: Could not resolve service: Servname not found in nft services list
                 udp dport llmnr accept comment "systemd-resolved: llmnr (udp)"

nftables doesn’t use the /etc/services file to determine service port numbers. It has its own hard-coded list. Using the numeric port (5355 for LLMNR) instead of the service name avoids the error.

Packet flow

Back when I built my iptables firewall I referred to the packet flow diagram, by Jan Engelhardt, on the iptables Wikipedia page. Using this diagram for an nftables firewall is hard, as some concepts have changed.

I ran some tests and drew my own diagram (using the yEd editor) covering the netdev, ip, ip6, inet, bridge and arp tables.

Packet flow in nftables

Tests

To build this diagram I ran several tests, which I describe here.

Network configuration 1

1 host, 1 VM including 1 bridge, 2 veth with 1 in a netns

The VM’s hostname is “arch-64”, the host’s is “laptop” and the netns’s name is “test”.

VM’s network configuration:

[root@arch-64 ~]# ip l set promisc on dev ens3
[root@arch-64 ~]# ip l add name br0 type bridge
[root@arch-64 ~]# ip l set br0 up
[root@arch-64 ~]# ip l set master br0 dev ens3
[root@arch-64 ~]# ip a add 192.168.122.2/24 dev br0
[root@arch-64 ~]# ip l add veth0 type veth peer name veth1
[root@arch-64 ~]# ip l set up dev veth0
[root@arch-64 ~]# ip netns add test
[root@arch-64 ~]# ip l set netns test dev veth1
[root@arch-64 ~]# ip netns exec test ip l set up dev veth1
[root@arch-64 ~]# ip netns exec test ip a add 192.168.200.2/24 dev veth1
[root@arch-64 ~]# sysctl -w net.ipv4.ip_forward=1
[root@arch-64 ~]# ip netns exec test ip r a 192.168.122.0/24 via 192.168.200.1
[root@arch-64 ~]# ip a
…
2: ens3: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
    link/ether 52:54:00:ea:6a:8e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feea:6a8e/64 scope link
       valid_lft forever preferred_lft forever
…
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:ea:6a:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.2/24 scope global br0
       valid_lft forever preferred_lft forever
    inet6 fe80::58d9:72ff:fe6b:4ef2/64 scope link
       valid_lft forever preferred_lft forever
…
6: veth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f6:45:3a:c6:ce:5e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.200.1/24 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f445:3aff:fec6:ce5e/64 scope link
       valid_lft forever preferred_lft forever
[root@arch-64 ~]# ip netns exec test ip a
…
5: veth1@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:05:b7:2d:08:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.200.2/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5:b7ff:fe2d:8d2/64 scope link
       valid_lft forever preferred_lft forever

The laptop network was configured by libvirt; I only had to add a route to veth1.

max@laptop % sudo ip route add 192.168.200.0/24 via 192.168.122.2

Here is the nftables configuration I used in the VM to get as much information as possible about the flow of an ICMP packet. It declares every possible nftables chain:

flush ruleset

table ip iptest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER     IP: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER     IP: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER     IP: "
        }
        chain output {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT      FILTER     IP: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER     IP: "
        }
        # NAT
        chain preroutingnat {
                type nat hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING     NAT     IP: "
        }
        chain inputnat {
                type nat hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT          NAT     IP: "
        }
        chain outputnat {
                type nat hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT         NAT     IP: "
        }
        chain postroutingnat {
                type nat hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING    NAT     IP: "
        }
        # ROUTE
        chain routeoutput {
                type route hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT       ROUTE     IP: "
        }
}

table inet inettest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER   INET: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER   INET: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER   INET: "
        }
        chain output {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT      FILTER   INET: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER   INET: "
        }
}

table arp arptest {
        chain input {
                type filter hook input priority 0; policy accept;
                log prefix "INPUT       FILTER    ARP: "
        }
        chain output {
                type filter hook output priority 0; policy accept;
                log prefix "OUTPUT      FILTER    ARP: "
        }
}

table bridge bridgetest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER BRIDGE: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER BRIDGE: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER BRIDGE: "
        }
        chain output {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT      FILTER BRIDGE: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER BRIDGE: "
        }
}

table netdev netdevtest {
        chain filter {
                # When adding a chain on ingress hook, it is mandatory to specify the device where the chain will be attached
                type filter hook ingress device ens3 priority 0; policy accept;
                ip protocol icmp log prefix "INGRESS     FILTER NETDEV: "
        }
}

To be clear, here are the chains I tried but that nftables rejected:

ip, ip6, inet and bridge families

host(virbr0) → vm(br0)

From laptop :

max@laptop % ping -c 1 192.168.122.2

Inside the VM we get :

# echo-request
INGRESS     FILTER NETDEV: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=20823 DF PROTO=ICMP TYPE=8 CODE=0 ID=7190 SEQ=1
PREROUTING  FILTER BRIDGE: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=20823 DF PROTO=ICMP TYPE=8 CODE=0 ID=7190 SEQ=1
INPUT       FILTER BRIDGE: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=20823 DF PROTO=ICMP TYPE=8 CODE=0 ID=7190 SEQ=1
PREROUTING  FILTER     IP: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=20823 DF PROTO=ICMP TYPE=8 CODE=0 ID=7190 SEQ=1
PREROUTING     NAT     IP: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=20823 DF PROTO=ICMP TYPE=8 CODE=0 ID=7190 SEQ=1
PREROUTING  FILTER   INET: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=20823 DF PROTO=ICMP TYPE=8 CODE=0 ID=7190 SEQ=1
INPUT       FILTER     IP: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=20823 DF PROTO=ICMP TYPE=8 CODE=0 ID=7190 SEQ=1
INPUT          NAT     IP: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=20823 DF PROTO=ICMP TYPE=8 CODE=0 ID=7190 SEQ=1
INPUT       FILTER   INET: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=20823 DF PROTO=ICMP TYPE=8 CODE=0 ID=7190 SEQ=1
# echo-reply
OUTPUT      FILTER     IP: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49523 PROTO=ICMP TYPE=0 CODE=0 ID=7190 SEQ=1
OUTPUT       ROUTE     IP: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49523 PROTO=ICMP TYPE=0 CODE=0 ID=7190 SEQ=1
OUTPUT      FILTER   INET: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49523 PROTO=ICMP TYPE=0 CODE=0 ID=7190 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49523 PROTO=ICMP TYPE=0 CODE=0 ID=7190 SEQ=1
POSTROUTING FILTER   INET: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49523 PROTO=ICMP TYPE=0 CODE=0 ID=7190 SEQ=1
OUTPUT      FILTER BRIDGE: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49523 PROTO=ICMP TYPE=0 CODE=0 ID=7190 SEQ=1
POSTROUTING FILTER BRIDGE: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49523 PROTO=ICMP TYPE=0 CODE=0 ID=7190 SEQ=1

During this test, the ARP tables were already filled. No ARP packets were exchanged.

“PREROUTING FILTER IP”, “PREROUTING NAT IP” and “PREROUTING FILTER INET” are switchable, i.e. their relative order can be changed. They appear in the order in which they are declared in the configuration, because they have the same priority. We can reorder them by changing their declaration order or their priorities.
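To compare chain orders between test runs, the log lines can be reduced to the sequence of chains they traverse. The sketch below assumes the lines look exactly like the listings in this post (log prefix first, no timestamp or kernel prefix in front); the `parse` function and the shortened sample lines are illustrative only.

```python
import re

# Log prefix as used in this post: "<HOOK>  <TYPE>  <FAMILY>: <fields>"
LINE = re.compile(
    r"^(?P<hook>[A-Z]+)\s+(?P<type>[A-Z]+)\s+(?P<family>[A-Z]+): (?P<rest>.*)$"
)


def _field(name, rest):
    """Return the value of a NAME=value field, or '' if it is empty or absent."""
    m = re.search(r"\b" + name + r"=(\S*)", rest)
    return m.group(1) if m else ""


def parse(line):
    """Return (hook, type, family, in_iface, out_iface), or None for other lines."""
    m = LINE.match(line.strip())
    if m is None:
        return None  # comments such as "# echo-request", blank lines
    rest = m.group("rest")
    return (m.group("hook"), m.group("type"), m.group("family"),
            _field("IN", rest), _field("OUT", rest))


# Lines abbreviated from the listing above (MAC, TTL, ID fields removed).
sample = """\
# echo-request
INGRESS     FILTER NETDEV: IN=ens3 OUT= SRC=192.168.122.1 DST=192.168.122.2 PROTO=ICMP TYPE=8
PREROUTING  FILTER BRIDGE: IN=ens3 OUT= SRC=192.168.122.1 DST=192.168.122.2 PROTO=ICMP TYPE=8
PREROUTING  FILTER     IP: IN=br0 OUT= SRC=192.168.122.1 DST=192.168.122.2 PROTO=ICMP TYPE=8
# echo-reply
OUTPUT      FILTER     IP: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 PROTO=ICMP TYPE=0
"""

path = [step for step in map(parse, sample.splitlines()) if step is not None]
for step in path:
    print(step)
```

Two runs whose `path` lists differ only in the position of chains sharing a hook show which chains are switchable.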

All these chains are switchable:

“PREROUTING NAT” and “INPUT NAT” appear because the packet (ICMP request) is the first of the ICMP connection/session. No other “nat” chain appears because the ICMP reply is not the first packet of the ICMP connection/session.

“nat” chains only examine and modify the first packet of a connection; that packet also sets up the NAT binding for the connection. The following packets of the connection are mangled by the NAT engine using this binding, but the rules of the “nat” chains aren’t examined for them.

All future packets in this connection will also be mangled, and rules should cease being examined. (man nftables)
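The “first packet only” behaviour can be illustrated with a toy model. It is a deliberate simplification and not how connection tracking is implemented: `nat_chain_hits` is a hypothetical name, and a connection is identified here only by its two addresses and the ICMP ID.

```python
def nat_chain_hits(packets):
    """Return, for each packet, whether a nat chain would examine it."""
    seen = set()
    hits = []
    for src, dst, icmp_id in packets:
        # Both directions of an exchange map to the same connection key.
        key = (frozenset((src, dst)), icmp_id)
        hits.append(key not in seen)  # only the first packet reaches nat chains
        seen.add(key)
    return hits


# Addresses and ICMP ID taken from the host(virbr0) → vm(br0) test above.
packets = [
    ("192.168.122.1", "192.168.122.2", 7190),  # echo-request
    ("192.168.122.2", "192.168.122.1", 7190),  # echo-reply
]
hits = nat_chain_hits(packets)
print(hits)  # [True, False]
```

This matches the logs: the echo-request goes through “PREROUTING NAT” and “INPUT NAT”, while the echo-reply goes through no “nat” chain at all.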

vm(br0) → host(virbr0)

From the VM:

[root@arch-64 ~]# ping -c 1 192.168.122.1

In the VM:

# echo-request
OUTPUT      FILTER     IP: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=9138 DF PROTO=ICMP TYPE=8 CODE=0 ID=29023 SEQ=1
OUTPUT         NAT     IP: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=9138 DF PROTO=ICMP TYPE=8 CODE=0 ID=29023 SEQ=1
OUTPUT       ROUTE     IP: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=9138 DF PROTO=ICMP TYPE=8 CODE=0 ID=29023 SEQ=1
OUTPUT      FILTER   INET: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=9138 DF PROTO=ICMP TYPE=8 CODE=0 ID=29023 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=9138 DF PROTO=ICMP TYPE=8 CODE=0 ID=29023 SEQ=1
POSTROUTING    NAT     IP: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=9138 DF PROTO=ICMP TYPE=8 CODE=0 ID=29023 SEQ=1
POSTROUTING FILTER   INET: IN= OUT=br0 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=9138 DF PROTO=ICMP TYPE=8 CODE=0 ID=29023 SEQ=1
OUTPUT      FILTER BRIDGE: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=9138 DF PROTO=ICMP TYPE=8 CODE=0 ID=29023 SEQ=1
POSTROUTING FILTER BRIDGE: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=9138 DF PROTO=ICMP TYPE=8 CODE=0 ID=29023 SEQ=1
# echo-reply
INGRESS     FILTER NETDEV: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=11865 PROTO=ICMP TYPE=0 CODE=0 ID=29023 SEQ=1
PREROUTING  FILTER BRIDGE: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=11865 PROTO=ICMP TYPE=0 CODE=0 ID=29023 SEQ=1
INPUT       FILTER BRIDGE: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=11865 PROTO=ICMP TYPE=0 CODE=0 ID=29023 SEQ=1
PREROUTING  FILTER     IP: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=11865 PROTO=ICMP TYPE=0 CODE=0 ID=29023 SEQ=1
PREROUTING  FILTER   INET: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=11865 PROTO=ICMP TYPE=0 CODE=0 ID=29023 SEQ=1
INPUT       FILTER     IP: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=11865 PROTO=ICMP TYPE=0 CODE=0 ID=29023 SEQ=1
INPUT       FILTER   INET: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=11865 PROTO=ICMP TYPE=0 CODE=0 ID=29023 SEQ=1

Same remarks as in the host(virbr0) → vm(br0) section.

Switchable chains:

host(virbr0) → vm(netns(veth1))

From the host (laptop):

max@laptop % ping -c 1 192.168.200.2

In the VM:

# echo-request
INGRESS     FILTER NETDEV: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
PREROUTING  FILTER BRIDGE: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
INPUT       FILTER BRIDGE: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
PREROUTING  FILTER     IP: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
PREROUTING     NAT     IP: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
PREROUTING  FILTER   INET: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
FORWARD     FILTER     IP: IN=br0 OUT=veth0 MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
FORWARD     FILTER   INET: IN=br0 OUT=veth0 MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=veth0 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
POSTROUTING    NAT     IP: IN= OUT=veth0 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
POSTROUTING FILTER   INET: IN= OUT=veth0 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=25405 DF PROTO=ICMP TYPE=8 CODE=0 ID=7406 SEQ=1
# echo-reply
PREROUTING  FILTER     IP: IN=veth0 OUT= MAC=f6:45:3a:c6:ce:5e:02:05:b7:2d:08:d2:08:00 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=33660 PROTO=ICMP TYPE=0 CODE=0 ID=7406 SEQ=1
PREROUTING  FILTER   INET: IN=veth0 OUT= MAC=f6:45:3a:c6:ce:5e:02:05:b7:2d:08:d2:08:00 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=33660 PROTO=ICMP TYPE=0 CODE=0 ID=7406 SEQ=1
FORWARD     FILTER     IP: IN=veth0 OUT=br0 MAC=f6:45:3a:c6:ce:5e:02:05:b7:2d:08:d2:08:00 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33660 PROTO=ICMP TYPE=0 CODE=0 ID=7406 SEQ=1
FORWARD     FILTER   INET: IN=veth0 OUT=br0 MAC=f6:45:3a:c6:ce:5e:02:05:b7:2d:08:d2:08:00 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33660 PROTO=ICMP TYPE=0 CODE=0 ID=7406 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=br0 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33660 PROTO=ICMP TYPE=0 CODE=0 ID=7406 SEQ=1
POSTROUTING FILTER   INET: IN= OUT=br0 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33660 PROTO=ICMP TYPE=0 CODE=0 ID=7406 SEQ=1
OUTPUT      FILTER BRIDGE: IN= OUT=ens3 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33660 PROTO=ICMP TYPE=0 CODE=0 ID=7406 SEQ=1
POSTROUTING FILTER BRIDGE: IN= OUT=ens3 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33660 PROTO=ICMP TYPE=0 CODE=0 ID=7406 SEQ=1

Same remarks as in the host(virbr0) → vm(br0) section.

Switchable chains:

It’s normal not to have “NETDEV” logs for the echo-reply. The netdev chain is only bound to the ens3 interface.

vm(netns(veth1)) → host(virbr0)

From the VM:

[root@arch-64 ~]# ip netns exec test ping -c 1 192.168.122.1

In the VM:

# echo-request
PREROUTING  FILTER     IP: IN=veth0 OUT= MAC=f6:45:3a:c6:ce:5e:02:05:b7:2d:08:d2:08:00 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
PREROUTING     NAT     IP: IN=veth0 OUT= MAC=f6:45:3a:c6:ce:5e:02:05:b7:2d:08:d2:08:00 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
PREROUTING  FILTER   INET: IN=veth0 OUT= MAC=f6:45:3a:c6:ce:5e:02:05:b7:2d:08:d2:08:00 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
FORWARD     FILTER     IP: IN=veth0 OUT=br0 MAC=f6:45:3a:c6:ce:5e:02:05:b7:2d:08:d2:08:00 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
FORWARD     FILTER   INET: IN=veth0 OUT=br0 MAC=f6:45:3a:c6:ce:5e:02:05:b7:2d:08:d2:08:00 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=br0 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
POSTROUTING    NAT     IP: IN= OUT=br0 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
POSTROUTING FILTER   INET: IN= OUT=br0 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
OUTPUT      FILTER BRIDGE: IN= OUT=ens3 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
POSTROUTING FILTER BRIDGE: IN= OUT=ens3 SRC=192.168.200.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=33782 DF PROTO=ICMP TYPE=8 CODE=0 ID=29026 SEQ=1
# echo-reply
INGRESS     FILTER NETDEV: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=39514 PROTO=ICMP TYPE=0 CODE=0 ID=29026 SEQ=1
PREROUTING  FILTER BRIDGE: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=39514 PROTO=ICMP TYPE=0 CODE=0 ID=29026 SEQ=1
INPUT       FILTER BRIDGE: IN=ens3 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=39514 PROTO=ICMP TYPE=0 CODE=0 ID=29026 SEQ=1
PREROUTING  FILTER     IP: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=39514 PROTO=ICMP TYPE=0 CODE=0 ID=29026 SEQ=1
PREROUTING  FILTER   INET: IN=br0 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=39514 PROTO=ICMP TYPE=0 CODE=0 ID=29026 SEQ=1
FORWARD     FILTER     IP: IN=br0 OUT=veth0 MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=39514 PROTO=ICMP TYPE=0 CODE=0 ID=29026 SEQ=1
FORWARD     FILTER   INET: IN=br0 OUT=veth0 MAC=52:54:00:ea:6a:8e:52:54:00:b4:af:8a:08:00 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=39514 PROTO=ICMP TYPE=0 CODE=0 ID=29026 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=veth0 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=39514 PROTO=ICMP TYPE=0 CODE=0 ID=29026 SEQ=1
POSTROUTING FILTER   INET: IN= OUT=veth0 SRC=192.168.122.1 DST=192.168.200.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=39514 PROTO=ICMP TYPE=0 CODE=0 ID=29026 SEQ=1

Same remarks as in the section host(virbr0) → vm(netns(veth1)).

arp family

host(virbr0) → vm(br0)

From the host (laptop):

max@laptop % sudo arping -c 1 -I virbr0 192.168.122.2

In the VM:

INPUT       FILTER    ARP: IN=br0 OUT= ARP HTYPE=1 PTYPE=0x0800 OPCODE=1 MACSRC=52:54:00:b4:af:8a IPSRC=192.168.122.1 MACDST=ff:ff:ff:ff:ff:ff IPDST=192.168.122.2
OUTPUT      FILTER    ARP: IN= OUT=br0 ARP HTYPE=21076 PTYPE=0x00b4 OPCODE=21076

ARP packets only go through the arp family tables.

Network configuration 2

1 host, 2 VMs

VM-1 and VM-2 are called “arch-64” and “arch-64-clone” respectively. The host’s hostname is “laptop”.

netdev and arp families

nftables configuration on the host (laptop):

flush ruleset

table ip iptest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER     IP: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER     IP: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER     IP: "
        }
        chain output {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT      FILTER     IP: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER     IP: "
        }
        # NAT
        chain preroutingnat {
                type nat hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING     NAT     IP: "
        }
        chain inputnat {
                type nat hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT          NAT     IP: "
        }
        chain outputnat {
                type nat hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT         NAT     IP: "
        }
        chain postroutingnat {
                type nat hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING    NAT     IP: "
        }
        # ROUTE
        chain routeoutput {
                type route hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT       ROUTE     IP: "
        }
}

table inet inettest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER   INET: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER   INET: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER   INET: "
        }
        chain output {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT      FILTER   INET: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER   INET: "
        }
}

table arp arptest {
        chain input {
                type filter hook input priority 0; policy accept;
                log prefix "INPUT       FILTER    ARP: "
        }
        chain output {
                type filter hook output priority 0; policy accept;
                log prefix "OUTPUT      FILTER    ARP: "
        }
}

table bridge bridgetest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER BRIDGE: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER BRIDGE: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER BRIDGE: "
        }
        chain output {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT      FILTER BRIDGE: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER BRIDGE: "
        }
}

table netdev netdevtest {
        chain filter {
                # When adding a chain on ingress hook, it is mandatory to specify the device where the chain will be attached
                type filter hook ingress device vnet0 priority 0; policy accept;
                ip protocol icmp log prefix "INGRESS     FILTER NETDEV: "
        }
}

bridge, vm1 → vm2

From VM-1:

[root@arch-64 ~]# ping -c 1 192.168.122.3

On the host (laptop):

# echo-request
INGRESS     FILTER NETDEV: IN=vnet0 OUT= MAC=52:54:00:c4:31:5c:52:54:00:ea:6a:8e:08:00 SRC=192.168.122.2 DST=192.168.122.3 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53110 DF PROTO=ICMP TYPE=8 CODE=0 ID=313 SEQ=1
PREROUTING  FILTER BRIDGE: IN=vnet0 OUT= MAC=52:54:00:c4:31:5c:52:54:00:ea:6a:8e:08:00 SRC=192.168.122.2 DST=192.168.122.3 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53110 DF PROTO=ICMP TYPE=8 CODE=0 ID=313 SEQ=1
FORWARD     FILTER BRIDGE: IN=vnet0 OUT=vnet1 MAC=52:54:00:c4:31:5c:52:54:00:ea:6a:8e:08:00 SRC=192.168.122.2 DST=192.168.122.3 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53110 DF PROTO=ICMP TYPE=8 CODE=0 ID=313 SEQ=1
POSTROUTING FILTER BRIDGE: IN= OUT=vnet1 SRC=192.168.122.2 DST=192.168.122.3 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=53110 DF PROTO=ICMP TYPE=8 CODE=0 ID=313 SEQ=1
# echo-reply
PREROUTING  FILTER BRIDGE: IN=vnet1 OUT= MAC=52:54:00:ea:6a:8e:52:54:00:c4:31:5c:08:00 SRC=192.168.122.3 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49059 PROTO=ICMP TYPE=0 CODE=0 ID=313 SEQ=1
FORWARD     FILTER BRIDGE: IN=vnet1 OUT=vnet0 MAC=52:54:00:ea:6a:8e:52:54:00:c4:31:5c:08:00 SRC=192.168.122.3 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49059 PROTO=ICMP TYPE=0 CODE=0 ID=313 SEQ=1
POSTROUTING FILTER BRIDGE: IN= OUT=vnet0 SRC=192.168.122.3 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49059 PROTO=ICMP TYPE=0 CODE=0 ID=313 SEQ=1

It’s normal not to have “NETDEV” logs for the echo-reply. The netdev chain is only bound to the vnet0 interface.

arp, vm1 → vm2

It is not possible to filter forwarded ARP traffic. The kernel refuses to create a forward chain in an arp table.

On Linux 4.12.4 I tried to add this chain to the arp table of the host (laptop):

table arp arptest {
        …
        chain forward {
                type filter hook forward priority 0; policy accept;
                log prefix "FORWARD     FILTER    ARP: "
        }
        …
}

nft didn’t complain, but from VM-1:

[root@arch-64 ~]# arping -c 1 192.168.122.3

On the host (laptop):

INPUT       FILTER    ARP: IN=virbr0 OUT= ARP HTYPE=1 PTYPE=0x0800 OPCODE=1 MACSRC=52:54:00:ea:6a:8e IPSRC=192.168.122.2 MACDST=ff:ff:ff:ff:ff:ff IPDST=192.168.122.3

I didn’t see any packet going through the forward arp filter.

It was a bug, which I reported on the netfilter bugzilla. The kernel shouldn’t have allowed me to apply this chain, as it doesn’t work. It’s fixed now and the kernel rejects this chain (as it should).

arp family and qdisc system

nftables configuration on VM-1:

flush ruleset

table arp arptest {
        chain input {
                type filter hook input priority 0; policy accept;
                log prefix "INPUT       FILTER    ARP: "
        }
        chain output {
                type filter hook output priority 0; policy accept;
                log prefix "OUTPUT      FILTER    ARP: "
        }
}

table netdev netdevtest {
        chain filter {
                # When adding a chain on ingress hook, it is mandatory to specify the device where the chain will be attached
                type filter hook ingress device ens3 priority 0; policy accept;
                ip protocol icmp log prefix "INGRESS     FILTER NETDEV: "
        }
}

arp, vm1 → vm2

From VM-1:

[root@arch-64 ~]# ip neigh del 192.168.122.3 dev ens3 lladdr 52:54:00:c4:31:5c
[root@arch-64 ~]# ping -c1 192.168.122.3

On VM-1:

OUTPUT      FILTER    ARP: IN= OUT=ens3 ARP HTYPE=65535 PTYPE=0xffff OPCODE=21076
INPUT       FILTER    ARP: IN=ens3 OUT= ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=52:54:00:c4:31:5c IPSRC=192.168.122.3 MACDST=52:54:00:ea:6a:8e IPDST=192.168.122.2
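When reading these logs it helps to know that ARP opcode 1 is a request and opcode 2 is a reply. A small sketch that labels each logged line accordingly. arp_opcode_name is a hypothetical helper, and any other value (such as the OPCODE=21076 in the OUTPUT line above) is only flagged as unexpected, not explained:

```shell
# Hypothetical helper: read nftables ARP log lines on stdin and
# print one label per line that carries an OPCODE= field.
arp_opcode_name() {
    sed -n 's/.* OPCODE=\([0-9][0-9]*\).*/\1/p' | while read -r op; do
        case $op in
            1) echo "request" ;;
            2) echo "reply" ;;
            *) echo "unexpected ($op)" ;;
        esac
    done
}

# Usage, assuming the kernel log is available through journalctl:
#   journalctl -k | grep 'FILTER    ARP: ' | arp_opcode_name
```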

egress qdisc, arp, vm2 → vm1

Egress qdisc configuration on VM-1; it drops every ARP packet:

[root@arch-64 ~]# tc qdisc del dev ens3 root
[root@arch-64 ~]# tc qdisc add dev ens3 root handle 1: prio 
[root@arch-64 ~]# tc filter add dev ens3 parent 1: protocol arp u32 match u32 0 0 action drop
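The u32 selector compares a masked word of the packet with a masked value, so value 0 with mask 0 acts as a catch-all for the chosen protocol. A small sketch of that comparison in shell arithmetic. u32_matches is a hypothetical helper (it is not part of tc) and the sample words are made up:

```shell
# Hypothetical helper mirroring the u32 test:
# succeed when (word & mask) == (value & mask).
u32_matches() {
    # $1 = word read from the packet, $2 = value, $3 = mask
    [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}

# "match u32 0 0": mask 0 clears every bit, so any word matches.
u32_matches 0x08060001 0 0 && echo "catch-all matches"

# Same test on a single byte, as in "match ip protocol 1 0xff":
# 1 is ICMP, 17 is UDP.
u32_matches 1 1 0xff && echo "ICMP matches"
u32_matches 17 1 0xff || echo "UDP does not match"
```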

From VM-2:

[root@arch-64-clone ~]# arping -c1 192.168.122.2

On VM-1:

INPUT       FILTER    ARP: IN=ens3 OUT= ARP HTYPE=1 PTYPE=0x0800 OPCODE=1 MACSRC=52:54:00:c4:31:5c IPSRC=192.168.122.3 MACDST=ff:ff:ff:ff:ff:ff IPDST=192.168.122.2
OUTPUT      FILTER    ARP: IN= OUT=ens3 ARP HTYPE=21076 PTYPE=0x00c4 OPCODE=21076

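The reply is still logged by the output chain although the egress qdisc drops every ARP packet, so “OUTPUT FILTER ARP” gets hit before the egress qdisc.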

Off-topic: to remove qdisc configuration previously applied:

[root@arch-64 ~]# tc qdisc delete dev ens3 root

ingress qdisc, icmp, vm2 → vm1

Ingress qdisc configuration on VM-1; it drops every ICMP packet:

[root@arch-64 ~]# tc qdisc add dev ens3 handle ffff: ingress
[root@arch-64 ~]# tc filter add dev ens3 parent ffff: protocol ip u32 match ip protocol 1 0xff action drop

From VM-2:

[root@arch-64-clone ~]# ping -c1 192.168.122.2

On VM-1 I get no log. All ICMP packets are dropped by the qdisc filter before they can hit nftables chains. Ingress qdisc is hit before “INGRESS FILTER NETDEV”, the only chain of this ruleset that logs ICMP.

Off-topic: to remove qdisc configuration previously applied:

[root@arch-64 ~]# tc qdisc delete dev ens3 handle ffff: ingress

ingress qdisc, arp, vm2 → vm1

Ingress qdisc configuration on VM-1; it drops every ARP packet:

[root@arch-64 ~]# tc qdisc add dev ens3 handle ffff: ingress
[root@arch-64 ~]# tc filter add dev ens3 parent ffff: protocol arp u32 match u32 0 0 action drop

From VM-2:

[root@arch-64-clone ~]# arping -c1 192.168.122.2

On VM-1 I get no log. All ARP packets are dropped by the qdisc filter before they can hit nftables chains. Ingress qdisc is hit before “INGRESS FILTER NETDEV”.

I tried to change the netdev chain priority (-500), but it didn’t change anything.

Off-topic: to remove qdisc configuration previously applied:

[root@arch-64 ~]# tc qdisc delete dev ens3 handle ffff: ingress

Network configuration 3

1 host (2 interfaces), 1 VM (2 interfaces)

The VM’s hostname is “arch-64” and the host’s is “laptop”.

route chain

nftables configuration on the VM:

flush ruleset

table ip iptest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER     IP: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER     IP: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER     IP: "
        }
        chain output1 {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT     FILTER1     IP: "
        }
        chain output2 {
                type filter hook output priority 32767; policy accept;
                ip protocol icmp log prefix "OUTPUT     FILTER2     IP: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER     IP: "
        }
        # NAT
        chain preroutingnat {
                type nat hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING     NAT     IP: "
        }
        chain inputnat {
                type nat hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT          NAT     IP: "
        }
        chain outputnat {
                type nat hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT         NAT     IP: "
        }
        chain postroutingnat {
                type nat hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING    NAT     IP: "
        }
        # ROUTE
        chain routeoutput {
                type route hook output priority -1; policy accept;
                ip protocol icmp log prefix "OUTPUT       ROUTE     IP: "
                ip protocol icmp ip daddr set 192.168.200.1
        }
}

From the VM:

[root@arch-64 ~]# ping -c1 192.168.122.1

On the VM:

OUTPUT       ROUTE     IP: IN= OUT=ens3 SRC=192.168.122.3 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37830 DF PROTO=ICMP TYPE=8 CODE=0 ID=971 SEQ=1
OUTPUT     FILTER1     IP: IN= OUT=ens3 SRC=192.168.122.3 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37830 DF PROTO=ICMP TYPE=8 CODE=0 ID=971 SEQ=1
OUTPUT         NAT     IP: IN= OUT=ens3 SRC=192.168.122.3 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37830 DF PROTO=ICMP TYPE=8 CODE=0 ID=971 SEQ=1
OUTPUT     FILTER2     IP: IN= OUT=ens3 SRC=192.168.122.3 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37830 DF PROTO=ICMP TYPE=8 CODE=0 ID=971 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=ens8 SRC=192.168.122.3 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37830 DF PROTO=ICMP TYPE=8 CODE=0 ID=971 SEQ=1
POSTROUTING    NAT     IP: IN= OUT=ens8 SRC=192.168.122.3 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=37830 DF PROTO=ICMP TYPE=8 CODE=0 ID=971 SEQ=1

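The OUT interface changes from ens3 to ens8 between “OUTPUT FILTER2” and “POSTROUTING FILTER”: the re-routing decision is taken between the output and the postrouting hooks.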
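Spotting the interface change by eye in long log lines is error-prone. A minimal sketch that prints each point where the OUT= interface differs from the previous logged line. out_changes is a hypothetical helper, lines with an empty OUT= (incoming packets) are skipped, and the sample lines are shortened copies of the log above:

```shell
# Hypothetical helper: read nftables log lines on stdin and report
# every change of the OUT= interface between consecutive lines.
out_changes() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^OUT=/) out = substr($i, 5)
        if (out == "") next                  # incoming packet, no output interface
        label = $0
        sub(/: .*/, "", label)               # keep the log prefix only
        gsub(/ +/, " ", label)               # squeeze the padding spaces
        if (seen && out != prev_out)
            printf "%s -> %s (between \"%s\" and \"%s\")\n", prev_out, out, prev_label, label
        seen = 1; prev_out = out; prev_label = label
    }'
}

out_changes <<'EOF'
OUTPUT     FILTER2     IP: IN= OUT=ens3 SRC=192.168.122.3 DST=192.168.200.1
POSTROUTING FILTER     IP: IN= OUT=ens8 SRC=192.168.122.3 DST=192.168.200.1
EOF
# prints: ens3 -> ens8 (between "OUTPUT FILTER2 IP" and "POSTROUTING FILTER IP")
```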

nat statement

nftables configuration on the VM:

flush ruleset

table ip iptest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER     IP: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER     IP: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER     IP: "
        }
        chain output1 {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT     FILTER1     IP: "
        }
        chain output2 {
                type filter hook output priority 10000000; policy accept;
                ip protocol icmp log prefix "OUTPUT     FILTER2     IP: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER     IP: "
        }
        # NAT
        chain preroutingnat {
                type nat hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING     NAT     IP: "
        }
        chain inputnat {
                type nat hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT          NAT     IP: "
        }
        chain outputnat {
                type nat hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT         NAT     IP: "
                ip protocol icmp dnat 192.168.200.1
        }
        chain postroutingnat {
                type nat hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING    NAT     IP: "
        }
        # ROUTE
        chain routeoutput {
                type route hook output priority -1; policy accept;
                ip protocol icmp log prefix "OUTPUT       ROUTE     IP: "
        }
}

From the VM:

[root@arch-64 ~]# ping -c1 192.168.122.1

On the VM:

OUTPUT       ROUTE     IP: IN= OUT=ens3 SRC=192.168.122.3 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=3744 DF PROTO=ICMP TYPE=8 CODE=0 ID=1093 SEQ=1
OUTPUT     FILTER1     IP: IN= OUT=ens3 SRC=192.168.122.3 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=3744 DF PROTO=ICMP TYPE=8 CODE=0 ID=1093 SEQ=1
OUTPUT         NAT     IP: IN= OUT=ens3 SRC=192.168.122.3 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=3744 DF PROTO=ICMP TYPE=8 CODE=0 ID=1093 SEQ=1
OUTPUT     FILTER2     IP: IN= OUT=ens3 SRC=192.168.122.3 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=3744 DF PROTO=ICMP TYPE=8 CODE=0 ID=1093 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=ens8 SRC=192.168.122.3 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=3744 DF PROTO=ICMP TYPE=8 CODE=0 ID=1093 SEQ=1
POSTROUTING    NAT     IP: IN= OUT=ens8 SRC=192.168.122.3 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=3744 DF PROTO=ICMP TYPE=8 CODE=0 ID=1093 SEQ=1

The dnat statement triggers the re-routing: OUT changes from ens3 to ens8 between the output and the postrouting hooks.

payload statement in nat chain

nftables configuration on the VM:

flush ruleset

table ip iptest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER     IP: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER     IP: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER     IP: "
        }
        chain output1 {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT     FILTER1     IP: "
        }
        chain output2 {
                type filter hook output priority 10000000; policy accept;
                ip protocol icmp log prefix "OUTPUT     FILTER2     IP: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER     IP: "
        }
        # NAT
        chain preroutingnat {
                type nat hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING     NAT     IP: "
        }
        chain inputnat {
                type nat hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT          NAT     IP: "
        }
        chain outputnat {
                type nat hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT         NAT     IP: "
                ip protocol icmp ip daddr set 192.168.200.1
        }
        chain postroutingnat {
                type nat hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING    NAT     IP: "
        }
        # ROUTE
        chain routeoutput {
                type route hook output priority -1; policy accept;
                ip protocol icmp log prefix "OUTPUT       ROUTE     IP: "
        }
}

From the VM:

[root@arch-64 ~]# ping -c1 192.168.122.1

On the VM:

OUTPUT       ROUTE     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=34649 DF PROTO=ICMP TYPE=8 CODE=0 ID=478 SEQ=1
OUTPUT     FILTER1     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=34649 DF PROTO=ICMP TYPE=8 CODE=0 ID=478 SEQ=1
OUTPUT         NAT     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=34649 DF PROTO=ICMP TYPE=8 CODE=0 ID=478 SEQ=1
OUTPUT     FILTER2     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=34649 DF PROTO=ICMP TYPE=8 CODE=0 ID=478 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=34649 DF PROTO=ICMP TYPE=8 CODE=0 ID=478 SEQ=1
POSTROUTING    NAT     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=34649 DF PROTO=ICMP TYPE=8 CODE=0 ID=478 SEQ=1
PREROUTING  FILTER     IP: IN=ens3 OUT= MAC=52:54:00:c4:31:5c:52:54:00:b4:af:8a:08:00 SRC=192.168.200.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=32426 PROTO=ICMP TYPE=0 CODE=0 ID=478 SEQ=1

The re-routing isn’t triggered by a payload statement in a nat chain: the destination is rewritten but OUT stays ens3.

payload statement in filter chain

nftables configuration on the VM:

flush ruleset

table ip iptest {
        chain prerouting {
                type filter hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING  FILTER     IP: "
        }
        chain input {
                type filter hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT       FILTER     IP: "
        }
        chain forward {
                type filter hook forward priority 0; policy accept;
                ip protocol icmp log prefix "FORWARD     FILTER     IP: "
        }
        chain output1 {
                type filter hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT     FILTER1     IP: "
                ip protocol icmp ip daddr set 192.168.200.1
        }
        chain output2 {
                type filter hook output priority 10000000; policy accept;
                ip protocol icmp log prefix "OUTPUT     FILTER2     IP: "
        }
        chain postrouting {
                type filter hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING FILTER     IP: "
        }
        # NAT
        chain preroutingnat {
                type nat hook prerouting priority 0; policy accept;
                ip protocol icmp log prefix "PREROUTING     NAT     IP: "
        }
        chain inputnat {
                type nat hook input priority 0; policy accept;
                ip protocol icmp log prefix "INPUT          NAT     IP: "
        }
        chain outputnat {
                type nat hook output priority 0; policy accept;
                ip protocol icmp log prefix "OUTPUT         NAT     IP: "
        }
        chain postroutingnat {
                type nat hook postrouting priority 0; policy accept;
                ip protocol icmp log prefix "POSTROUTING    NAT     IP: "
        }
        # ROUTE
        chain routeoutput {
                type route hook output priority -1; policy accept;
                ip protocol icmp log prefix "OUTPUT       ROUTE     IP: "
        }
}

From the VM:

[root@arch-64 ~]# ping -c1 192.168.122.1

On the VM:

OUTPUT       ROUTE     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56528 DF PROTO=ICMP TYPE=8 CODE=0 ID=514 SEQ=1
OUTPUT     FILTER1     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56528 DF PROTO=ICMP TYPE=8 CODE=0 ID=514 SEQ=1
OUTPUT         NAT     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56528 DF PROTO=ICMP TYPE=8 CODE=0 ID=514 SEQ=1
OUTPUT     FILTER2     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56528 DF PROTO=ICMP TYPE=8 CODE=0 ID=514 SEQ=1
POSTROUTING FILTER     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56528 DF PROTO=ICMP TYPE=8 CODE=0 ID=514 SEQ=1
POSTROUTING    NAT     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.200.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=56528 DF PROTO=ICMP TYPE=8 CODE=0 ID=514 SEQ=1
PREROUTING  FILTER     IP: IN=ens3 OUT= MAC=52:54:00:c4:31:5c:52:54:00:b4:af:8a:08:00 SRC=192.168.200.1 DST=192.168.122.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=54512 PROTO=ICMP TYPE=0 CODE=0 ID=514 SEQ=1

The re-routing isn’t triggered by a payload statement in a filter chain: the destination is rewritten but OUT stays ens3.
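The four experiments above all come down to the same two questions: did DST change, and did OUT change with it? A minimal sketch that answers both for one logged trace. reroute_verdict is a hypothetical helper, it only looks at lines with a non-empty OUT= (so the echo-reply is ignored), and the sample lines are shortened copies of the last log:

```shell
# Hypothetical helper: compare the first and last outgoing log lines of a
# trace and tell whether the destination and the output interface changed.
reroute_verdict() {
    awk '{
        out = ""; dst = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^OUT=/) out = substr($i, 5)
            if ($i ~ /^DST=/) dst = substr($i, 5)
        }
        if (out == "") next                  # skip incoming packets
        if (first_out == "") { first_out = out; first_dst = dst }
        last_out = out; last_dst = dst
    }
    END {
        rewritten = (first_dst != last_dst) ? "rewritten" : "unchanged"
        rerouted  = (first_out != last_out) ? "re-routed" : "not re-routed"
        printf "destination %s, packet %s\n", rewritten, rerouted
    }'
}

reroute_verdict <<'EOF'
OUTPUT     FILTER1     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.122.1
POSTROUTING FILTER     IP: IN= OUT=ens3 SRC=192.168.122.2 DST=192.168.200.1
PREROUTING  FILTER     IP: IN=ens3 OUT= SRC=192.168.200.1 DST=192.168.122.2
EOF
# prints: destination rewritten, packet not re-routed
```

Fed with the trace of the route chain or of the dnat experiment, the same helper reports the packet as re-routed, since OUT goes from ens3 to ens8 there.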