Difference between revisions of "Netfilter hooks"

From nftables wiki
(Priority within hook: emphasized nft priority keywords.)
(15 intermediate revisions by 4 users not shown)
''nftables'' uses mostly the same Netfilter infrastructure as legacy ''iptables''. The hook infrastructure, [http://people.netfilter.org/pablo/docs/login.pdf Connection Tracking System], NAT engine, logging infrastructure, and userspace queueing remain the same. Only the packet classification framework is new.


For those unfamiliar with Netfilter, we provide ASCII art to represent our hooks:


                                              Local
                                             process
                                               ^  |      .-----------.
                    .-----------.              |  |      |  Routing  |
                    |           |-----> input /    \---> |  Decision |----> output \
 --> prerouting --->|  Routing  |                        .-----------.              \
                    | Decision  |                                                     --> postrouting
                    |           |                                                   /
                    |           |---------------> forward --------------------------
                    .-----------.

__TOC__


== Netfilter hooks into Linux networking packet flows ==
 
The following schematic shows packet flows through Linux networking:
 
 
https://people.netfilter.org/pablo/nf-hooks.png
 
 
Traffic flowing to the local machine in the input path sees the prerouting and input hooks. Then, the traffic that is generated by local processes follows the output and postrouting path.
 
If you configure your Linux box to behave as a router, do not forget to enable forwarding via:


  echo 1 > /proc/sys/net/ipv4/ip_forward


Then packets that are not addressed to your local system will be seen from the forward hook. Such forwarded packets follow the path: prerouting, forward and postrouting.
 
In a major change from iptables, which predefines chains at '''every''' hook (e.g. the ''INPUT'' chain in the ''filter'' table), nftables predefines '''no''' chains at all. You must explicitly create a [[Configuring_chains#Base_chain_hooks | base chain]] at each hook at which you want to filter traffic.
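
As a minimal sketch of what creating a base chain looks like, the shell snippet below prints a ruleset with one table and one base chain attached to the input hook. The table and chain names (''example_filter'', ''example_input'') and the function name are made up for illustration. The commented-out ''nft'' invocation assumes that nft is installed and that you have sufficient privileges.

```shell
#!/bin/sh
# Print a minimal ruleset: one table with one base chain on the input hook.
# The names example_filter and example_input are hypothetical.
base_chain_ruleset() {
    cat <<'EOF'
table inet example_filter {
    chain example_input {
        type filter hook input priority 0; policy accept;
    }
}
EOF
}

base_chain_ruleset

# To validate the ruleset without loading it (assumes nft is installed and
# usually requires root):
#   base_chain_ruleset | nft -c -f -
```

Without a base chain like this one, no nftables rules are evaluated at the input hook at all.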
 
 
=== Ingress hook ===
 
The ingress hook was added in Linux kernel 4.2. Unlike the other netfilter hooks, the ingress hook is attached to a particular network interface.
 
You can use ''nftables'' with the ingress hook to enforce very early filtering policies that take effect even before prerouting. Do note that at this very early stage, fragmented datagrams have not yet been reassembled. So, for example, matching ip saddr and daddr works for all ip packets, but matching L4 headers like udp dport works only for unfragmented packets, or the first fragment.
 
The ingress hook provides an alternative to ''tc'' ingress filtering. You still need ''tc'' for traffic shaping/queue management.
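
The following sketch shows what an ingress base chain might look like. It only prints the ruleset; the interface name (''eth0'' by default), the table and chain names, and the example address range and port are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Print a netdev-family ruleset with a base chain on the ingress hook.
# $1 is the interface name; eth0 is only a placeholder default.
ingress_ruleset() {
    dev="${1:-eth0}"
    cat <<EOF
table netdev example_ingress {
    chain example_early {
        type filter hook ingress device "$dev" priority 0; policy accept;
        # L3 match: works for every IPv4 packet, fragmented or not
        ip saddr 192.0.2.0/24 drop
        # L4 match: only sees unfragmented packets or the first fragment
        udp dport 53 counter
    }
}
EOF
}

ingress_ruleset eth0
```

Because the chain is bound to a single device, a similar chain would be needed for each interface that should be filtered this early.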
 
 
== Hooks by family and chain type ==
 
The following table lists available hooks by [[Nftables_families|family]] and [[Configuring_chains#Base_chain_types|chain type]]. Minimum nftables and Linux kernel versions are shown for recently-added hooks.
 
{| class="wikitable"
|- style="vertical-align:bottom;"
! style="text-align:left;" rowspan="2" | Chain type
! colspan="7" | Hooks
 
|-
! ingress
! prerouting
! forward
! input
! output
! postrouting
! egress
 
|- style="vertical-align:bottom;"
! colspan="8" | <br>inet family
 
|- style="vertical-align:top;"
| filter
| {{yes|1=[https://marc.info/?l=netfilter&m=160379555303808&w=2 0.9.7] / [https://kernelnewbies.org/Linux_5.10 5.10]}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{no}}
 
|- style="vertical-align:top;"
| nat
| {{no}}
| {{yes}}
| {{no}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{no}}
 
|- style="vertical-align:top;"
| route
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{yes}}
| {{no}}
| {{no}}
 
|- style="vertical-align:bottom;"
! colspan="8" | <br>ip6 family
 
|- style="vertical-align:top;"
| filter
| {{no}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{no}}
 
|- style="vertical-align:top;"
| nat
| {{no}}
| {{yes}}
| {{no}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{no}}
 
|- style="vertical-align:top;"
| route
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{yes}}
| {{no}}
| {{no}}
 
|- style="vertical-align:bottom;"
! colspan="8" | <br>ip family
 
|- style="vertical-align:top;"
| filter
| {{no}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{no}}
 
|- style="vertical-align:top;"
| nat
| {{no}}
| {{yes}}
| {{no}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{no}}
 
|- style="vertical-align:top;"
| route
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{yes}}
| {{no}}
| {{no}}
 
|- style="vertical-align:bottom;"
! colspan="8" | <br>arp family
 
|- style="vertical-align:top;"
| filter
| {{no}}
| {{no}}
| {{no}}
| {{yes}}
| {{yes}}
| {{no}}
| {{no}}
 
|- style="vertical-align:top;"
| nat
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
 
|- style="vertical-align:top;"
| route
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
 
|- style="vertical-align:bottom;"
! colspan="8" | <br>bridge family
 
|- style="vertical-align:top;"
| filter
| {{no}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{yes}}
| {{no}}
 
|- style="vertical-align:top;"
| nat
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
 
|- style="vertical-align:top;"
| route
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
 
|- style="vertical-align:bottom;"
! colspan="8" | <br>netdev family
 
|- style="vertical-align:top;"
| filter
| {{yes|1=[https://marc.info/?l=netfilter&m=146488681521497&w=2 0.6] / [https://kernelnewbies.org/Linux_4.2 4.2]}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no|- / [https://kernelnewbies.org/Linux_5.7 5.7]}}
 
|- style="vertical-align:top;"
| nat
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
 
|- style="vertical-align:top;"
| route
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
 
|}
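
To make the table above more concrete, here is a small sketch of an ''ip'' family ruleset that uses the ''nat'' and ''route'' chain types only on hooks the table marks as available. The table and chain names are hypothetical, and the snippet merely prints the ruleset rather than loading it.

```shell
#!/bin/sh
# Print an ip-family ruleset that uses the nat and route chain types
# on hooks listed as available for them. All names are hypothetical.
chain_types_ruleset() {
    cat <<'EOF'
table ip example_types {
    chain example_dnat {
        type nat hook prerouting priority -100; policy accept;
    }
    chain example_snat {
        type nat hook postrouting priority 100; policy accept;
    }
    chain example_reroute {
        type route hook output priority -150; policy accept;
    }
}
EOF
}

chain_types_ruleset
```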
 
 
== Priority within hook ==
 
Within a given hook, Netfilter performs operations in order of increasing numerical priority. Each nftables [[Configuring_chains#Base_chain_hooks | base&nbsp;chain]] and [[Flowtables|flowtable]] is assigned a priority that defines its ordering among other base chains and flowtables and Netfilter internal operations at the same hook. For example, a chain on the ''prerouting'' hook with priority ''-300'' will be placed before connection tracking operations.
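
The ordering rule can be illustrated with a toy model. The snippet below is not how the kernel implements hooks; it simply sorts a list of prerouting operations by numerical priority. ''example_raw'' and ''example_filter'' are hypothetical base chains, while the defragmentation and connection tracking values are those of NF_IP_PRI_CONNTRACK_DEFRAG (-400) and NF_IP_PRI_CONNTRACK (-200).

```shell
#!/bin/sh
# List prerouting operations as "priority name" pairs, deliberately unsorted.
# example_raw and example_filter are hypothetical base chains; defrag and
# conntrack stand for Netfilter internal operations.
prerouting_ops() {
    cat <<'EOF'
0 example_filter
-200 conntrack
-300 example_raw
-400 defrag
EOF
}

# Netfilter runs operations in order of increasing numerical priority,
# so a numeric sort on the first column gives the evaluation order.
prerouting_ops | sort -n -k1,1
```

In the sorted output the chain at priority -300 appears after defragmentation but before connection tracking.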
 
The following table lists Netfilter priority values; see the nft man page for reference.
 
{| class="wikitable"
|- style="vertical-align:bottom;"
! style="text-align:left;" | nftables [[Nftables_families|Families]]
! style="text-align:left;" | Typical hooks
! style="text-align:left;" | ''nft'' Keyword
! style="text-align:right;" | Value
! style="text-align:left;" | Netfilter Internal Priority
! style="text-align:left;" | Description
 
|- style="vertical-align:top;"
|
| prerouting
|
| style="text-align:right;" | -450
| NF_IP_PRI_RAW_BEFORE_DEFRAG
|
 
|- style="vertical-align:top;"
| inet, ip, ip6
| prerouting
|
| style="text-align:right;" | -400
| NF_IP_PRI_CONNTRACK_DEFRAG
| Packet defragmentation / datagram reassembly
 
|- style="vertical-align:top;"
| inet, ip, ip6
| all
| '''raw'''
| style="text-align:right;" | -300
| NF_IP_PRI_RAW
| Traditional priority of the raw table placed before connection tracking operation
 
|- style="vertical-align:top;"
|
|
|
| style="text-align:right;" | -225
| NF_IP_PRI_SELINUX_FIRST
| SELinux operations
 
|- style="vertical-align:top;"
| inet, ip, ip6
| prerouting, output
|
| style="text-align:right;" | -200
| NF_IP_PRI_CONNTRACK
| [[Connection_Tracking_System | Connection tracking]] processes run early in prerouting and output hooks to associate packets with tracked connections.
 
|- style="vertical-align:top;"
| inet, ip, ip6
| all
| '''mangle'''
| style="text-align:right;" | -150
| NF_IP_PRI_MANGLE
| Mangle operation
 
|- style="vertical-align:top;"
| inet, ip, ip6
| prerouting
| '''dstnat'''
| style="text-align:right;" | -100
| NF_IP_PRI_NAT_DST
| Destination NAT
 
|- style="vertical-align:top;"
| inet, ip, ip6, arp, netdev
| all
| '''filter'''
| style="text-align:right;" | 0
| NF_IP_PRI_FILTER
| Filtering operation, the filter table
 
|- style="vertical-align:top;"
| inet, ip, ip6
| all
| '''security'''
| style="text-align:right;" | 50
| NF_IP_PRI_SECURITY
| Place of security table, where secmark can be set for example
 
|- style="vertical-align:top;"
| inet, ip, ip6
| postrouting
| '''srcnat'''
| style="text-align:right;" | 100
| NF_IP_PRI_NAT_SRC
| Source NAT
 
|- style="vertical-align:top;"
|
| postrouting
|
| style="text-align:right;" | 225
| NF_IP_PRI_SELINUX_LAST
| SELinux at packet exit
 
|- style="vertical-align:top;"
| inet, ip, ip6
| postrouting
|
| style="text-align:right;" | 300
| NF_IP_PRI_CONNTRACK_HELPER
| Connection tracking helpers, which identify expected and related packets.
 
|- style="vertical-align:top;"
| inet, ip, ip6
| input, postrouting
|
| style="text-align:right;" | INT_MAX
| NF_IP_PRI_CONNTRACK_CONFIRM
| Connection tracking adds new tracked connections at final step in input & postrouting hooks.
 
|- style="vertical-align:top;"
| colspan="6" | &nbsp;
 
|- style="vertical-align:top;"
| bridge
| prerouting
| '''dstnat'''
| style="text-align:right;" | -300
| NF_BR_PRI_NAT_DST_BRIDGED
|
 
|- style="vertical-align:top;"
| bridge
| all
| '''filter'''
| style="text-align:right;" | -200
| NF_BR_PRI_FILTER_BRIDGED
|
 
|- style="vertical-align:top;"
| bridge
|
|
| style="text-align:right;" | 0
| NF_BR_PRI_BRNF
|
 
|- style="vertical-align:top;"
| bridge
| output
| '''out'''
| style="text-align:right;" | 100
| NF_BR_PRI_NAT_DST_OTHER
|
 
|- style="vertical-align:top;"
| bridge
|
|
| style="text-align:right;" | 200
| NF_BR_PRI_FILTER_OTHER
|


|- style="vertical-align:top;"
| bridge
| postrouting
| '''srcnat'''
| style="text-align:right;" | 300
| NF_BR_PRI_NAT_SRC
|


|}


Starting with nftables 0.9.6 you may set priority using keywords instead of numbers. (Note that the same keyword maps to different numerical priorities in the bridge family vs. the other families.) You can also specify priority as an integer offset from a keyword, e.g. ''mangle - 5'' is equivalent to numerical priority -155.
It's possible to specify keyword priorities even in family/hook combinations where they don't make logical sense. Recall that the relative numerical ordering of priorities within a given hook is all that matters as far as Netfilter is concerned. Keep in mind that this relative ordering includes packet defragmentation, connection tracking and other Netfilter operations as well as your nftables base chains and flowtables.
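
The keyword arithmetic described above can be sketched as a small shell function. The function name ''resolve_priority'' is hypothetical, the values are copied from the priority table on this page, and every family other than ''bridge'' is treated the same way, which is a simplification.

```shell
#!/bin/sh
# Resolve "keyword [offset]" to a numerical priority for a given family.
# Values are copied from the priority table; the function name is hypothetical.
resolve_priority() {
    family="$1"; keyword="$2"; offset="${3:-0}"
    case "$family:$keyword" in
        bridge:dstnat) base=-300 ;;
        bridge:filter) base=-200 ;;
        bridge:out)    base=100 ;;
        bridge:srcnat) base=300 ;;
        bridge:*)      echo "unknown keyword: $keyword" >&2; return 1 ;;
        *:raw)         base=-300 ;;
        *:mangle)      base=-150 ;;
        *:dstnat)      base=-100 ;;
        *:filter)      base=0 ;;
        *:security)    base=50 ;;
        *:srcnat)      base=100 ;;
        *)             echo "unknown keyword: $keyword" >&2; return 1 ;;
    esac
    echo $((base + offset))
}

resolve_priority inet mangle -5    # the "mangle - 5" case from the text
resolve_priority bridge filter     # same keyword, different value in bridge
```

The two calls illustrate the points made in the text: an offset is simply added to the keyword's value, and ''filter'' resolves to a different number in the bridge family than elsewhere.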

Revision as of 15:45, 19 April 2021
