Difference between revisions of "Flowtables"

From nftables wiki
(→‎See also: Added link to notice of Mellanox flowtable hardware offload)
(→‎See also: Added link to Wen Xu's UCloud performance measurements of Mellanox flowtable hardware offload)
* [https://netdevconf.info/0x13/session.html?workshop-netfilter-mini Netfilter Mini-Workshop, Netdev 0x13, 2019-03]
* [https://lwn.net/Articles/804384/ Mellanox flowtable hardware offload]
* [https://www.programmersought.com/article/11833283913/ Some Mellanox flowtable hardware offload performance measurements by Wen Xu of UCloud]

Revision as of 17:57, 23 March 2021

NOTE: Meters were known as flowtables before the nftables 0.8.1 release. They are now two separate, unrelated things.

Flowtables allow you to accelerate packet forwarding in software (and in hardware if your NIC supports it) by using a conntrack-based network stack bypass.

Each entry is represented by a tuple composed of the input interface, the source and destination addresses, the source and destination ports, and the layer 3/4 protocols. Each entry also caches the destination interface and the gateway address (used to update the destination link-layer address) so that packets can be forwarded.

The TTL and hoplimit fields are also decremented. Hence, flowtables provide an alternative path that allows packets to bypass the classic forwarding path.

                                         userspace process
                                          ^              |
                                          |              |
                                     _____|____     ____\/___
                                    /          \   /         \
                                    |   input  |   |  output |
                                    \__________/   \_________/
                                         ^               |
                                         |               |
      _________      __________      ---------     _____\/_____
     /         \    /          \     |Routing |   /            \
  -->  ingress  ---> prerouting ---> |decision|   | postrouting|--> neigh_xmit
     \_________/    \__________/     ----------   \____________/          ^
       |      ^                          |               ^                |
   flowtable  |                     ____\/___            |                |
       |      |                    /         \           |                |
    __\/___   |                    | forward |------------                |
    |-----|   |                    \_________/                            |
    |-----|   |                 'flow offload' rule                       |
    |-----|   |                   adds entry to                           |
    |_____|   |                     flowtable                             |
       |      |                                                           |
      / \     |                                                           |
     /hit\_no_|                                                           |
     \ ? /                                                                |
      \ /                                                                 |
       |__yes_________________fastpath bypass ____________________________|

               Fig.1 Netfilter hooks and flowtable interactions


Flowtables reside in the ingress hook that is located before the prerouting hook. You can select which flows you want to offload through the flow expression from the forward chain. Flowtables are identified by their address family and their name. The address family must be one of ip, ip6, or inet. When no address family is specified, ip is used by default.

Flows are offloaded after the conntrack state is created, so a firewall rule that accepts the initial traffic is required. The flow expression on the forward chain must match the return traffic of the initial connection.

The *priority* can be a signed integer or *filter*, which stands for 0. Addition and subtraction can be used to set a relative priority, e.g. filter + 5 equals 5.

The *devices* are specified as iifname(s) of the input interface(s) of the traffic that should be offloaded. Devices are required for both traffic directions.
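
As a small sketch of the two settings above, a flowtable declaration with a relative priority could look like the following. The table name x, the flowtable name f and the interface names eth0 and eth1 are placeholders; substitute the interfaces that carry both directions of the traffic you want to offload.

table inet x {
    flowtable f {
        # filter + 5 resolves to priority 5
        hook ingress priority filter + 5; devices = { eth0, eth1 };
    }
}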

Example:

table inet x {

    flowtable f {
        hook ingress priority 0; devices = { eth0, eth1 };
    }

    chain forward {
        type filter hook forward priority 0; policy drop;

        # offload established connections
        ip protocol { tcp, udp } flow offload @f
        ip6 nexthdr { tcp, udp } flow offload @f
        counter packets 0 bytes 0

        # established/related connections
        ct state established,related counter accept

        # allow initial connection
        ip protocol { tcp, udp } accept
        ip6 nexthdr { tcp, udp } accept
    }
}
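
For the hardware path mentioned at the top of this page, a flowtable can additionally request offload to the NIC with the offload flag. The following is only a sketch: it assumes a kernel, nftables version and network driver that support flowtable hardware offload, and eth0 and eth1 are again placeholder interface names. The forward chain stays the same as in the example above.

table inet x {
    flowtable f {
        hook ingress priority 0; devices = { eth0, eth1 };
        # ask the driver to offload flows to hardware (needs NIC support)
        flags offload;
    }
}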

See also