Difference between revisions of "Portal:DeveloperDocs/nftables internals"

From nftables wiki
(→‎expressions: dictionary -> vmap)
(Add a description of libnftnl)
Revision as of 18:15, 6 May 2022

This page contains information for Netfilter developers on how nftables internals work.

The kernel subsystem

The nf_tables kernel subsystem contains two key components:

  • the netlink API (i.e., the control plane API)
  • the nf_tables core (i.e., the data plane engine)

Other components, such as external modules, are also in place and are intermixed with both the API and the core.

Generally speaking, the nf_tables subsystem implements a virtual machine of low-level expressions that operates on network packets.

TODO: add info.

nf_tables netlink API

The source code is mostly in net/netfilter/nf_tables_api.c [elixir src] [git src]

TODO: add info.

nf_tables core

The source code is mostly in net/netfilter/nf_tables_core.c [elixir src] [git src]

There you can find one of the most important functions in the core: nft_do_chain(). In a nutshell, this is the function that evaluates network packets against the ruleset.

The logic in this function is rather simple:

  • for each rule in the chain
    • for each low level expression in the rule
      • evaluate the packet against the expression
    • evaluate expression return code (break, continue, drop, accept, jump, goto, etc)
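The loop above can be sketched as a small, self-contained C model. This is only an illustration, not the kernel code: every toy_* name, the verdict values and the packet structure are hypothetical, and jump/goto handling is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdict codes; the kernel uses its own values. */
enum toy_verdict { TOY_CONTINUE, TOY_BREAK, TOY_ACCEPT, TOY_DROP };

/* Hypothetical packet: only its length matters in this sketch. */
struct toy_pkt { unsigned int len; };

/* An expression inspects the packet and returns a verdict code. */
typedef enum toy_verdict (*toy_expr_fn)(const struct toy_pkt *pkt);

struct toy_rule {
    const toy_expr_fn *exprs;
    size_t num_exprs;
};

/* For each rule, run its expressions until one returns something other
 * than TOY_CONTINUE. TOY_BREAK means "this rule does not match, try the
 * next rule"; TOY_ACCEPT and TOY_DROP end the chain immediately. */
enum toy_verdict toy_do_chain(const struct toy_rule *rules, size_t num_rules,
                              const struct toy_pkt *pkt,
                              enum toy_verdict policy)
{
    for (size_t r = 0; r < num_rules; r++) {
        enum toy_verdict v = TOY_CONTINUE;

        for (size_t e = 0; e < rules[r].num_exprs && v == TOY_CONTINUE; e++)
            v = rules[r].exprs[e](pkt);

        if (v == TOY_ACCEPT || v == TOY_DROP)
            return v;
        /* TOY_BREAK or TOY_CONTINUE: move on to the next rule. */
    }
    return policy; /* no rule issued a final verdict */
}

/* Matching expression: continue only for packets longer than 1000 bytes. */
static enum toy_verdict toy_match_big(const struct toy_pkt *pkt)
{
    return pkt->len > 1000 ? TOY_CONTINUE : TOY_BREAK;
}

/* Verdict expression: always drop. */
static enum toy_verdict toy_verdict_drop(const struct toy_pkt *pkt)
{
    (void)pkt;
    return TOY_DROP;
}

/* One-rule chain "drop packets longer than 1000 bytes", policy accept. */
enum toy_verdict toy_classify(unsigned int len)
{
    static const toy_expr_fn exprs[] = { toy_match_big, toy_verdict_drop };
    const struct toy_rule rule = { exprs, 2 };
    const struct toy_pkt pkt = { len };

    return toy_do_chain(&rule, 1, &pkt, TOY_ACCEPT);
}
```

In this model a rule is just an array of expressions, so a "match" is nothing more than an expression that returns TOY_BREAK when the packet is not of interest.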

TODO: add info.

expressions

There are many low-level expressions that allow us to operate on network packets in different ways. You can think of these low-level expressions as assembly-like instructions.

  • nft_immediate: loads an immediate value into a register.
  • nft_cmp: compares given data with data from a given register.
  • nft_payload: sets/gets arbitrary data in packet headers.
  • nft_bitwise: performs bit-wise operations on data in a given register.
  • nft_byteorder: performs byte order operations on data in a given register.
  • nft_counter: a basic packet/byte counter that is incremented every time it is evaluated for a packet.
  • nft_meta: sets/gets packet meta information, such as related interfaces, timestamps, etc.
  • nft_lookup: searches a set for the data held in a given register (the key). If the set is a map/vmap, returns the value for that key.

TODO: add info.

The userspace components

There are several important components in the userspace part of nftables:

  • libmnl: generic low level library used to communicate with the kernel using netlink sockets.
  • libnftnl: low level library that interacts with the nf_tables subsystem netlink API in the kernel. It is responsible for creating and parsing the nf_tables netlink messages. Uses libmnl under the hood.
  • libnftables: high level library that implements the logic to translate from high level statements to netlink objects and the other way around. Uses libnftnl under the hood.
  • nft: the command line interface binary. This is what most end users actually use in their systems. It reads user input and calls libnftables under the hood.

Generally speaking, the userspace compiles high level statements (rules, etc.) into the netlink bytecode that the kernel API understands. When inspecting the ruleset (i.e., listing it), it does the opposite: it reconstructs high level statements from the low level netlink bytecode.

libnftnl

This library provides data structures for entities existing in nf_tables nomenclature, such as tables, chains and rules. It serves as an intermediate layer between nftables and iptables-nft user space applications and nfnetlink messages the kernel sends and receives.

In general, each data structure comes with a set of handling routines:

  • allocators: To allocate and free an object of given type
  • setters/getters: Data structure fields are accessed via an attribute number (via a specific enum field)
  • serializers: Populating a netlink message or vice versa
  • printers: Providing a textual representation, mostly for debugging purposes
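The allocator, setter/getter and printer pattern described above can be sketched with a self-contained toy object. All toy_* names and attribute numbers below are hypothetical and only mimic the shape of the library's routines (libnftnl's own table routines, such as nftnl_table_alloc() and nftnl_table_set_str(), are not used here); serializers are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute numbers for the toy table object. */
enum toy_table_attr { TOY_TABLE_NAME, TOY_TABLE_FAMILY, TOY_TABLE_ATTR_MAX };

struct toy_table {
    char *name;
    uint32_t family;
    uint32_t flags; /* bit N is set once attribute N has a value */
};

/* Allocators: create a zeroed object and release it again. */
struct toy_table *toy_table_alloc(void)
{
    return calloc(1, sizeof(struct toy_table));
}

void toy_table_free(struct toy_table *t)
{
    if (!t)
        return;
    free(t->name);
    free(t);
}

bool toy_table_is_set(const struct toy_table *t, enum toy_table_attr attr)
{
    return (t->flags & (1u << attr)) != 0;
}

/* Setters: the attribute number selects the field; wrong types are rejected. */
int toy_table_set_str(struct toy_table *t, enum toy_table_attr attr,
                      const char *val)
{
    char *copy;

    if (attr != TOY_TABLE_NAME)
        return -1;
    copy = malloc(strlen(val) + 1);
    if (!copy)
        return -1;
    strcpy(copy, val);
    free(t->name);
    t->name = copy;
    t->flags |= 1u << attr;
    return 0;
}

int toy_table_set_u32(struct toy_table *t, enum toy_table_attr attr,
                      uint32_t val)
{
    if (attr != TOY_TABLE_FAMILY)
        return -1;
    t->family = val;
    t->flags |= 1u << attr;
    return 0;
}

/* Getters: return NULL or 0 when the attribute is unset or of another type. */
const char *toy_table_get_str(const struct toy_table *t,
                              enum toy_table_attr attr)
{
    if (attr != TOY_TABLE_NAME || !toy_table_is_set(t, attr))
        return NULL;
    return t->name;
}

uint32_t toy_table_get_u32(const struct toy_table *t,
                           enum toy_table_attr attr)
{
    if (attr != TOY_TABLE_FAMILY || !toy_table_is_set(t, attr))
        return 0;
    return t->family;
}

/* Printer: textual representation, mainly useful for debugging. */
int toy_table_snprintf(char *buf, size_t size, const struct toy_table *t)
{
    return snprintf(buf, size, "table %s family %u",
                    t->name ? t->name : "(unset)", (unsigned int)t->family);
}
```

Accessing fields through attribute numbers keeps the structure layout private, so fields can be added later without breaking callers; the flags bitmask lets code tell an unset attribute apart from one that was set to zero.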

Where sensible, there is a list-variant, too. If so, it comes with handling routines as well:

  • allocators: Allocating and freeing the list object (and members)
  • populators: Add and remove from the list

Where useful, there might be a lookup routine as well. For example, the nftnl_chain_list object also contains a hash table keyed by chain name, so looking up a chain by name is faster than a linear search.

A typical extra for list objects is an iterator: a data structure that holds state while browsing through the list. Usually the only routines used are the allocators and a next routine.
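To make the list, lookup and iterator ideas concrete, here is a minimal self-contained sketch. It is a toy model under stated assumptions, not libnftnl's implementation: all toy_* names, the bucket count and the hash function are hypothetical, and the iterator is initialized in place instead of being allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_HSIZE 8 /* hypothetical number of hash buckets */

struct toy_chain {
    const char *name;
    struct toy_chain *next;  /* next chain in insertion order */
    struct toy_chain *hnext; /* next chain in the same hash bucket */
};

struct toy_chain_list {
    struct toy_chain *head, *tail;
    struct toy_chain *buckets[TOY_HSIZE];
};

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned int toy_hash(const char *s)
{
    unsigned int h = 5381;

    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % TOY_HSIZE;
}

/* Populator: append to the list and index the chain by name. */
void toy_chain_list_add_tail(struct toy_chain_list *l, struct toy_chain *c)
{
    unsigned int b = toy_hash(c->name);

    c->next = NULL;
    if (l->tail)
        l->tail->next = c;
    else
        l->head = c;
    l->tail = c;

    c->hnext = l->buckets[b];
    l->buckets[b] = c;
}

/* Lookup: only the chains sharing one hash bucket are compared. */
struct toy_chain *toy_chain_list_lookup(const struct toy_chain_list *l,
                                        const char *name)
{
    struct toy_chain *c;

    for (c = l->buckets[toy_hash(name)]; c; c = c->hnext)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}

/* Iterator: remembers the current position between calls to next. */
struct toy_chain_list_iter { struct toy_chain *cur; };

void toy_chain_list_iter_init(struct toy_chain_list_iter *it,
                              const struct toy_chain_list *l)
{
    it->cur = l->head;
}

/* Returns the current chain and advances, or NULL at the end of the list. */
struct toy_chain *toy_chain_list_iter_next(struct toy_chain_list_iter *it)
{
    struct toy_chain *c = it->cur;

    if (c)
        it->cur = c->next;
    return c;
}
```

Each chain is linked twice: once in insertion order for iteration, and once in a hash bucket for lookup by name.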

These are the entities defined by libnftnl:

  • table: A rather boring "namespace" for chains
  • chain: A container for rules, may attach to a netfilter hook in kernel
  • rule: A container for expressions
  • expr: An nftables VM code instruction
  • flowtable: Similar to a chain, but holds flows between interfaces
  • obj: A generic object, typically holding stateful information
  • ruleset: A container for lists of tables, chains, sets and rules - not used by nftables application anymore
  • set: A container for elements
  • set_elem: A set element
  • trace: A trace event sent by the kernel

nftnl_expr

While nftables distinguishes between expressions and statements, this distinction does not really exist at the libnftnl layer. For instance, a statement like:

ip saddr 192.168.0.1

is actually two expressions:

  • payload: loading IPv4 header's source address into a register
  • cmp: comparing data from a register against a stored value
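The split into a payload step and a cmp step can be illustrated with a small self-contained model. This is a sketch only: the toy_* names and the single 16-byte register are hypothetical, and only equality comparison is modelled. The one fact taken from the IPv4 header layout is that the source address occupies 4 bytes at offset 12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_IPV4_SADDR_OFFSET 12 /* source address offset in the IPv4 header */
#define TOY_IPV4_ADDR_LEN      4

/* Hypothetical register file with a single 16-byte data register. */
struct toy_regs { uint8_t reg1[16]; };

/* "payload": copy len bytes at offset of the network header into reg1. */
bool toy_payload_load(struct toy_regs *regs, const uint8_t *nh, size_t nh_len,
                      size_t offset, size_t len)
{
    if (len > sizeof(regs->reg1) || offset > nh_len || len > nh_len - offset)
        return false; /* would read past the header or overflow the register */
    memcpy(regs->reg1, nh + offset, len);
    return true;
}

/* "cmp" (equality only): compare reg1 against a value stored in the rule. */
bool toy_cmp_eq(const struct toy_regs *regs, const uint8_t *data, size_t len)
{
    return len <= sizeof(regs->reg1) && memcmp(regs->reg1, data, len) == 0;
}

/* The statement "ip saddr <addr>" expressed as payload followed by cmp. */
bool toy_match_saddr(const uint8_t *nh, size_t nh_len, const uint8_t addr[4])
{
    struct toy_regs regs = {{0}};

    return toy_payload_load(&regs, nh, nh_len, TOY_IPV4_SADDR_OFFSET,
                            TOY_IPV4_ADDR_LEN) &&
           toy_cmp_eq(&regs, addr, TOY_IPV4_ADDR_LEN);
}
```

Because the two steps communicate only through the register, the same cmp step could follow any other expression that fills a register.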

Since expressions have access to the packet, its metadata, and all nftables registers (including the verdict register), and may store multiple values internally, they are powerful and versatile.

nftnl_obj

This is a common API for various object types. An object's type is defined after allocation by setting the NFTNL_OBJ_TYPE attribute. The currently existing object types are:

  • counter
  • quota
  • ct helper
  • limit
  • tunnel
  • ct timeout
  • secmark
  • ct expect
  • synproxy

nftnl_batch

This is a wrapper interface around the same functionality in libmnl (which is used internally). In general, nftnl batches aid in collecting multiple netlink messages for kernel submission.

libnftables

TODO: add info.

nft: from userspace to the kernel

TODO: add info.

nft: from the kernel to the userspace

TODO: add info.