Security Forum 2024
Created: 2024-05-22 Wed 09:20
Carsten Strotmann DNS(SEC)/DANE/DHCP/IPv6/Linux/xBSD/Security Trainer and Consultant
tcpdump and Wireshark
tcpdump
process in this example)
tcpdump can be instructed (option -d) to print the compiled BPF program of a filter expression in human-readable form instead of capturing packets:
    # tcpdump -d port 53 and host 1.1.1.1
    Warning: assuming Ethernet
    (000) ldh      [12]
    (001) jeq      #0x86dd          jt 19   jf 2
    (002) jeq      #0x800           jt 3    jf 19
    (003) ldb      [23]
    (004) jeq      #0x84            jt 7    jf 5
    (005) jeq      #0x6             jt 7    jf 6
    (006) jeq      #0x11            jt 7    jf 19
    (007) ldh      [20]
    (008) jset     #0x1fff          jt 19   jf 9
    (009) ldxb     4*([14]&0xf)
    (010) ldh      [x + 14]
    (011) jeq      #0x35            jt 14   jf 12
    (012) ldh      [x + 16]
    (013) jeq      #0x35            jt 14   jf 19
    (014) ld       [26]
    (015) jeq      #0x1010101       jt 18   jf 16
    (016) ld       [30]
    (017) jeq      #0x1010101       jt 18   jf 19
    (018) ret      #262144
    (019) ret      #0
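To make the listing concrete, here is a minimal Python sketch that interprets exactly these 20 instructions on hand-built packets. The opcode labels ldxb_msh and ldh_x, the helper udp_packet and the sample address 192.0.2.1 / port 40000 are inventions of this sketch. Only the opcodes used in the listing are supported, so this illustrates classic BPF semantics and is not the kernel's interpreter.

```python
import struct

# Transcription of the `tcpdump -d port 53 and host 1.1.1.1` listing:
# (opcode, operand, jump-if-true, jump-if-false); jump targets are absolute.
PROGRAM = [
    ("ldh", 12, None, None),         # 000 load EtherType
    ("jeq", 0x86DD, 19, 2),          # 001 IPv6 -> reject (1.1.1.1 is IPv4)
    ("jeq", 0x0800, 3, 19),          # 002 IPv4?
    ("ldb", 23, None, None),         # 003 load IP protocol
    ("jeq", 0x84, 7, 5),             # 004 SCTP
    ("jeq", 0x06, 7, 6),             # 005 TCP
    ("jeq", 0x11, 7, 19),            # 006 UDP
    ("ldh", 20, None, None),         # 007 load flags + fragment offset
    ("jset", 0x1FFF, 19, 9),         # 008 non-first fragment -> reject
    ("ldxb_msh", 14, None, None),    # 009 X = 4 * IHL = IP header length
    ("ldh_x", 14, None, None),       # 010 load source port
    ("jeq", 53, 14, 12),             # 011
    ("ldh_x", 16, None, None),       # 012 load destination port
    ("jeq", 53, 14, 19),             # 013
    ("ld", 26, None, None),          # 014 load source IPv4 address
    ("jeq", 0x01010101, 18, 16),     # 015
    ("ld", 30, None, None),          # 016 load destination IPv4 address
    ("jeq", 0x01010101, 18, 19),     # 017
    ("ret", 262144, None, None),     # 018 accept, keep up to 262144 bytes
    ("ret", 0, None, None),          # 019 drop
]


def run(program, pkt: bytes) -> int:
    """Execute the program on a packet; returns the number of bytes to keep (0 = drop)."""
    a = x = pc = 0
    try:
        while True:
            op, k, jt, jf = program[pc]
            if op == "ldh":
                a = struct.unpack_from("!H", pkt, k)[0]
            elif op == "ldb":
                a = pkt[k]
            elif op == "ld":
                a = struct.unpack_from("!I", pkt, k)[0]
            elif op == "ldxb_msh":
                x = 4 * (pkt[k] & 0x0F)
            elif op == "ldh_x":
                a = struct.unpack_from("!H", pkt, x + k)[0]
            elif op == "jeq":
                pc = jt if a == k else jf
                continue
            elif op == "jset":
                pc = jt if a & k else jf
                continue
            elif op == "ret":
                return k
            else:
                raise ValueError(f"unsupported opcode: {op}")
            pc += 1
    except (struct.error, IndexError):
        return 0  # out-of-bounds load: classic BPF drops the packet


def udp_packet(src, dst, sport, dport, ethertype=0x0800) -> bytes:
    """Build a minimal Ethernet + IPv4 (no options) + UDP header."""
    eth = bytes(12) + struct.pack("!H", ethertype)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                     bytes(src), bytes(dst))
    udp = struct.pack("!HHHH", sport, dport, 8, 0)
    return eth + ip + udp


if __name__ == "__main__":
    query = udp_packet((192, 0, 2, 1), (1, 1, 1, 1), 40000, 53)
    other = udp_packet((192, 0, 2, 1), (1, 1, 1, 1), 40000, 80)
    print(run(PROGRAM, query), run(PROGRAM, other))  # 262144 0
```

The return value is the number of packet bytes to keep: 262144 means "capture", 0 means "drop". The early exit for IPv6 in instruction 001 falls out naturally, because the filter names an IPv4 host.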
XDP (eXpress Data Path) can discard unwanted network traffic very early in the network stack (in the network driver, or with hardware offload even on the network card). This can be used to protect against DDoS attacks
bpftrace and bcc (BPF Compiler Collection) contain a number of example programs that can be used without eBPF programming knowledge (on Debian and Ubuntu the bcc tools carry the suffix -bpfcc, e.g. syscount-bpfcc)
syscount
    # syscount-bpfcc -p `pgrep named` -i 10
    Tracing syscalls, printing top 10... Ctrl+C to quit.
    [07:34:19]
    SYSCALL                   COUNT
    futex                       547
    getpid                      121
    sendto                      113
    read                         56
    write                        31
    epoll_wait                   31
    openat                       23
    close                        20
    epoll_ctl                    20
    recvmsg                      20
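syscount counts system calls inside the kernel in an eBPF map (one counter per system call). User space only receives the aggregated counters at each interval, then sorts them and prints the top entries. The following Python sketch shows only that user-space step. It assumes the map has already been read into a dict of syscall name to count. The real tool keys the map by syscall number and translates numbers to names.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_syscalls(counts: Dict[str, int], top: int = 10) -> List[Tuple[str, int]]:
    """Sort per-syscall counters (as read from the map) and keep the `top` largest."""
    return Counter(counts).most_common(top)


def print_interval(stamp: str, counts: Dict[str, int], top: int = 10) -> None:
    """Print one interval in a layout similar to syscount's."""
    print(f"[{stamp}]")
    print(f"{'SYSCALL':<22}{'COUNT':>8}")
    for name, count in top_syscalls(counts, top):
        print(f"{name:<22}{count:>8}")


if __name__ == "__main__":
    # One interval's counters, copied from the syscount output above (named).
    interval = {"futex": 547, "getpid": 121, "sendto": 113, "read": 56,
                "write": 31, "epoll_wait": 31, "openat": 23, "close": 20,
                "epoll_ctl": 20, "recvmsg": 20}
    print_interval("07:34:19", interval, top=3)
```

This division of labour is typical for bcc tools. The per-event work (incrementing a counter) stays in the kernel, and only a few numbers cross into user space per interval.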
capable shows the Linux capability checks of a process (columns: time, UID, PID, command name, capability number, capability name, audit flag):
    # capable-bpfcc | grep named
    07:36:17  0    29378  (named)  24  CAP_SYS_RESOURCE  1
    07:36:17  0    29378  (named)  24  CAP_SYS_RESOURCE  1
    07:36:17  0    29378  (named)  12  CAP_NET_ADMIN     1
    07:36:17  0    29378  (named)  21  CAP_SYS_ADMIN     1
    07:36:17  0    29378  named    6   CAP_SETGID        1
    07:36:17  0    29378  named    6   CAP_SETGID        1
    07:36:17  0    29378  named    7   CAP_SETUID        1
    07:36:17  109  29378  named    24  CAP_SYS_RESOURCE  1
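With longer captures, a small script helps to see which capabilities a daemon checks and how often. Below is a Python sketch that assumes capable's default column order TIME UID PID COMM CAP NAME AUDIT. It reads the eight lines above from a string; in practice they would be piped in.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CapCheck(NamedTuple):
    time: str
    uid: int
    pid: int
    comm: str
    cap: int
    name: str
    audit: int


def parse_line(line: str) -> CapCheck:
    """Parse one line of capable output: TIME UID PID COMM CAP NAME AUDIT."""
    time, uid, pid, rest = line.split(None, 3)
    # COMM may contain spaces, so split the last three columns off from the right.
    comm, cap, name, audit = rest.rsplit(None, 3)
    return CapCheck(time, int(uid), int(pid), comm, int(cap), name, int(audit))


def capability_summary(lines: Iterable[str]) -> Counter:
    """Count how often each capability was checked."""
    return Counter(parse_line(line).name for line in lines if line.strip())


# The eight lines from the capable output above.
SAMPLE = """\
07:36:17 0 29378 (named) 24 CAP_SYS_RESOURCE 1
07:36:17 0 29378 (named) 24 CAP_SYS_RESOURCE 1
07:36:17 0 29378 (named) 12 CAP_NET_ADMIN 1
07:36:17 0 29378 (named) 21 CAP_SYS_ADMIN 1
07:36:17 0 29378 named 6 CAP_SETGID 1
07:36:17 0 29378 named 6 CAP_SETGID 1
07:36:17 0 29378 named 7 CAP_SETUID 1
07:36:17 109 29378 named 24 CAP_SYS_RESOURCE 1
"""

if __name__ == "__main__":
    for cap, count in capability_summary(SAMPLE.splitlines()).most_common():
        print(f"{cap:<18} {count}")
```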
gethostlatency measures the latency of client-side DNS name resolution through C library functions such as getaddrinfo or gethostbyname (traced with user-space probes)
    # gethostlatency-bpfcc
    TIME      PID    COMM          LATms HOST
    10:21:58  19183  ping         143.22 example.org
    10:22:18  19184  ssh            0.03 host.example.de
    10:22:18  19184  ssh           60.59 host.example.de
    10:22:35  19185  ping          23.44 isc.org
    10:22:49  19186  ping        4459.72 yahoo.co.kr
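Latency tools like this one rely on a simple pattern. An entry probe stores a start timestamp (and here the requested host name) in a map keyed by thread ID, and the return probe looks the entry up, computes the difference and deletes the entry. The following Python sketch simulates that map logic. The class and method names and the nanosecond timestamps are made up for illustration; in the real tool the timestamps come from the kernel clock.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Pending:
    start_ns: int
    host: str


class LatencyTracker:
    """Pairs entry and return events per thread ID, like an eBPF hash map keyed by tid."""

    def __init__(self) -> None:
        self.start: Dict[int, Pending] = {}

    def on_entry(self, tid: int, ts_ns: int, host: str) -> None:
        # entry probe: remember start time and requested host name
        self.start[tid] = Pending(ts_ns, host)

    def on_return(self, tid: int, ts_ns: int) -> Optional[Tuple[str, float]]:
        # return probe: look up and delete the entry of this thread
        pending = self.start.pop(tid, None)
        if pending is None:
            return None  # no matching entry (e.g. tracing started during the call)
        return pending.host, (ts_ns - pending.start_ns) / 1e6  # latency in ms


if __name__ == "__main__":
    t = LatencyTracker()
    # Two threads resolving at the same time; timestamps (ns) are made up.
    t.on_entry(19183, 1_000_000, "example.org")
    t.on_entry(19184, 2_000_000, "host.example.de")
    print(t.on_return(19184, 2_030_000))
    print(t.on_return(19183, 144_220_000))
```

Keying by thread ID is what keeps overlapping lookups from different threads apart. The same pattern reappears in the BIND 9 forwarding script later in this deck.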
bpftool is a universal tool for working with eBPF programs
bpftool is part of the Linux kernel source code and is maintained by the kernel developers
bpftool (it is usually installed automatically)
bpftool prog lists all eBPF programs currently loaded in the Linux kernel:
    # bpftool prog
    2: tracing  name hid_tail_call  tag 7cc47bbf07148bfe  gpl
            loaded_at 2024-05-15T06:25:33+0200  uid 0
            xlated 56B  jited 115B  memlock 4096B  map_ids 2
            btf_id 2
    47: lsm  name restrict_filesystems  tag 713a545fe0530ce7  gpl
            loaded_at 2024-05-15T06:25:46+0200  uid 0
            xlated 560B  jited 305B  memlock 4096B  map_ids 24
            btf_id 62
            pids systemd(1)
    [...]
With bpftool, the eBPF assembler code (eBPF bytecode) of a running program can be printed (dump xlated shows the bytecode as rewritten by the kernel verifier):
    # bpftool prog dump xlated name restrict_filesystems
    int restrict_filesystems(unsigned long long * ctx):
    ; int BPF_PROG(restrict_filesystems, struct file *file, int ret)
       0: (79) r3 = *(u64 *)(r1 +0)
       1: (79) r0 = *(u64 *)(r1 +8)
       2: (b7) r1 = 0
    ; uint32_t *value, *magic_map, magic_number, zero = 0, *is_allow;
       3: (63) *(u32 *)(r10 -24) = r1
    ; int BPF_PROG(restrict_filesystems, struct file *file, int ret)
       4: (bf) r1 = r0
       5: (67) r1 <<= 32
    [...]
bpftool can also print the JIT-compiled machine code of an active program as assembler source code of the native CPU architecture (here x86_64):
    # bpftool prog dump jited name restrict_filesystems
    int restrict_filesystems(unsigned long long * ctx):
    bpf_prog_713a545fe0530ce7_restrict_filesystems:
    ; int BPF_PROG(restrict_filesystems, struct file *file, int ret)
       0:   endbr64
       4:   nopl    0x0(%rax,%rax,1)
       9:   xchg    %ax,%ax
       b:   push    %rbp
       c:   mov     %rsp,%rbp
       f:   endbr64
      13:   sub     $0x18,%rsp
    [...]
bpftool can list the eBPF maps present in the Linux kernel:
    # bpftool map
    2: prog_array  name hid_jmp_table  flags 0x0
            key 4B  value 4B  max_entries 1024  memlock 8512B
            owner_prog_type tracing  owner jited
    24: hash_of_maps  name cgroup_hash  flags 0x0
            key 8B  value 4B  max_entries 2048  memlock 165152B
            pids systemd(1)
    38: array  name libbpf_global  flags 0x0
            key 4B  value 32B  max_entries 1  memlock 352B
    39: array  name pid_iter.rodata  flags 0x480
            key 4B  value 4B  max_entries 1  memlock 8192B
            btf_id 211  frozen
            pids bpftool(23682)
    40: array  name libbpf_det_bind  flags 0x0
            key 4B  value 32B  max_entries 1  memlock 352B
bpftool can also create a map, pin it in the BPF file system (/sys/fs/bpf), fill and read it; removing the pinned file removes the map:
    # bpftool map create /sys/fs/bpf/mymap type hash key 4 value 4 entries 10 name mymap
    # ls -l /sys/fs/bpf/
    total 0
    -rw-------. 1 root root 0 May 16 19:43 mymap
    # bpftool map update name mymap key 0x00 0x00 0x10 0x20 value 10 10 10 10
    # bpftool map dump name mymap
    key: 00 00 10 20  value: 0a 0a 0a 0a
    Found 1 element
    # rm /sys/fs/bpf/mymap
    rm: remove regular empty file '/sys/fs/bpf/mymap'? y
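The dump above is consistent with bpftool reading the byte tokens as C-style numbers: 0x-prefixed tokens as hex and plain tokens as decimal, so "10" is stored as 0a. The kernel only sees raw bytes. How they read as an integer depends on the program that uses the map and on the byte order. The following Python sketch reproduces the conversion. The helper parse_bpftool_bytes is hypothetical, and it does not handle octal notation.

```python
import struct
from typing import List


def parse_bpftool_bytes(tokens: List[str]) -> bytes:
    """Convert bpftool-style byte tokens ("0x10", "10") into raw bytes.

    Assumption: tokens with a 0x prefix are hex, all others decimal
    (octal notation is not handled in this sketch).
    """
    return bytes(int(t, 16) if t.lower().startswith("0x") else int(t, 10) for t in tokens)


if __name__ == "__main__":
    key = parse_bpftool_bytes("0x00 0x00 0x10 0x20".split())
    value = parse_bpftool_bytes("10 10 10 10".split())
    # x86_64 is little-endian: a C program reading the key as u32 sees 0x20100000.
    print("key:  ", key.hex(" "), "as u32 little-endian:", hex(struct.unpack("<I", key)[0]))
    print("value:", value.hex(" "), "as u32:", struct.unpack("<I", value)[0])
```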
bpftool supports working with other eBPF data structures
The output of bpftool can be formatted as JSON (JavaScript Object Notation) for further processing in scripts and programs (option -j, with -p for pretty-printed output):
    # bpftool help
    Usage: bpftool [OPTIONS] OBJECT { COMMAND | help }
           bpftool batch file FILE
           bpftool version

           OBJECT := { prog | map | link | cgroup | perf | net | feature |
                       btf | gen | struct_ops | iter }
           OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug} |
                        {-V|--version} }
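As an example of such processing, the following Python sketch lists the loaded programs together with their owning processes. The JSON field names (id, type, name, tag, map_ids, pids with pid/comm) are assumptions modelled on the plain-text listing of bpftool prog shown earlier; check them with bpftool -j -p prog on your own system. The SAMPLE string is hand-written, not captured output.

```python
import json
import subprocess
from typing import Iterator, List, Optional, Tuple

# Hand-written sample modelled on the `bpftool prog` listing
# (assumed field names; not captured output).
SAMPLE = """[
  {"id": 2, "type": "tracing", "name": "hid_tail_call",
   "tag": "7cc47bbf07148bfe", "map_ids": [2]},
  {"id": 47, "type": "lsm", "name": "restrict_filesystems",
   "tag": "713a545fe0530ce7", "map_ids": [24],
   "pids": [{"pid": 1, "comm": "systemd"}]}
]"""


def load_programs(raw: Optional[str] = None) -> List[dict]:
    """Parse `bpftool -j prog` output; run bpftool (needs root) if no text is given."""
    if raw is None:
        raw = subprocess.run(["bpftool", "-j", "prog"], check=True,
                             capture_output=True, text=True).stdout
    return json.loads(raw)


def summarize(progs: List[dict]) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (id, type, name, owning processes) for every program."""
    for prog in progs:
        owners = ", ".join(f"{p['comm']}({p['pid']})" for p in prog.get("pids", []))
        yield prog["id"], prog["type"], prog.get("name", "-"), owners


if __name__ == "__main__":
    for row in summarize(load_programs(SAMPLE)):
        print(*row, sep="\t")
```

Filtering is then plain Python, e.g. keeping only rows whose type is "lsm" to see which security programs are attached.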
With the sysctl kernel.unprivileged_bpf_disabled > 0, unprivileged users cannot load eBPF programs
If CONFIG_BPF_KPROBE_OVERRIDE is active in the kernel configuration (at kernel compile time), eBPF programs can overwrite the return values of (kernel) functions (those marked for error injection)
With a 0 in the pseudo-file /sys/kernel/debug/kprobes/enabled, the kernel probes (KPROBES) are switched off, but the administrator (root) can switch the KPROBES on again
bpftrace
bpftrace is a small scripting language similar to awk or dtrace
bpftrace programs bind to eBPF probes and execute actions whenever a system event is reported (system call, function call)
bpftrace has built-in helper structures to work directly with eBPF data structures
bpftrace allows eBPF programs to be written more compactly than with BCC
bpftrace example programs
netqtop - Outputs statistics about the queues of a network interface. This program can be used to collect information when a network interface is congested
    # netqtop-bpfcc -n eth0 -i 10
    Mon Nov 15 07:43:29 2021
    TX
     QueueID    avg_size   [0, 64)    [64, 512)  [512, 2K)  [2K, 16K)  [16K, 64K)
     0          297.82     2          48         1          4          0
     Total      297.82     2          48         1          4          0

    RX
     QueueID    avg_size   [0, 64)    [64, 512)  [512, 2K)  [2K, 16K)  [16K, 64K)
     0          70.95      43         34         0          0          0
     Total      70.95      43         34         0          0          0
    -----------------------------------------------------------------------------
tcptracer traces the TCP connections of a process (type C = connect, A = accept, X = close), here of BIND 9:
    # tcptracer-bpfcc -p $(pgrep named)
    Tracing TCP established connections. Ctrl-C to end.
    T  PID    COMM          IP SADDR           DADDR           SPORT  DPORT
    C  29404  isc-net-0000  4  127.0.0.1       127.0.0.1       41555  953
    A  29378  isc-socket-0  4  127.0.0.1       127.0.0.1       953    41555
    X  29404  isc-socket-0  4  127.0.0.1       127.0.0.1       41555  953
    X  29378  isc-socket-0  4  127.0.0.1       127.0.0.1       953    41555
    C  29378  isc-net-0000  4  46.101.109.138  192.33.4.12     43555  53
    C  29378  isc-net-0000  4  46.101.109.138  192.33.4.12     33751  53
    X  29378  isc-socket-0  4  46.101.109.138  192.33.4.12     43555  53
    X  29378  isc-socket-0  4  46.101.109.138  192.33.4.12     33751  53
    C  29378  isc-net-0000  4  46.101.109.138  193.0.14.129    38145  53
    C  29378  isc-net-0000  4  46.101.109.138  192.33.14.30    40905  53
    X  29378  isc-socket-0  4  46.101.109.138  193.0.14.129    38145  53
    X  29378  isc-socket-0  4  46.101.109.138  192.33.14.30    40905  53
tcpconnlat outputs the latency of establishing a TCP connection, here for outgoing DNS queries via TCP from a BIND 9 resolver (in the example a query for microsoft.com TXT, where the response is too large for a 1232 byte UDP packet)
isc-net-0000 is the internal name of the BIND 9 thread
    # tcpconnlat-bpfcc
    PID    COMM          IP SADDR           DADDR           DPORT LAT(ms)
    29378  isc-net-0000  4  46.101.109.138  193.0.14.129    53    37.50
    29378  isc-net-0000  4  46.101.109.138  192.52.178.30   53    14.01
    29378  isc-net-0000  4  46.101.109.138  199.9.14.201    53    8.48
    29378  isc-net-0000  4  46.101.109.138  192.42.93.30    53    1.90
    29378  isc-net-0000  4  46.101.109.138  40.90.4.205     53    14.27
    29378  isc-net-0000  4  46.101.109.138  199.254.48.1    53    19.21
    29378  isc-net-0000  4  46.101.109.138  192.48.79.30    53    7.66
    29378  isc-net-0000  4  46.101.109.138  192.41.162.30   53    7.97
    29396  isc-net-0000  4  127.0.0.1       127.0.0.1       53    0.06
udplife is a bpftrace script to print the UDP round trip time (here the DNS round trip time) of a UDP communication (program by Brendan Gregg, see links at the end of the slide deck)
    # udplife.bt
    Attaching 8 probes...
    PID    COMM        LADDR           LPORT RADDR           RPORT  TX_B  RX_B  MS
    29378  isc-net-00  46.101.109.138  0     199.19.57.1     16503  48    420   268
    29378  isc-net-00  46.101.109.138  0     51.75.79.143    81     49    43    13
    29378  isc-net-00  46.101.109.138  0     199.6.1.52      16452  48    408   24
    29378  isc-net-00  46.101.109.138  0     199.249.120.1   81     44    10    9
    29378  isc-net-00  46.101.109.138  0     199.254.31.1    32891  64    30    273
    29378  isc-net-00  46.101.109.138  0     65.22.6.1       32891  64    46    266
    zone "dnslab.org" {
        type forward;
        forwarders { 1.1.1.1; 8.8.8.8; };
    };
A bpftrace script to print BIND 9 DNS forwarding decisions
In the BIND 9 source code we find the function dns_fwdtable_find in /lib/dns/forward.c. This looks promising:
dns_fwdtable_find takes a domain name as input parameter and returns the value 0 (ISC_R_SUCCESS) if the name has to be resolved via forwarding, and a non-zero result code if forwarding is not used
A bpftrace one-liner tells us whether this function can be used for this task (it prints the return value of every call):
    bpftrace -e 'uretprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find { print(retval) }'
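Plan for the script: attach to the dns_fwdtable_find function, store the domain name passed to dns_fwdtable_find on entry to the function, check the return value (retval) for the value zero (0) and output the domain name if forwarding is used
The domain name is passed as a dns_name_t data structure. The 2nd field of dns_name_t is an unsigned char * ndata; this seems to be the domain name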
The definition of the data structure dns_name_t is in the file lib/dns/include/dns/name.h
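Before printing ndata, note that in BIND 9 ndata holds the name in uncompressed DNS wire format (RFC 1035, section 3.1). Each label is preceded by a length byte, and the name ends with a zero-length root label. Converting it with a plain string function therefore shows the length bytes as non-printable characters where the dots would be. The following Python sketch of the decoding assumes an uncompressed name, as stored in ndata. The function name wire_to_text is hypothetical.

```python
def wire_to_text(data: bytes) -> str:
    """Convert an uncompressed DNS wire-format name to dotted presentation format.

    Sketch only: no escaping of special characters inside labels.
    """
    labels = []
    i = 0
    while i < len(data):
        length = data[i]
        if length == 0:  # the root label terminates the name
            break
        if length > 63:  # compression pointers do not occur in uncompressed names
            raise ValueError(f"invalid label length {length} at offset {i}")
        labels.append(data[i + 1:i + 1 + length].decode("ascii", errors="backslashreplace"))
        i += 1 + length
    return ".".join(labels) + "."


if __name__ == "__main__":
    raw = b"\x06dnslab\x03org\x00"  # wire format of dnslab.org.
    print(raw)                      # length bytes 0x06 and 0x03 are not printable
    print(wire_to_text(raw))
```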
bpftrace uses a syntax similar to the C programming language, so we can import the definition of the data structure from the BIND 9 source code directly into the bpftrace script
The fields starting with the isc_buffer_t field are not needed for our script, and since these fields are not based on built-in data types, we comment them out (removing fields at the end of a structure does not change the offsets of the fields before them, such as ndata):
    #!/usr/bin/bpftrace

    struct dns_name {
        unsigned int    magic;
        unsigned char  *ndata;
        unsigned int    length;
        unsigned int    labels;
        unsigned int    attributes;
        unsigned char  *offsets;
        // isc_buffer_t *buffer;
        // ISC_LINK(dns_name_t) link;
        // ISC_LIST(dns_rdataset_t) list;
    };
    [...]
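The trimmed declaration works because bpftrace only needs the correct offset of ndata, and members that follow it do not influence that offset. The following Python ctypes sketch checks this by comparing the trimmed layout with a hypothetical full layout. In the full layout, the three commented-out members are modelled as pointers (isc_buffer_t *, ISC_LINK as prev/next, ISC_LIST as head/tail), which is an assumption of this sketch. The result does not depend on that assumption, because these members come after ndata. On x86_64 ndata should end up at offset 8, because of the alignment padding after the 4-byte magic.

```python
import ctypes

# Fields as declared in the bpftrace script (trailing members commented out).
TRIMMED_FIELDS = [
    ("magic", ctypes.c_uint),
    ("ndata", ctypes.POINTER(ctypes.c_ubyte)),
    ("length", ctypes.c_uint),
    ("labels", ctypes.c_uint),
    ("attributes", ctypes.c_uint),
    ("offsets", ctypes.POINTER(ctypes.c_ubyte)),
]


class DnsNameTrimmed(ctypes.Structure):
    _fields_ = TRIMMED_FIELDS


class DnsNameFull(ctypes.Structure):
    # Assumption: the commented-out members are pointer-sized:
    # isc_buffer_t *buffer, ISC_LINK = {prev, next}, ISC_LIST = {head, tail}.
    _fields_ = TRIMMED_FIELDS + [
        ("buffer", ctypes.c_void_p),
        ("link_prev", ctypes.c_void_p),
        ("link_next", ctypes.c_void_p),
        ("list_head", ctypes.c_void_p),
        ("list_tail", ctypes.c_void_p),
    ]


if __name__ == "__main__":
    for name, _ in TRIMMED_FIELDS:
        print(f"{name:<11} trimmed offset {getattr(DnsNameTrimmed, name).offset:>2}"
              f"  full offset {getattr(DnsNameFull, name).offset:>2}")
    print("sizeof trimmed:", ctypes.sizeof(DnsNameTrimmed),
          " sizeof full:", ctypes.sizeof(DnsNameFull))
```

Only the total size differs between the two layouts. That difference is harmless here, because the script never allocates or copies a whole dns_name.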
The BEGIN pseudo probe becomes active at the start of the script and prints a message to inform the user that the script has started successfully:
    [...]
    BEGIN
    {
        printf("Waiting for forward decision...\n");
    }
    [...]
The uprobe (user-space function entry probe) is attached to the function dns_fwdtable_find in the dynamic library file /lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so
The second function parameter (arg1, the domain name to look up) is cast to a struct dns_name and the field ndata is referenced
The pointer is stored in the map @dns_name[tid] (indexed with the thread ID (tid) of the running BIND 9 thread in the process):
    [...]
    uprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
    {
        @dns_name[tid] = ((struct dns_name *)arg1)->ndata;
    }
    [...]
The uretprobe (user-space function return probe) fires when dns_fwdtable_find returns
If the return value is 0 (the domain name must be resolved via forwarding), the value of the variable @dns_name[tid] is converted into a character string and printed on the terminal
Afterwards @dns_name[tid] is no longer required and is deleted:
    uretprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
    {
        if (retval == 0) {
            printf("Forwarded domain name: %s\n", str(@dns_name[tid]));
        }
        delete(@dns_name[tid]);
    }
    #!/usr/bin/bpftrace

    struct dns_name {
        unsigned int    magic;
        unsigned char  *ndata;
        unsigned int    length;
        unsigned int    labels;
        unsigned int    attributes;
        unsigned char  *offsets;
        // isc_buffer_t *buffer;
        // ISC_LINK(dns_name_t) link;
        // ISC_LIST(dns_rdataset_t) list;
    };

    BEGIN
    {
        printf("Waiting for forward decision...\n");
    }

    uprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
    {
        @dns_name[tid] = ((struct dns_name *)arg1)->ndata;
    }

    uretprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
    {
        if (retval == 0) {
            printf("Forwarded domain name: %s\n", str(@dns_name[tid]));
        }
        delete(@dns_name[tid]);
    }
The bpftrace script also becomes active: requests for names in the dnslab.org domain are resolved via forwarding, but not the requests to ietf.org
bpftool
to find and analyze eBPF programs on a Linux system
tcpdump
https://github.com/mozillazg/ptcpdump
Contact:
cs@sys4.de