Security Forum 2024
Created: 2024-05-22 Wed 09:20
Carsten Strotmann DNS(SEC)/DANE/DHCP/IPv6/Linux/xBSD/Security Trainer and Consultant
tcpdump and Wireshark
[...] tcpdump process in this example)
tcpdump can be instructed to output the BPF source code of the tcpdump filter:
# tcpdump -d port 53 and host 1.1.1.1
Warning: assuming Ethernet
(000) ldh      [12]
(001) jeq      #0x86dd          jt 19   jf 2
(002) jeq      #0x800           jt 3    jf 19
(003) ldb      [23]
(004) jeq      #0x84            jt 7    jf 5
(005) jeq      #0x6             jt 7    jf 6
(006) jeq      #0x11            jt 7    jf 19
(007) ldh      [20]
(008) jset     #0x1fff          jt 19   jf 9
(009) ldxb     4*([14]&0xf)
(010) ldh      [x + 14]
(011) jeq      #0x35            jt 14   jf 12
(012) ldh      [x + 16]
(013) jeq      #0x35            jt 14   jf 19
(014) ld       [26]
(015) jeq      #0x1010101       jt 18   jf 16
(016) ld       [30]
(017) jeq      #0x1010101       jt 18   jf 19
(018) ret      #262144
(019) ret      #0
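To make the instruction listing easier to follow, the same checks can be written out in Python. This is only a sketch: the function names and the `build_udp_frame` helper are hypothetical, and the up-front length checks are a simplified stand-in for BPF's behaviour of rejecting a packet when a load reads past its end.

```python
import struct

ACCEPT = 262144  # value returned by instruction (018)
REJECT = 0       # value returned by instruction (019)


def run_filter(frame: bytes) -> int:
    """Re-state the checks of the BPF program above for one Ethernet frame."""
    if len(frame) < 34:  # simplification: need Ethernet + minimal IPv4 header
        return REJECT
    (ethertype,) = struct.unpack_from("!H", frame, 12)       # (000) ldh [12]
    if ethertype != 0x0800:                                   # (001)-(002) IPv4 only
        return REJECT
    if frame[23] not in (0x84, 0x06, 0x11):                   # (003)-(006) SCTP, TCP, UDP
        return REJECT
    (flags_frag,) = struct.unpack_from("!H", frame, 20)      # (007) ldh [20]
    if flags_frag & 0x1FFF:                                   # (008) non-first fragment
        return REJECT
    x = 4 * (frame[14] & 0x0F)                                # (009) IPv4 header length
    if len(frame) < 14 + x + 4:
        return REJECT
    sport, dport = struct.unpack_from("!HH", frame, x + 14)  # (010), (012)
    if 0x35 not in (sport, dport):                            # (011), (013) port 53
        return REJECT
    src, dst = struct.unpack_from("!II", frame, 26)          # (014), (016)
    if 0x01010101 not in (src, dst):                          # (015), (017) host 1.1.1.1
        return REJECT
    return ACCEPT


def build_udp_frame(src_ip: bytes, dst_ip: bytes, sport: int, dport: int) -> bytes:
    """Hypothetical test helper: minimal Ethernet + IPv4 + UDP headers."""
    ethernet = bytes(12) + struct.pack("!H", 0x0800)
    ipv4 = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0, src_ip, dst_ip)
    udp = struct.pack("!HHHH", sport, dport, 8, 0)
    return ethernet + ipv4 + udp


if __name__ == "__main__":
    query = build_udp_frame(bytes([10, 0, 0, 1]), bytes([1, 1, 1, 1]), 40000, 53)
    print(run_filter(query))  # 262144
```

The program rejects IPv6 outright because the host 1.1.1.1 is an IPv4 address, which is why instruction (001) jumps straight to (019).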
XDP can discard unwanted network traffic very early in the network stack (e.g. within the network hardware). This can be used to protect against DDoS attacks
bpftrace and bcc (BPF Compiler Collection) contain a number of example programs, which can be used without eBPF programming knowledge
syscount
# syscount-bpfcc -p `pgrep named` -i 10
Tracing syscalls, printing top 10... Ctrl+C to quit.
[07:34:19]
SYSCALL                   COUNT
futex                       547
getpid                      121
sendto                      113
read                         56
write                        31
epoll_wait                   31
openat                       23
close                        20
epoll_ctl                    20
recvmsg                      20
# capable-bpfcc | grep named
07:36:17  0      29378  (named)          24   CAP_SYS_RESOURCE     1
07:36:17  0      29378  (named)          24   CAP_SYS_RESOURCE     1
07:36:17  0      29378  (named)          12   CAP_NET_ADMIN        1
07:36:17  0      29378  (named)          21   CAP_SYS_ADMIN        1
07:36:17  0      29378  named            6    CAP_SETGID           1
07:36:17  0      29378  named            6    CAP_SETGID           1
07:36:17  0      29378  named            7    CAP_SETUID           1
07:36:17  109    29378  named            24   CAP_SYS_RESOURCE     1
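Output like the capable listing above can be condensed into the set of capabilities a process actually requested, which is useful when tightening a service configuration. The following is a minimal sketch; the function name is hypothetical, the column layout (TIME UID PID COMM CAP NAME AUDIT) is assumed from the rows shown, and treating "(named)" and "named" as the same command is an assumption of this example.

```python
from collections import defaultdict

# Rows copied from the capable output above.
SAMPLE = """\
07:36:17 0 29378 (named) 24 CAP_SYS_RESOURCE 1
07:36:17 0 29378 (named) 24 CAP_SYS_RESOURCE 1
07:36:17 0 29378 (named) 12 CAP_NET_ADMIN 1
07:36:17 0 29378 (named) 21 CAP_SYS_ADMIN 1
07:36:17 0 29378 named 6 CAP_SETGID 1
07:36:17 0 29378 named 6 CAP_SETGID 1
07:36:17 0 29378 named 7 CAP_SETUID 1
07:36:17 109 29378 named 24 CAP_SYS_RESOURCE 1
"""


def capabilities_by_command(output: str) -> dict:
    """Collect the distinct capability names seen per command name."""
    seen = defaultdict(set)
    for line in output.splitlines():
        fields = line.split()
        # Assumed fields: TIME UID PID COMM CAP NAME AUDIT
        if len(fields) != 7 or not fields[4].isdigit():
            continue  # skip headers and anything that is not a data row
        comm = fields[3].strip("()")  # assumption: "(named)" is the same command
        seen[comm].add(fields[5])
    return dict(seen)


if __name__ == "__main__":
    for comm, caps in capabilities_by_command(SAMPLE).items():
        print(comm, sorted(caps))
```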
gethostlatency measures the latency of client-side DNS name resolution through library calls such as getaddrinfo or gethostbyname
# gethostlatency-bpfcc
TIME      PID    COMM          LATms HOST
10:21:58  19183  ping         143.22 example.org
10:22:18  19184  ssh            0.03 host.example.de
10:22:18  19184  ssh           60.59 host.example.de
10:22:35  19185  ping          23.44 isc.org
10:22:49  19186  ping        4459.72 yahoo.co.kr
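What gethostlatency observes from outside can be illustrated from inside a single program by timing one getaddrinfo call. This is a sketch of the measured quantity, not of how the eBPF tool works; the function name is hypothetical.

```python
import socket
import time


def resolve_latency_ms(host: str) -> float:
    """Time one getaddrinfo() call, successful or not, in milliseconds."""
    start = time.perf_counter()
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        pass  # a failed lookup still has a latency worth reporting
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    name = "localhost"
    print(f"{resolve_latency_ms(name):10.2f} ms  {name}")
```

Unlike this in-process timer, the eBPF tool sees the calls of every process on the system without changing or restarting any of them.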
bpftool is a universal tool to work with eBPF programs
bpftool is part of the Linux kernel source code and is maintained by the kernel developers
[...] bpftool (it is usually installed automatically)
bpftool prog lists all eBPF programs currently active in the Linux kernel:
# bpftool prog
2: tracing  name hid_tail_call  tag 7cc47bbf07148bfe  gpl
        loaded_at 2024-05-15T06:25:33+0200  uid 0
        xlated 56B  jited 115B  memlock 4096B  map_ids 2
        btf_id 2
47: lsm  name restrict_filesystems  tag 713a545fe0530ce7  gpl
        loaded_at 2024-05-15T06:25:46+0200  uid 0
        xlated 560B  jited 305B  memlock 4096B  map_ids 24
        btf_id 62
        pids systemd(1)
[...]
With bpftool, the eBPF assembler code (eBPF bytecode) of a running program can be printed:
# bpftool prog dump xlated name restrict_filesystems
int restrict_filesystems(unsigned long long * ctx):
; int BPF_PROG(restrict_filesystems, struct file *file, int ret)
   0: (79) r3 = *(u64 *)(r1 +0)
   1: (79) r0 = *(u64 *)(r1 +8)
   2: (b7) r1 = 0
; uint32_t *value, *magic_map, magic_number, zero = 0, *is_allow;
   3: (63) *(u32 *)(r10 -24) = r1
; int BPF_PROG(restrict_filesystems, struct file *file, int ret)
   4: (bf) r1 = r0
   5: (67) r1 <<= 32
[...]
bpftool can also print an active program as assembler source code of the native CPU architecture (here x86_64):
# bpftool prog dump jited name restrict_filesystems
int restrict_filesystems(unsigned long long * ctx):
bpf_prog_713a545fe0530ce7_restrict_filesystems:
; int BPF_PROG(restrict_filesystems, struct file *file, int ret)
   0:   endbr64
   4:   nopl 0x0(%rax,%rax,1)
   9:   xchg %ax,%ax
   b:   push %rbp
   c:   mov %rsp,%rbp
   f:   endbr64
  13:   sub $0x18,%rsp
[...]
bpftool can list the eBPF maps present in the Linux kernel:
# bpftool map
2: prog_array  name hid_jmp_table  flags 0x0
        key 4B  value 4B  max_entries 1024  memlock 8512B
        owner_prog_type tracing  owner jited
24: hash_of_maps  name cgroup_hash  flags 0x0
        key 8B  value 4B  max_entries 2048  memlock 165152B
        pids systemd(1)
38: array  name libbpf_global  flags 0x0
        key 4B  value 32B  max_entries 1  memlock 352B
39: array  name pid_iter.rodata  flags 0x480
        key 4B  value 4B  max_entries 1  memlock 8192B
        btf_id 211  frozen
        pids bpftool(23682)
40: array  name libbpf_det_bind  flags 0x0
        key 4B  value 32B  max_entries 1  memlock 352B
# bpftool map create /sys/fs/bpf/mymap type hash key 4 value 4 entries 10 name mymap
# ls -l /sys/fs/bpf/
total 0
-rw-------. 1 root root 0 May 16 19:43 mymap
# bpftool map update name mymap key 0x00 0x00 0x10 0x20 value 10 10 10 10
# bpftool map dump name mymap
key: 00 00 10 20  value: 0a 0a 0a 0a
Found 1 element
# rm /sys/fs/bpf/mymap
rm: remove regular empty file '/sys/fs/bpf/mymap'? y
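In the session above the key is entered as hexadecimal bytes and the value as decimal bytes, while the dump prints both in hexadecimal. The small sketch below reproduces that conversion; the function names are hypothetical and only decimal and 0x-prefixed tokens are handled, as in the example.

```python
def parse_byte_args(tokens: str) -> bytes:
    """Turn a byte list such as "0x00 0x10 10" into a bytes object."""
    # int(token, 0) accepts decimal ("10") and 0x-prefixed hexadecimal ("0x10").
    return bytes(int(token, 0) for token in tokens.split())


def dump_format(data: bytes) -> str:
    """Render bytes as two hex digits each, as in the map dump above."""
    return " ".join(f"{b:02x}" for b in data)


if __name__ == "__main__":
    key = parse_byte_args("0x00 0x00 0x10 0x20")
    value = parse_byte_args("10 10 10 10")
    print(f"key: {dump_format(key)}  value: {dump_format(value)}")
```

Both byte strings are four bytes long, matching the key 4 and value 4 sizes given when the map was created.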
bpftool supports working with other eBPF data structures
The output of bpftool can be in JSON (JavaScript Object Notation) for further processing in scripts and programs
# bpftool help
Usage: bpftool [OPTIONS] OBJECT { COMMAND | help }
bpftool batch file FILE
bpftool version
OBJECT := { prog | map | link | cgroup | perf | net | feature | btf | gen | struct_ops | iter }
OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug} |
{-V|--version} }
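As an illustration of processing the JSON output in a script, the sketch below groups programs by type. The sample document is hand-written: its field names ("id", "type", "name", "tag", "bytes_xlated", "bytes_jited", "map_ids", "pids") are an assumption about what the -j option emits, and the values are copied from the bpftool prog listing shown earlier. The function name is hypothetical.

```python
import json

# Hand-written sample; field names are assumed, values come from the text listing.
SAMPLE = """
[
  {"id": 2, "type": "tracing", "name": "hid_tail_call",
   "tag": "7cc47bbf07148bfe", "bytes_xlated": 56, "bytes_jited": 115,
   "map_ids": [2]},
  {"id": 47, "type": "lsm", "name": "restrict_filesystems",
   "tag": "713a545fe0530ce7", "bytes_xlated": 560, "bytes_jited": 305,
   "map_ids": [24], "pids": [{"pid": 1, "comm": "systemd"}]}
]
"""


def programs_by_type(json_text: str) -> dict:
    """Group program names by program type; missing fields get placeholders."""
    result = {}
    for prog in json.loads(json_text):
        prog_type = prog.get("type", "unknown")
        result.setdefault(prog_type, []).append(prog.get("name", ""))
    return result


if __name__ == "__main__":
    for prog_type, names in programs_by_type(SAMPLE).items():
        print(prog_type, names)
```

On a live system the input would come from running bpftool with the -j option as root instead of from the sample string.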
[...] kernel.unprivileged_bpf_disabled > 0
If CONFIG_BPF_KPROBE_OVERRIDE is active in the kernel configuration (at kernel compile time), eBPF programs can overwrite the return values of (kernel) functions
With the value 0 in the pseudo-file /sys/kernel/debug/kprobes/enabled the kernel probes (KPROBES) are switched off
[...] root) can switch the KPROBES on again
bpftrace
bpftrace is a small scripting language similar to awk or dtrace
bpftrace programs bind to eBPF probes and execute functions whenever a system event is reported (system call, function call)
bpftrace has built-in helper structures to work directly with eBPF data structures
bpftrace allows writing eBPF programs more compactly than with BCC
bpftrace example programs
netqtop - Outputs statistics about the queues of a network interface. This program can be used to collect information when a network interface is congested
# netqtop-bpfcc -n eth0 -i 10
Mon Nov 15 07:43:29 2021
TX
 QueueID    avg_size   [0, 64)    [64, 512)  [512, 2K)  [2K, 16K)  [16K, 64K)
 0          297.82     2          48         1          4          0
 Total      297.82     2          48         1          4          0

RX
 QueueID    avg_size   [0, 64)    [64, 512)  [512, 2K)  [2K, 16K)  [16K, 64K)
 0          70.95      43         34         0          0          0
 Total      70.95      43         34         0          0          0
-----------------------------------------------------------------------------
# tcptracer-bpfcc -p $(pgrep named)
Tracing TCP established connections. Ctrl-C to end.
T  PID    COMM          IP SADDR           DADDR           SPORT  DPORT
C  29404  isc-net-0000  4  127.0.0.1       127.0.0.1       41555  953
A  29378  isc-socket-0  4  127.0.0.1       127.0.0.1       953    41555
X  29404  isc-socket-0  4  127.0.0.1       127.0.0.1       41555  953
X  29378  isc-socket-0  4  127.0.0.1       127.0.0.1       953    41555
C  29378  isc-net-0000  4  46.101.109.138  192.33.4.12     43555  53
C  29378  isc-net-0000  4  46.101.109.138  192.33.4.12     33751  53
X  29378  isc-socket-0  4  46.101.109.138  192.33.4.12     43555  53
X  29378  isc-socket-0  4  46.101.109.138  192.33.4.12     33751  53
C  29378  isc-net-0000  4  46.101.109.138  193.0.14.129    38145  53
C  29378  isc-net-0000  4  46.101.109.138  192.33.14.30    40905  53
X  29378  isc-socket-0  4  46.101.109.138  193.0.14.129    38145  53
X  29378  isc-socket-0  4  46.101.109.138  192.33.14.30    40905  53
tcpconnlat outputs the latency of establishing a TCP connection, here an outgoing DNS query via TCP from a BIND 9 resolver (in the example a query for microsoft.com TXT, where the response is too large for a 1232 byte UDP packet)
isc-net-0000 is the internal name of the BIND 9 thread
# tcpconnlat-bpfcc
PID    COMM          IP SADDR           DADDR           DPORT LAT(ms)
29378  isc-net-0000  4  46.101.109.138  193.0.14.129    53    37.50
29378  isc-net-0000  4  46.101.109.138  192.52.178.30   53    14.01
29378  isc-net-0000  4  46.101.109.138  199.9.14.201    53    8.48
29378  isc-net-0000  4  46.101.109.138  192.42.93.30    53    1.90
29378  isc-net-0000  4  46.101.109.138  40.90.4.205     53    14.27
29378  isc-net-0000  4  46.101.109.138  199.254.48.1    53    19.21
29378  isc-net-0000  4  46.101.109.138  192.48.79.30    53    7.66
29378  isc-net-0000  4  46.101.109.138  192.41.162.30   53    7.97
29396  isc-net-0000  4  127.0.0.1       127.0.0.1       53    0.06
udplife is a bpftrace script to print the UDP round trip time (here DNS round trip time) of a UDP communication (program by Brendan Gregg, see links at the end of the slide deck)
# udplife.bt
Attaching 8 probes...
PID    COMM        LADDR           LPORT RADDR           RPORT TX_B RX_B MS
29378  isc-net-00  46.101.109.138  0     199.19.57.1     16503 48   420  268
29378  isc-net-00  46.101.109.138  0     51.75.79.143    81    49   43   13
29378  isc-net-00  46.101.109.138  0     199.6.1.52      16452 48   408  24
29378  isc-net-00  46.101.109.138  0     199.249.120.1   81    44   10   9
29378  isc-net-00  46.101.109.138  0     199.254.31.1    32891 64   30   273
29378  isc-net-00  46.101.109.138  0     65.22.6.1       32891 64   46   266
zone "dnslab.org" {
    type forward;
    forwarders { 1.1.1.1; 8.8.8.8; };
};
bpftrace script to print BIND 9 DNS forwarding decisions
[...] dns_fwdtable_find in /lib/dns/forward.c. This looks promising:
dns_fwdtable_find takes a domain name as input parameter and returns the value 0 if the name has to be resolved via forwarding, and a value > 0 if forwarding is not used
A bpftrace one-liner gives us the information whether this function can be used for this task:
bpftrace -e 'uretprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find { print(retval) }'
[...] dns_fwdtable_find function
[...] dns_fwdtable_find on entry to the function
[...] retval) for the value zero (0) and output the domain name if forwarding is used
[...] dns_name_t
[...] dns_name_t. The 2nd field is an unsigned char * ndata, this seems to be the domain name
The definition of the data structure dns_name_t is in the file lib/dns/include/dns/name.h
bpftrace uses a syntax similar to the C programming language, so we can import the definition of the data structure
from the BIND 9 source code directly into the bpftrace script
[...] isc_buffer_t fields are not needed for our script, and since these fields are not based on built-in data types, we comment them out:
#!/usr/bin/bpftrace
struct dns_name {
    unsigned int magic;
    unsigned char *ndata;
    unsigned int length;
    unsigned int labels;
    unsigned int attributes;
    unsigned char *offsets;
    // isc_buffer_t *buffer;
    // ISC_LINK(dns_name_t) link;
    // ISC_LIST(dns_rdataset_t) list;
};
[...]
The BEGIN pseudo probe becomes active at the start of the script and outputs a message to the terminal to inform the user that the script has been started successfully
[...]
BEGIN {
    print("Waiting for forward decision...\n");
}
[...]
[...] uprobe (user-space entry probe)
[...] dns_fwdtable_find in the dynamic library file /lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so
[...] arg1) is cast to a struct dns_name and the field ndata is referenced
[...] @dns_name[tid] (indexed with the thread ID (tid) of the running BIND 9 thread in the process)
[...]
uprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
{
    @dns_name[tid] = ((struct dns_name *)arg1)->ndata;
}
[...]
[...] uretprobe - User-space function Return Probe)
[...] 0 (domain name must be resolved via forwarding), the value of the variable @dns_name[tid] is converted into a character string and printed on the terminal
[...] @dns_name[tid] is no longer required and is deleted
uretprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
{
    if (retval == 0) {
        printf("Forwarded domain name: %s\n", str(@dns_name[tid]));
    }
    delete(@dns_name[tid]);
}
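The interplay of the two probes, storing a value at function entry and consuming it at function return with the thread ID as key, can be modelled in plain Python. This is a hypothetical simulation of the pattern only; the class and method names are invented for illustration and nothing here attaches to a real process.

```python
class ForwardTracer:
    """Plain-Python model of the uprobe/uretprobe pair and its @dns_name map."""

    def __init__(self):
        self.dns_name = {}  # plays the role of @dns_name, keyed by thread ID

    def on_entry(self, tid: int, name: str) -> None:
        # uprobe: remember the name argument for this thread
        self.dns_name[tid] = name

    def on_return(self, tid: int, retval: int):
        # uretprobe: look the name up and delete the entry in one step
        name = self.dns_name.pop(tid, None)
        if retval == 0 and name is not None:
            return f"Forwarded domain name: {name}"
        return None


if __name__ == "__main__":
    tracer = ForwardTracer()
    tracer.on_entry(1, "dnslab.org")
    tracer.on_entry(2, "ietf.org")      # a second thread enters before the first returns
    print(tracer.on_return(2, 1))       # non-zero return value: nothing to report
    print(tracer.on_return(1, 0))       # zero: the stored name is reported
```

Keying the map by thread ID keeps concurrent calls apart: each return finds the name stored by its own thread even when several threads are inside the function at once.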
#!/usr/bin/bpftrace
struct dns_name {
    unsigned int magic;
    unsigned char *ndata;
    unsigned int length;
    unsigned int labels;
    unsigned int attributes;
    unsigned char *offsets;
    // isc_buffer_t *buffer;
    // ISC_LINK(dns_name_t) link;
    // ISC_LIST(dns_rdataset_t) list;
};
BEGIN
{
    print("Waiting for forward decision...\n");
}
uprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
{
    @dns_name[tid] = ((struct dns_name *)arg1)->ndata;
}
uretprobe:/lib/x86_64-linux-gnu/libdns-9.16.22-Debian.so:dns_fwdtable_find
{
    if (retval == 0) {
        printf("Forwarded domain name: %s\n", str(@dns_name[tid]));
    }
    delete(@dns_name[tid]);
}
[...] bpftrace script also becomes active
[...] dnslab.org domain are forwarded, but not the requests to ietf.org
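The script reads the domain name from the ndata field as a raw byte string. Assuming that ndata holds the name in uncompressed DNS wire format (each label preceded by a length byte), which the text above does not state explicitly, the bytes can be turned into the familiar dotted form as sketched below. The function name is hypothetical.

```python
def wire_to_text(ndata: bytes) -> str:
    """Decode an uncompressed wire-format DNS name into dotted text."""
    labels = []
    pos = 0
    while pos < len(ndata):
        length = ndata[pos]
        if length == 0:  # the root label terminates the name
            break
        if length > 63:
            raise ValueError("compression pointers are not handled in this sketch")
        label = ndata[pos + 1 : pos + 1 + length]
        if len(label) != length:
            raise ValueError("truncated label")
        labels.append(label.decode("ascii", errors="replace"))
        pos += 1 + length
    return ".".join(labels) + "."


if __name__ == "__main__":
    print(wire_to_text(b"\x06dnslab\x03org\x00"))  # dnslab.org.
```

Under this assumption a plain string conversion of ndata would include the length bytes between the labels, so a post-processing step of this kind gives more readable names.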
[...] bpftool to find and analyze eBPF programs on a Linux system
[...] tcpdump https://github.com/mozillazg/ptcpdump
Contact:
cs@sys4.de