net, bpf: Fix sk_user_data pointer corruption on 32-bit
The Fixes: commit started using the lower 3 bits of (void *)sk->sk_user_data
for flags, and refactored the code to make adding further flags easier.
This change immediately broke 32-bit usage, where pointers may be only
4-byte aligned and so have just 2 free low bits. In BPF's reuseport_array,
for example, 'struct reuseport_array' has an array 'struct sock __rcu *ptrs[]'
whose members must be cleared on socket close via references stored in
sk->sk_user_data, and those references are now broken. This leads to subtle
memory corruption and lock issues that result in kernel hangs and panics
while running BPF selftests:
root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
#356/1 select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G OE 6.17.0-rc1-00233-ge37b36224f81-dirty #114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
dump_backtrace from show_stack+0x20/0x24
r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
show_stack from dump_stack_lvl+0x90/0xc0
dump_stack_lvl from dump_stack+0x18/0x1c
r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
dump_stack from __warn+0x8c/0x1b4
__warn from warn_slowpath_fmt+0x130/0x1a4
r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
__lock_acquire from lock_acquire.part.0+0xbc/0x240
r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
r4:df865cd0
lock_acquire.part.0 from lock_acquire+0x90/0x168
r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
r4:c58863b8
lock_acquire from _raw_write_lock_bh+0x54/0x90
r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
_raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
r6:c59a4000 r5:c5191400 r4:c58863a8
bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
r4:00000000
bpf_map_update_value from map_update_elem+0x210/0x430
r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
r4:c46a6cf8
map_update_elem from __sys_bpf+0x594/0xc94
r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
r4:00000020
__sys_bpf from sys_bpf+0x34/0x3c
r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
r4:00000020
sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0: 00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---
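To illustrate the failure mode, here is a minimal userspace sketch (not kernel code) of what goes wrong when a 3-bit flag mask is applied to an address that is only 4-byte aligned. The flag values are those listed in this message; the DEMO_FLAGS_MASK_3BIT macro, the demo_* helpers and the sample addresses are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as listed in this commit message. */
#define SK_USER_DATA_NOCOPY 1UL
#define SK_USER_DATA_BPF    2UL
#define SK_USER_DATA_PSOCK  4UL

/* Hypothetical mask covering all three flag bits (illustration only). */
#define DEMO_FLAGS_MASK_3BIT \
	(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF | SK_USER_DATA_PSOCK)

/* Strip the flag bits from a tagged value, as a 3-bit scheme would. */
static inline uintptr_t demo_untag_3bit(uintptr_t tagged)
{
	return tagged & ~(uintptr_t)DEMO_FLAGS_MASK_3BIT;
}

/* Test whether a flag bit appears to be set in a tagged value. */
static inline int demo_has_flag(uintptr_t tagged, uintptr_t flag)
{
	return (tagged & flag) != 0;
}

/* Return 1 if tagging then untagging gives back addr, else 0. */
static inline int demo_roundtrip_ok(uintptr_t addr, uintptr_t flags)
{
	uintptr_t tagged = addr | flags;

	return demo_untag_3bit(tagged) == addr;
}
```

With an 8-byte aligned address such as 0x1000 the round trip succeeds. With an address such as 0x1004, which is a valid 4-byte aligned slot address on 32-bit, bit 2 belongs to the address itself: untagging yields 0x1000 instead, and the value also appears to carry SK_USER_DATA_PSOCK even though that flag was never set.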
Reviewing kernel usage of sk->sk_user_data and the current flag bits:
#define SK_USER_DATA_NOCOPY 1UL
#define SK_USER_DATA_BPF 2UL
#define SK_USER_DATA_PSOCK 4UL
reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:
enum sk_user_data {
SK_USER_DATA_NONE = 0,
SK_USER_DATA_NOCOPY = 1,
SK_USER_DATA_BPF = 2,
SK_USER_DATA_PSOCK = 3,
};
Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.
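As a rough userspace sketch of the idea (not the kernel implementation), a 2-bit enum value can be stored in the low bits of a 4-byte aligned pointer and recovered without touching address bits. The enum is the one proposed above; DEMO_TYPE_MASK and the demo_* helpers are hypothetical names used only here, and treating every non-NONE value as "no copy" is an assumption derived from the statement that BPF and PSOCK both imply NOCOPY.

```c
#include <assert.h>
#include <stdint.h>

/* The 2-bit enum proposed in this commit message. */
enum sk_user_data {
	SK_USER_DATA_NONE   = 0,
	SK_USER_DATA_NOCOPY = 1,
	SK_USER_DATA_BPF    = 2,
	SK_USER_DATA_PSOCK  = 3,
};

/* Hypothetical mask: only the 2 low bits, free on 4-byte aligned pointers. */
#define DEMO_TYPE_MASK ((uintptr_t)3)

/* Combine a pointer and a type value; the pointer must be 4-byte aligned. */
static inline uintptr_t demo_pack(void *ptr, enum sk_user_data type)
{
	uintptr_t addr = (uintptr_t)ptr;

	assert((addr & DEMO_TYPE_MASK) == 0);
	return addr | (uintptr_t)type;
}

/* Recover the original pointer by clearing only the 2 type bits. */
static inline void *demo_ptr(uintptr_t val)
{
	return (void *)(val & ~DEMO_TYPE_MASK);
}

/* Recover the type value from the 2 low bits. */
static inline enum sk_user_data demo_type(uintptr_t val)
{
	return (enum sk_user_data)(val & DEMO_TYPE_MASK);
}

/* Assumption: BPF and PSOCK imply NOCOPY, so any non-NONE type is no-copy. */
static inline int demo_is_nocopy(uintptr_t val)
{
	return demo_type(val) != SK_USER_DATA_NONE;
}
```

Because the mask covers only bits 0 and 1, an address with bit 2 set (for example one ending in 0x4) survives packing and unpacking intact. The cost is that no more than four distinct values fit, which is the 2-bit limitation noted above.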
Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <[email protected]>