DLPX-83701 Make the kernel function mnt_add_count() traceable #16

don-brady · 2022-12-07T22:32:09Z

Background

Some kernel functions in the unmount path are not traceable, which makes busy unmounts (umount failing with EBUSY) harder to debug.

Problem

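To help diagnose busy unmounts, mnt_add_count() and do_umount() should be traceable with bpftrace kprobes. As currently built, the compiler can inline these functions or emit them as optimized clones, so no probeable symbol with the original name and signature is guaranteed to exist.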

Solution

Mark both functions noinline and __noclone. noinline keeps an out-of-line copy of each function, and __noclone stops GCC from substituting a specialized clone (such as one produced by the -fipa-sra optimization) whose signature differs from the source.
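
A minimal userspace sketch of the annotation pattern, assuming GCC (clang has no noclone attribute, so the macro degrades to nothing there). The macro definitions only approximate the kernel's noinline and __noclone, and struct mount is a hypothetical one-field stand-in; the actual diff to fs/namespace.c is not reproduced here.

```c
/* Approximations of the kernel's attribute macros (GCC/clang). */
#define noinline __attribute__((__noinline__))

#if defined(__has_attribute)
#  if __has_attribute(__noclone__)
#    define __noclone __attribute__((__noclone__))
#  endif
#endif
#ifndef __noclone
#  define __noclone	/* compiler has no noclone attribute */
#endif

/* Hypothetical stand-in: the real struct mount keeps a per-CPU count. */
struct mount {
	int mnt_count;
};

/*
 * Upstream declares this helper "static inline", so GCC may fold it into
 * every caller or emit only an optimized clone (e.g. mnt_add_count.isra.0)
 * whose signature differs from the source. noinline keeps an out-of-line
 * copy; __noclone stops GCC from substituting a clone, so a symbol named
 * exactly mnt_add_count with the original arguments remains for kprobes.
 */
static noinline __noclone void mnt_add_count(struct mount *mnt, int n)
{
	mnt->mnt_count += n;
}
```

After rebuilding a kernel with the equivalent change, `sudo bpftrace -l '*mnt_add_count'` is expected to list `kprobe:mnt_add_count`, as in the Testing Done output.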

Testing Done

Confirmed that both function symbols are traceable:

delphix@ip-10-110-220-172:~$ sudo bpftrace -l '*mnt_add_count'
kprobe:mnt_add_count
delphix@ip-10-110-220-172:~$ sudo bpftrace -l '*do_umount'
kprobe:do_umount

Both probes were also exercised with a bpftrace script that watches for mnt_add_count() calls that decrement the mount reference count after a busy unmount has occurred.
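
The bpftrace script itself is not included in the PR. The filtering it describes can be modeled in C as a hypothetical sketch; the names struct umount_watch, on_do_umount_return() and on_mnt_add_count() are illustrative only. In bpftrace, the first handler would correspond to a kprobe/kretprobe pair on do_umount() (stashing arg0, the mount, at entry and checking retval at return), and the second to kprobe:mnt_add_count, where arg0 is the mount and arg1 is n.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical watcher state; mnt is treated as an opaque pointer. */
struct umount_watch {
	const void *busy_mnt;	/* mount whose do_umount() returned -EBUSY */
	int decrements;		/* reference drops on it seen since then */
};

/* Models the do_umount() return probe; mnt was saved at entry. */
static void on_do_umount_return(struct umount_watch *w, const void *mnt,
				int ret)
{
	if (ret == -EBUSY)
		w->busy_mnt = mnt;
}

/*
 * Models a kprobe on mnt_add_count(mnt, n). Reports only decrements
 * (n < 0) on the mount that was busy; returns true when it reported.
 */
static bool on_mnt_add_count(struct umount_watch *w, const void *mnt,
			     int n, int pid)
{
	if (w->busy_mnt == NULL || mnt != w->busy_mnt || n >= 0)
		return false;
	w->decrements++;
	printf("pid %d dropped %d reference(s) on the busy mount\n", pid, -n);
	return true;
}
```

Keying on the mount pointer keeps reference traffic on unrelated mounts out of the report. A real script might also print a kernel stack (bpftrace's kstack builtin) for each reported decrement to show which code path released the reference.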

DLPX-83701 Make function mnt_add_count() traceable

f7c70d3

pcd1193182 approved these changes Dec 7, 2022

tonynguien approved these changes Dec 9, 2022

don-brady merged commit 6cff5d4 into delphix:6.0/stage Dec 9, 2022

don-brady deleted the dlpx-83701-generic branch December 9, 2022 23:21

delphix-devops-bot pushed a commit that referenced this pull request Dec 15, 2022

DLPX-83701 Make function mnt_add_count() traceable (#16)

c10e315

delphix-devops-bot pushed a commit that referenced this pull request Jan 7, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

56d0f67

delphix-devops-bot pushed a commit that referenced this pull request Jan 16, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

8c02c75

delphix-devops-bot pushed a commit that referenced this pull request Feb 15, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

14e479d

delphix-devops-bot pushed a commit that referenced this pull request Mar 4, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

899acde

prakashsurya pushed a commit that referenced this pull request Mar 14, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

ef7be3b

prakashsurya pushed a commit that referenced this pull request Mar 14, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

d6d66c2

delphix-devops-bot pushed a commit that referenced this pull request Mar 30, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

830a275

delphix-devops-bot pushed a commit that referenced this pull request Apr 20, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

ffa7443

delphix-devops-bot pushed a commit that referenced this pull request Apr 20, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

ce6f336

delphix-devops-bot pushed a commit that referenced this pull request Apr 28, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

423715f

delphix-devops-bot pushed a commit that referenced this pull request May 26, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

a2060b9

delphix-devops-bot pushed a commit that referenced this pull request Jun 3, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

3f9d1b2

delphix-devops-bot pushed a commit that referenced this pull request Jun 4, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

e0df5e9

delphix-devops-bot pushed a commit that referenced this pull request Jun 5, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

d1e2c6d

prakashsurya pushed a commit that referenced this pull request Aug 8, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

590e30b

delphix-devops-bot pushed a commit that referenced this pull request Aug 24, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

54f9cbf

delphix-devops-bot pushed a commit that referenced this pull request Aug 25, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

474036d

delphix-devops-bot pushed a commit that referenced this pull request Aug 26, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

98e9f37

delphix-devops-bot pushed a commit that referenced this pull request Aug 27, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

5f0c1e4

delphix-devops-bot pushed a commit that referenced this pull request Aug 28, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

100eb01

delphix-devops-bot pushed a commit that referenced this pull request Aug 30, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

8c8bb47

delphix-devops-bot pushed a commit that referenced this pull request Aug 31, 2023

DLPX-83701 Make function mnt_add_count() traceable (#16)

c18154b

delphix-devops-bot pushed a commit that referenced this pull request May 12, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

4ccd0cd

delphix-devops-bot pushed a commit that referenced this pull request May 13, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

5dcd7f7

delphix-devops-bot pushed a commit that referenced this pull request May 14, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

a7c4a77

delphix-devops-bot pushed a commit that referenced this pull request May 15, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

5cbdeec

delphix-devops-bot pushed a commit that referenced this pull request May 17, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

0b345b9

delphix-devops-bot pushed a commit that referenced this pull request May 18, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

ba97da2

delphix-devops-bot pushed a commit that referenced this pull request May 19, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

1f8add1

delphix-devops-bot pushed a commit that referenced this pull request May 20, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

26ac844

delphix-devops-bot pushed a commit that referenced this pull request Jun 21, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

5f945fd

delphix-devops-bot pushed a commit that referenced this pull request Jun 22, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

1c344c9

delphix-devops-bot pushed a commit that referenced this pull request Jun 23, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

2a9c944

delphix-devops-bot pushed a commit that referenced this pull request Jun 24, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

37521ea

delphix-devops-bot pushed a commit that referenced this pull request Jun 24, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

49c847e

delphix-devops-bot pushed a commit that referenced this pull request Jul 1, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

dc58ca8

delphix-devops-bot pushed a commit that referenced this pull request Jul 19, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

40a9ef1

delphix-devops-bot pushed a commit that referenced this pull request Jul 20, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

f7133b9

delphix-devops-bot pushed a commit that referenced this pull request Jul 21, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

be9ec97

delphix-devops-bot pushed a commit that referenced this pull request Jul 22, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

c60cd17

delphix-devops-bot pushed a commit that referenced this pull request Jul 23, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

c778450

delphix-devops-bot pushed a commit that referenced this pull request Jul 24, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

3c1ef31

delphix-devops-bot pushed a commit that referenced this pull request Jul 25, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

168f0ca

delphix-devops-bot pushed a commit that referenced this pull request Jul 25, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

17ba883

delphix-devops-bot pushed a commit that referenced this pull request Aug 4, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

8797d04

prakashsurya pushed a commit that referenced this pull request Aug 4, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

abe0dc8

delphix-devops-bot pushed a commit that referenced this pull request Aug 5, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

c0a36a9

delphix-devops-bot pushed a commit that referenced this pull request Aug 7, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

e821666

delphix-devops-bot pushed a commit that referenced this pull request Aug 8, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

4fef514

delphix-devops-bot pushed a commit that referenced this pull request Sep 9, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

cfa8727

delphix-devops-bot pushed a commit that referenced this pull request Sep 26, 2025

DLPX-83701 Make function mnt_add_count() traceable (#16)

d3dec75
