fix: ls2k500sfb build error, gpio_device_find_by_label should be used #1

Zeno-sole · 2024-04-25T17:02:39Z

commit: d62fcd9

Florian reported the following kernel NULL pointer dereference issue on a BCM7250 board: [ 2.829744] Unable to handle kernel NULL pointer dereference at virtual address 0000000c when read [ 2.838740] [0000000c] *pgd=80000000004003, *pmd=00000000 [ 2.844178] Internal error: Oops: 206 [#1] SMP ARM [ 2.848990] Modules linked in: [ 2.852061] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-next-20240305-gd95fcdf4961d torvalds#66 [ 2.860436] Hardware name: Broadcom STB (Flattened Device Tree) [ 2.866371] PC is at brcmnand_read_by_pio+0x180/0x278 [ 2.871449] LR is at __wait_for_common+0x9c/0x1b0 [ 2.876178] pc : [<c094b6cc>] lr : [<c0e66310>] psr: 60000053 [ 2.882460] sp : f0811a80 ip : 00000012 fp : 00000000 [ 2.887699] r10: 00000000 r9 : 00000000 r8 : c3790000 [ 2.892936] r7 : 00000000 r6 : 00000000 r5 : c35db440 r4 : ffe00000 [ 2.899479] r3 : f15cb814 r2 : 00000000 r1 : 00000000 r0 : 00000000 The issue only happens when dma mode is disabled or not supported on STB chip. The pio mode transfer calls brcmnand_read_data_bus function which dereferences ctrl->soc->read_data_bus. But the soc member in STB chip is NULL hence triggers the access violation. The function needs to check the soc pointer first. Fixes: 546e425 ("mtd: rawnand: brcmnand: Add BCMBCA read data bus interface") Reported-by: Florian Fainelli <[email protected]> Tested-by: Florian Fainelli <[email protected]> Signed-off-by: William Zhang <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Link: https://lore.kernel.org/linux-mtd/[email protected]

During the removal of the idxd driver, registered offline callback is invoked as part of the clean up process. However, on systems with only one CPU online, no valid target is available to migrate the perf context, resulting in a kernel oops: BUG: unable to handle page fault for address: 000000000002a2b8 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 1470e1067 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57 Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023 RIP: 0010:mutex_lock+0x2e/0x50 ... Call Trace: <TASK> __die+0x24/0x70 page_fault_oops+0x82/0x160 do_user_addr_fault+0x65/0x6b0 __pfx___rdmsr_safe_on_cpu+0x10/0x10 exc_page_fault+0x7d/0x170 asm_exc_page_fault+0x26/0x30 mutex_lock+0x2e/0x50 mutex_lock+0x1e/0x50 perf_pmu_migrate_context+0x87/0x1f0 perf_event_cpu_offline+0x76/0x90 [idxd] cpuhp_invoke_callback+0xa2/0x4f0 __pfx_perf_event_cpu_offline+0x10/0x10 [idxd] cpuhp_thread_fun+0x98/0x150 smpboot_thread_fn+0x27/0x260 smpboot_thread_fn+0x1af/0x260 __pfx_smpboot_thread_fn+0x10/0x10 kthread+0x103/0x140 __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 <TASK> Fix the issue by preventing the migration of the perf context to an invalid target. Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support") Reported-by: Terrence Xu <[email protected]> Tested-by: Terrence Xu <[email protected]> Signed-off-by: Fenghua Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>

Doug reported [1] the following hung task: INFO: task swapper/0:1 blocked for more than 122 seconds. Not tainted 5.15.149-21875-gf795ebc40eb8 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008 Call trace: __switch_to+0xf4/0x1f4 __schedule+0x418/0xb80 schedule+0x5c/0x10c rpm_resume+0xe0/0x52c rpm_resume+0x178/0x52c __pm_runtime_resume+0x58/0x98 clk_pm_runtime_get+0x30/0xb0 clk_disable_unused_subtree+0x58/0x208 clk_disable_unused_subtree+0x38/0x208 clk_disable_unused_subtree+0x38/0x208 clk_disable_unused_subtree+0x38/0x208 clk_disable_unused_subtree+0x38/0x208 clk_disable_unused+0x4c/0xe4 do_one_initcall+0xcc/0x2d8 do_initcall_level+0xa4/0x148 do_initcalls+0x5c/0x9c do_basic_setup+0x24/0x30 kernel_init_freeable+0xec/0x164 kernel_init+0x28/0x120 ret_from_fork+0x10/0x20 INFO: task kworker/u16:0:9 blocked for more than 122 seconds. Not tainted 5.15.149-21875-gf795ebc40eb8 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u16:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00000008 Workqueue: events_unbound deferred_probe_work_func Call trace: __switch_to+0xf4/0x1f4 __schedule+0x418/0xb80 schedule+0x5c/0x10c schedule_preempt_disabled+0x2c/0x48 __mutex_lock+0x238/0x488 __mutex_lock_slowpath+0x1c/0x28 mutex_lock+0x50/0x74 clk_prepare_lock+0x7c/0x9c clk_core_prepare_lock+0x20/0x44 clk_prepare+0x24/0x30 clk_bulk_prepare+0x40/0xb0 mdss_runtime_resume+0x54/0x1c8 pm_generic_runtime_resume+0x30/0x44 __genpd_runtime_resume+0x68/0x7c genpd_runtime_resume+0x108/0x1f4 __rpm_callback+0x84/0x144 rpm_callback+0x30/0x88 rpm_resume+0x1f4/0x52c rpm_resume+0x178/0x52c __pm_runtime_resume+0x58/0x98 __device_attach+0xe0/0x170 device_initial_probe+0x1c/0x28 bus_probe_device+0x3c/0x9c device_add+0x644/0x814 mipi_dsi_device_register_full+0xe4/0x170 devm_mipi_dsi_device_register_full+0x28/0x70 ti_sn_bridge_probe+0x1dc/0x2c0 auxiliary_bus_probe+0x4c/0x94 really_probe+0xcc/0x2c8 __driver_probe_device+0xa8/0x130 driver_probe_device+0x48/0x110 __device_attach_driver+0xa4/0xcc bus_for_each_drv+0x8c/0xd8 __device_attach+0xf8/0x170 device_initial_probe+0x1c/0x28 bus_probe_device+0x3c/0x9c deferred_probe_work_func+0x9c/0xd8 process_one_work+0x148/0x518 worker_thread+0x138/0x350 kthread+0x138/0x1e0 ret_from_fork+0x10/0x20 The first thread is walking the clk tree and calling clk_pm_runtime_get() to power on devices required to read the clk hardware via struct clk_ops::is_enabled(). This thread holds the clk prepare_lock, and is trying to runtime PM resume a device, when it finds that the device is in the process of resuming so the thread schedule()s away waiting for the device to finish resuming before continuing. The second thread is runtime PM resuming the same device, but the runtime resume callback is calling clk_prepare(), trying to grab the prepare_lock waiting on the first thread. This is a classic ABBA deadlock. To properly fix the deadlock, we must never runtime PM resume or suspend a device with the clk prepare_lock held. Actually doing that is near impossible today because the global prepare_lock would have to be dropped in the middle of the tree, the device runtime PM resumed/suspended, and then the prepare_lock grabbed again to ensure consistency of the clk tree topology. If anything changes with the clk tree in the meantime, we've lost and will need to start the operation all over again. Luckily, most of the time we're simply incrementing or decrementing the runtime PM count on an active device, so we don't have the chance to schedule away with the prepare_lock held. Let's fix this immediate problem that can be triggered more easily by simply booting on Qualcomm sc7180. Introduce a list of clk_core structures that have been registered, or are in the process of being registered, that require runtime PM to operate. Iterate this list and call clk_pm_runtime_get() on each of them without holding the prepare_lock during clk_disable_unused(). This way we can be certain that the runtime PM state of the devices will be active and resumed so we can't schedule away while walking the clk tree with the prepare_lock held. Similarly, call clk_pm_runtime_put() without the prepare_lock held to properly drop the runtime PM reference. We remove the calls to clk_pm_runtime_{get,put}() in this path because they're superfluous now that we know the devices are runtime resumed. Reported-by: Douglas Anderson <[email protected]> Closes: https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/ [1] Closes: https://issuetracker.google.com/328070191 Cc: Marek Szyprowski <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Fixes: 9a34b45 ("clk: Add support for runtime PM") Signed-off-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Douglas Anderson <[email protected]>

Drop support for virtualizing adaptive PEBS, as KVM's implementation is architecturally broken without an obvious/easy path forward, and because exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak host kernel addresses to the guest. Bug #1 is that KVM doesn't account for the upper 32 bits of IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters() stores local variables as u8s and truncates the upper bits too, etc. Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value for PEBS events, perf will _always_ generate an adaptive record, even if the guest requested a basic record. Note, KVM will also enable adaptive PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero, i.e. the guest will only ever see Basic records. Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what KVM requests. Bug #4 is that adaptive PEBS *might* effectively bypass event filters set by the host, as "Updated Memory Access Info Group" records information that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER. Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least zeros) when entering a vCPU with adaptive PEBS, which allows the guest to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries" records. Disable adaptive PEBS support as an immediate fix due to the severity of the LBR leak in particular, and because fixing all of the bugs will be non-trivial, e.g. not suitable for backporting to stable kernels. Note! This will break live migration, but trying to make KVM play nice with live migration would be quite complicated, wouldn't be guaranteed to work (i.e. KVM might still kill/confuse the guest), and it's not clear that there are any publicly available VMMs that support adaptive PEBS, let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't support PEBS in any capacity. Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/all/[email protected] Fixes: c59a1f1 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Cc: [email protected] Cc: Like Xu <[email protected]> Cc: Mingwei Zhang <[email protected]> Cc: Zhenyu Wang <[email protected]> Cc: Zhang Xiong <[email protected]> Cc: Lv Zhiyuan <[email protected]> Cc: Dapeng Mi <[email protected]> Cc: Jim Mattson <[email protected]> Acked-by: Like Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>

The uart_handle_cts_change() function in serial_core expects the caller to hold uport->lock. For example, I have seen the below kernel splat, when the Bluetooth driver is loaded on an i.MX28 board. [ 85.119255] ------------[ cut here ]------------ [ 85.124413] WARNING: CPU: 0 PID: 27 at /drivers/tty/serial/serial_core.c:3453 uart_handle_cts_change+0xb4/0xec [ 85.134694] Modules linked in: hci_uart bluetooth ecdh_generic ecc wlcore_sdio configfs [ 85.143314] CPU: 0 PID: 27 Comm: kworker/u3:0 Not tainted 6.6.3-00021-gd62a2f068f92 #1 [ 85.151396] Hardware name: Freescale MXS (Device Tree) [ 85.156679] Workqueue: hci0 hci_power_on [bluetooth] (...) [ 85.191765] uart_handle_cts_change from mxs_auart_irq_handle+0x380/0x3f4 [ 85.198787] mxs_auart_irq_handle from __handle_irq_event_percpu+0x88/0x210 (...) Cc: [email protected] Fixes: 4d90bb1 ("serial: core: Document and assert lock requirements for irq helpers") Reviewed-by: Frank Li <[email protected]> Signed-off-by: Emil Kronborg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

…git/netfilter/nf netfilter pull request 24-04-11 Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Patches #1 and #2 add missing rcu read side lock when iterating over expression and object type list which could race with module removal. Patch #3 prevents promisc packet from visiting the bridge/input hook to amend a recent fix to address conntrack confirmation race in br_netfilter and nf_conntrack_bridge. Patch #4 adds and uses iterate decorator type to fetch the current pipapo set backend datastructure view when netlink dumps the set elements. Patch #5 fixes removal of duplicate elements in the pipapo set backend. Patch #6 flowtable validates pppoe header before accessing it. Patch #7 fixes flowtable datapath for pppoe packets, otherwise lookup fails and pppoe packets follow classic path. ==================== Signed-off-by: David S. Miller <[email protected]>

When disabling aRFS under the `priv->state_lock`, any scheduled aRFS works are canceled using the `cancel_work_sync` function, which waits for the work to end if it has already started. However, while waiting for the work handler, the handler will try to acquire the `state_lock` which is already acquired. The worker acquires the lock to delete the rules if the state is down, which is not the worker's responsibility since disabling aRFS deletes the rules. Add an aRFS state variable, which indicates whether the aRFS is enabled and prevent adding rules when the aRFS is disabled. Kernel log: ====================================================== WARNING: possible circular locking dependency detected 6.7.0-rc4_net_next_mlx5_5483eb2 #1 Tainted: G I ------------------------------------------------------ ethtool/386089 is trying to acquire lock: ffff88810f21ce68 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}, at: __flush_work+0x74/0x4e0 but task is already holding lock: ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&priv->state_lock){+.+.}-{3:3}: __mutex_lock+0x80/0xc90 arfs_handle_work+0x4b/0x3b0 [mlx5_core] process_one_work+0x1dc/0x4a0 worker_thread+0x1bf/0x3c0 kthread+0xd7/0x100 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x11/0x20 -> #0 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}: __lock_acquire+0x17b4/0x2c80 lock_acquire+0xd0/0x2b0 __flush_work+0x7a/0x4e0 __cancel_work_timer+0x131/0x1c0 arfs_del_rules+0x143/0x1e0 [mlx5_core] mlx5e_arfs_disable+0x1b/0x30 [mlx5_core] mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core] ethnl_set_channels+0x28f/0x3b0 ethnl_default_set_doit+0xec/0x240 genl_family_rcv_msg_doit+0xd0/0x120 genl_rcv_msg+0x188/0x2c0 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1a1/0x270 netlink_sendmsg+0x214/0x460 __sock_sendmsg+0x38/0x60 __sys_sendto+0x113/0x170 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x40/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->state_lock); lock((work_completion)(&rule->arfs_work)); lock(&priv->state_lock); lock((work_completion)(&rule->arfs_work)); *** DEADLOCK *** 3 locks held by ethtool/386089: #0: ffffffff82ea7210 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 #1: ffffffff82e94c88 (rtnl_mutex){+.+.}-{3:3}, at: ethnl_default_set_doit+0xd3/0x240 #2: ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core] stack backtrace: CPU: 15 PID: 386089 Comm: ethtool Tainted: G I 6.7.0-rc4_net_next_mlx5_5483eb2 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x60/0xa0 check_noncircular+0x144/0x160 __lock_acquire+0x17b4/0x2c80 lock_acquire+0xd0/0x2b0 ? __flush_work+0x74/0x4e0 ? save_trace+0x3e/0x360 ? __flush_work+0x74/0x4e0 __flush_work+0x7a/0x4e0 ? __flush_work+0x74/0x4e0 ? __lock_acquire+0xa78/0x2c80 ? lock_acquire+0xd0/0x2b0 ? mark_held_locks+0x49/0x70 __cancel_work_timer+0x131/0x1c0 ? mark_held_locks+0x49/0x70 arfs_del_rules+0x143/0x1e0 [mlx5_core] mlx5e_arfs_disable+0x1b/0x30 [mlx5_core] mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core] ethnl_set_channels+0x28f/0x3b0 ethnl_default_set_doit+0xec/0x240 genl_family_rcv_msg_doit+0xd0/0x120 genl_rcv_msg+0x188/0x2c0 ? ethnl_ops_begin+0xb0/0xb0 ? genl_family_rcv_msg_dumpit+0xf0/0xf0 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1a1/0x270 netlink_sendmsg+0x214/0x460 __sock_sendmsg+0x38/0x60 __sys_sendto+0x113/0x170 ? do_user_addr_fault+0x53f/0x8f0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x40/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e </TASK> Fixes: 45bf454 ("net/mlx5e: Enabling aRFS mechanism") Signed-off-by: Carolina Jubran <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Running a lot of VK CTS in parallel against nouveau, once every few hours you might see something like this crash. BUG: kernel NULL pointer dereference, address: 0000000000000008 PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ torvalds#27 Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021 RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau] Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1 RSP: 0000:ffffac20c5857838 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001 RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180 RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10 R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c FS: 00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ... ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau] ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau] nvkm_vmm_iter+0x351/0xa20 [nouveau] ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau] ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau] ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau] ? __lock_acquire+0x3ed/0x2170 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau] nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau] ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau] ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau] nvkm_vmm_map_locked+0x224/0x3a0 [nouveau] Adding any sort of useful debug usually makes it go away, so I hand wrote the function in a line, and debugged the asm. Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in the nv50_instobj_acquire called from nvkm_kmap. If Thread A and Thread B both get to nv50_instobj_acquire around the same time, and Thread A hits the refcount_set line, and in lockstep thread B succeeds at refcount_inc_not_zero, there is a chance the ptrs value won't have been stored since refcount_set is unordered. Force a memory barrier here, I picked smp_mb, since we want it on all CPUs and it's write followed by a read. v2: use paired smp_rmb/smp_wmb. Cc: <[email protected]> Fixes: be55287 ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj") Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Currently normal HugeTLB fault ends up crashing the kernel, as p4dp derived from p4d_offset() is an invalid address when PGTABLE_LEVEL = 5. A p4d level entry needs to be allocated when not available while walking the page table during HugeTLB faults. Let's call p4d_alloc() to allocate such entries when required instead of current p4d_offset(). Unable to handle kernel paging request at virtual address ffffffff80000000 Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000081da9000 [ffffffff80000000] pgd=1000000082cec003, p4d=0000000082c32003, pud=0000000000000000 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 108 Comm: high_addr_hugep Not tainted 6.9.0-rc4 torvalds#48 Hardware name: Foundation-v8A (DT) pstate: 01402005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : huge_pte_alloc+0xd4/0x334 lr : hugetlb_fault+0x1b8/0xc68 sp : ffff8000833bbc20 x29: ffff8000833bbc20 x28: fff000080080cb58 x27: ffff800082a7cc58 x26: 0000000000000000 x25: fff0000800378e40 x24: fff00008008d6c60 x23: 00000000de9dbf07 x22: fff0000800378e40 x21: 0004000000000000 x20: 0004000000000000 x19: ffffffff80000000 x18: 1ffe00010011d7a1 x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000001 x14: 0000000000000000 x13: ffff8000816120d0 x12: ffffffffffffffff x11: 0000000000000000 x10: fff00008008ebd0c x9 : 0004000000000000 x8 : 0000000000001255 x7 : fff00008003e2000 x6 : 00000000061d54b0 x5 : 0000000000001000 x4 : ffffffff80000000 x3 : 0000000000200000 x2 : 0000000000000004 x1 : 0000000080000000 x0 : 0000000000000000 Call trace: huge_pte_alloc+0xd4/0x334 hugetlb_fault+0x1b8/0xc68 handle_mm_fault+0x260/0x29c do_page_fault+0xfc/0x47c do_translation_fault+0x68/0x74 do_mem_abort+0x44/0x94 el0_da+0x2c/0x9c el0t_64_sync_handler+0x70/0xc4 el0t_64_sync+0x190/0x194 Code: aa000084 cb010084 b24c2c84 8b130c93 (f9400260) ---[ end trace 0000000000000000 ]--- Cc: Will Deacon <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Mark Rutland <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: a6bbf5d ("arm64: mm: Add definitions to support 5 levels of paging") Reported-by: Dev Jain <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Acked-by: Mark Rutland <[email protected]> Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>

On arm64, UBSAN traps can be decoded from the trap instruction. Add the add, sub, and mul overflow trap codes now that CONFIG_UBSAN_SIGNED_WRAP exists. Seen under clang 19: Internal error: UBSAN: unrecognized failure code: 00000000f2005515 [#1] PREEMPT SMP Reported-by: Nathan Chancellor <[email protected]> Closes: https://lore.kernel.org/lkml/20240411-fix-ubsan-in-hardening-config-v1-0-e0177c80ffaa@kernel.org Fixes: 557f8c5 ("ubsan: Reintroduce signed overflow sanitizer") Tested-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>

When I did hard offline test with hugetlb pages, below deadlock occurs: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-11409-gf6cef5f8c37f #1 Not tainted ------------------------------------------------------ bash/46904 is trying to acquire lock: ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60 but task is already holding lock: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (pcp_batch_high_lock){+.+.}-{3:3}: __mutex_lock+0x6c/0x770 page_alloc_cpu_online+0x3c/0x70 cpuhp_invoke_callback+0x397/0x5f0 __cpuhp_invoke_callback_range+0x71/0xe0 _cpu_up+0xeb/0x210 cpu_up+0x91/0xe0 cpuhp_bringup_mask+0x49/0xb0 bringup_nonboot_cpus+0xb7/0xe0 smp_init+0x25/0xa0 kernel_init_freeable+0x15f/0x3e0 kernel_init+0x15/0x1b0 ret_from_fork+0x2f/0x50 ret_from_fork_asm+0x1a/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1298/0x1cd0 lock_acquire+0xc0/0x2b0 cpus_read_lock+0x2a/0xc0 static_key_slow_dec+0x16/0x60 __hugetlb_vmemmap_restore_folio+0x1b9/0x200 dissolve_free_huge_page+0x211/0x260 __page_handle_poison+0x45/0xc0 memory_failure+0x65e/0xc70 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xca/0x1e0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(pcp_batch_high_lock); lock(cpu_hotplug_lock); lock(pcp_batch_high_lock); rlock(cpu_hotplug_lock); *** DEADLOCK *** 5 locks held by bash/46904: #0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 #1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0 #2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0 #3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70 #4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40 stack backtrace: CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x68/0xa0 check_noncircular+0x129/0x140 __lock_acquire+0x1298/0x1cd0 lock_acquire+0xc0/0x2b0 cpus_read_lock+0x2a/0xc0 static_key_slow_dec+0x16/0x60 __hugetlb_vmemmap_restore_folio+0x1b9/0x200 dissolve_free_huge_page+0x211/0x260 __page_handle_poison+0x45/0xc0 memory_failure+0x65e/0xc70 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xca/0x1e0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7fc862314887 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887 RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001 RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00 In short, below scene breaks the lock dependency chain: memory_failure __page_handle_poison zone_pcp_disable -- lock(pcp_batch_high_lock) dissolve_free_huge_page __hugetlb_vmemmap_restore_folio static_key_slow_dec cpus_read_lock -- rlock(cpu_hotplug_lock) Fix this by calling drain_all_pages() instead. This issue won't occur until commit a6b4085 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key"). As it introduced rlock(cpu_hotplug_lock) in dissolve_free_huge_page() code path while lock(pcp_batch_high_lock) is already in the __page_handle_poison(). [[email protected]: extend comment per Oscar] [[email protected]: reflow block comment] Link: https://lkml.kernel.org/r/[email protected] Fixes: a6b4085 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key") Signed-off-by: Miaohe Lin <[email protected]> Acked-by: Oscar Salvador <[email protected]> Reviewed-by: Jane Chu <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 torvalds#13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 torvalds#14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] torvalds#15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] torvalds#16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] torvalds#17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] torvalds#18 [ffffa65531497f10] kthread at ffffffff892d2e72 torvalds#19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Acked-by: Jason Wang <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

…git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Patch #1 amends a missing spot where the set iterator type is unset. This is fixing a issue in the previous pull request. Patch #2 fixes the delete set command abort path by restoring state of the elements. Reverse logic for the activate (abort) case otherwise element state is not restored, this requires to move the check for active/inactive elements to the set iterator callback. From the deactivate path, toggle the next generation bit and from the activate (abort) path, clear the next generation bitmask. Patch #3 skips elements already restored by delete set command from the abort path in case there is a previous delete element command in the batch. Check for the next generation bit just like it is done via set iteration to restore maps. netfilter pull request 24-04-18 * tag 'nf-24-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: fix memleak in map from abort path netfilter: nf_tables: restore set elements when delete set fails netfilter: nf_tables: missing iterator type in lookup walk ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

On arm64 machines, swsusp_save() faults if it attempts to access MEMBLOCK_NOMAP memory ranges. This can be reproduced in QEMU using UEFI when booting with rodata=off debug_pagealloc=off and CONFIG_KFENCE=n: Unable to handle kernel paging request at virtual address ffffff8000000000 Mem abort info: ESR = 0x0000000096000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x07: level 3 translation fault Data abort info: ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000eeb0b000 [ffffff8000000000] pgd=180000217fff9803, p4d=180000217fff9803, pud=180000217fff9803, pmd=180000217fff8803, pte=0000000000000000 Internal error: Oops: 0000000096000007 [#1] SMP Internal error: Oops: 0000000096000007 [#1] SMP Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter bpfilter rfkill at803x snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg dwmac_generic stmmac_platform snd_hda_codec stmmac joydev pcs_xpcs snd_hda_core phylink ppdev lp parport ramoops reed_solomon ip_tables x_tables nls_iso8859_1 vfat multipath linear amdgpu amdxcp drm_exec gpu_sched drm_buddy hid_generic usbhid hid radeon video drm_suballoc_helper drm_ttm_helper ttm i2c_algo_bit drm_display_helper cec drm_kms_helper drm CPU: 0 PID: 3663 Comm: systemd-sleep Not tainted 6.6.2+ torvalds#76 Source Version: 4e22ed63a0a48e7a7cff9b98b7806d8d4add7dc0 Hardware name: Greatwall GW-XXXXXX-XXX/GW-XXXXXX-XXX, BIOS KunLun BIOS V4.0 01/19/2021 pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : swsusp_save+0x280/0x538 lr : swsusp_save+0x280/0x538 sp : ffffffa034a3fa40 x29: ffffffa034a3fa40 x28: ffffff8000001000 x27: 0000000000000000 x26: ffffff8001400000 x25: ffffffc08113e248 x24: 0000000000000000 x23: 0000000000080000 x22: ffffffc08113e280 x21: 00000000000c69f2 x20: ffffff8000000000 x19: ffffffc081ae2500 x18: 0000000000000000 x17: 6666662074736420 x16: 3030303030303030 x15: 3038666666666666 x14: 0000000000000b69 x13: ffffff9f89088530 x12: 00000000ffffffea x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffffc08193f0d0 x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 0000000000000001 x5 : ffffffa0fff09dc8 x4 : 0000000000000000 x3 : 0000000000000027 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000004e Call trace: swsusp_save+0x280/0x538 swsusp_arch_suspend+0x148/0x190 hibernation_snapshot+0x240/0x39c hibernate+0xc4/0x378 state_store+0xf0/0x10c kobj_attr_store+0x14/0x24 The reason is swsusp_save() -> copy_data_pages() -> page_is_saveable() -> kernel_page_present() assuming that a page is always present when can_set_direct_map() is false (all of rodata_full, debug_pagealloc_enabled() and arm64_kfence_can_set_direct_map() false), irrespective of the MEMBLOCK_NOMAP ranges. Such MEMBLOCK_NOMAP regions should not be saved during hibernation. This problem was introduced by changes to the pfn_valid() logic in commit a7d9f30 ("arm64: drop pfn_valid_within() and simplify pfn_valid()"). Similar to other architectures, drop the !can_set_direct_map() check in kernel_page_present() so that page_is_savable() skips such pages. Fixes: a7d9f30 ("arm64: drop pfn_valid_within() and simplify pfn_valid()") Cc: <[email protected]> # 5.14.x Suggested-by: Mike Rapoport <[email protected]> Suggested-by: Catalin Marinas <[email protected]> Co-developed-by: xiongxin <[email protected]> Signed-off-by: xiongxin <[email protected]> Signed-off-by: Yaxiong Tian <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Link: https://lore.kernel.org/r/[email protected] [[email protected]: rework commit message] Signed-off-by: Catalin Marinas <[email protected]>

Petr Machata says: ==================== mlxsw: Fixes This patchset fixes the following issues: - During driver de-initialization the driver unregisters the EMAD response trap by setting its action to DISCARD. However the manual only permits TRAP and FORWARD, and future firmware versions will enforce this. In patch #1, suppress the error message by aligning the driver to the manual and use a FORWARD (NOP) action when unregistering the trap. - The driver queries the Management Capabilities Mask (MCAM) register during initialization to understand if certain features are supported. However, not all firmware versions support this register, leading to the driver failing to load. Patches #2 and #3 fix this issue by treating an error in the register query as an indication that the feature is not supported. v2: - Patch #2: - Make mlxsw_env_max_module_eeprom_len_query() void ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

syzbot was able to trigger a NULL deref in fib_validate_source() in an old tree [1]. It appears the bug exists in latest trees. All calls to __in_dev_get_rcu() must be checked for a NULL result. [1] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 2 PID: 3257 Comm: syz-executor.3 Not tainted 5.10.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:fib_validate_source+0xbf/0x15a0 net/ipv4/fib_frontend.c:425 Code: 18 f2 f2 f2 f2 42 c7 44 20 23 f3 f3 f3 f3 48 89 44 24 78 42 c6 44 20 27 f3 e8 5d 88 48 fc 4c 89 e8 48 c1 e8 03 48 89 44 24 18 <42> 80 3c 20 00 74 08 4c 89 ef e8 d2 15 98 fc 48 89 5c 24 10 41 bf RSP: 0018:ffffc900015fee40 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88800f7a4000 RCX: ffff88800f4f90c0 RDX: 0000000000000000 RSI: 0000000004001eac RDI: ffff8880160c64c0 RBP: ffffc900015ff060 R08: 0000000000000000 R09: ffff88800f7a4000 R10: 0000000000000002 R11: ffff88800f4f90c0 R12: dffffc0000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800f7a4000 FS: 00007f938acfe6c0(0000) GS:ffff888058c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f938acddd58 CR3: 000000001248e000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip_route_use_hint+0x410/0x9b0 net/ipv4/route.c:2231 ip_rcv_finish_core+0x2c4/0x1a30 net/ipv4/ip_input.c:327 ip_list_rcv_finish net/ipv4/ip_input.c:612 [inline] ip_sublist_rcv+0x3ed/0xe50 net/ipv4/ip_input.c:638 ip_list_rcv+0x422/0x470 net/ipv4/ip_input.c:673 __netif_receive_skb_list_ptype net/core/dev.c:5572 [inline] __netif_receive_skb_list_core+0x6b1/0x890 net/core/dev.c:5620 __netif_receive_skb_list net/core/dev.c:5672 [inline] netif_receive_skb_list_internal+0x9f9/0xdc0 net/core/dev.c:5764 netif_receive_skb_list+0x55/0x3e0 net/core/dev.c:5816 xdp_recv_frames net/bpf/test_run.c:257 [inline] xdp_test_run_batch net/bpf/test_run.c:335 [inline] bpf_test_run_xdp_live+0x1818/0x1d00 net/bpf/test_run.c:363 bpf_prog_test_run_xdp+0x81f/0x1170 net/bpf/test_run.c:1376 bpf_prog_test_run+0x349/0x3c0 kernel/bpf/syscall.c:3736 __sys_bpf+0x45c/0x710 kernel/bpf/syscall.c:5115 __do_sys_bpf kernel/bpf/syscall.c:5201 [inline] __se_sys_bpf kernel/bpf/syscall.c:5199 [inline] __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5199 Fixes: 02b2494 ("ipv4: use dst hint for ipv4 list receive") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Paolo Abeni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

If stack_depot_save_flags() allocates memory it always drops __GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP allocation we may end up with lockdep splat like bellow: ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ torvalds#49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs] but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79 Always preserve __GFP_NOLOCKDEP to fix this. Link: https://lkml.kernel.org/r/[email protected] Fixes: cd11016 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <[email protected]> Reported-by: Xiubo Li <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Reported-by: Damien Le Moal <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Suggested-by: Dave Chinner <[email protected]> Tested-by: Xiubo Li <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Issue reported by customer during SRIOV testing, call trace: When both i40e and the i40iw driver are loaded, a warning in check_flush_dependency is being triggered. This seems to be because of the i40e driver workqueue is allocated with the WQ_MEM_RECLAIM flag, and the i40iw one is not. Similar error was encountered on ice too and it was fixed by removing the flag. Do the same for i40e too. [Feb 9 09:08] ------------[ cut here ]------------ [ +0.000004] workqueue: WQ_MEM_RECLAIM i40e:i40e_service_task [i40e] is flushing !WQ_MEM_RECLAIM infiniband:0x0 [ +0.000060] WARNING: CPU: 0 PID: 937 at kernel/workqueue.c:2966 check_flush_dependency+0x10b/0x120 [ +0.000007] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 dns_resolver netfs qrtr rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common irdma intel_uncore_frequency intel_uncore_frequency_common ice ipmi_ssif isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp gnss coretemp ib_uverbs rapl intel_cstate ib_core iTCO_wdt iTCO_vendor_support acpi_ipmi mei_me ipmi_si intel_uncore ioatdma i2c_i801 joydev pcspkr mei ipmi_devintf lpc_ich intel_pch_thermal i2c_smbus ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c ast sd_mod drm_shmem_helper t10_pi drm_kms_helper sg ixgbe drm i40e ahci crct10dif_pclmul libahci crc32_pclmul igb crc32c_intel libata ghash_clmulni_intel i2c_algo_bit mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod fuse [ +0.000050] CPU: 0 PID: 937 Comm: kworker/0:3 Kdump: loaded Not tainted 6.8.0-rc2-Feb-net_dev-Qiueue-00279-gbd43c5687e05 #1 [ +0.000003] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 [ +0.000001] Workqueue: i40e i40e_service_task [i40e] [ +0.000024] RIP: 0010:check_flush_dependency+0x10b/0x120 [ +0.000003] Code: ff 49 8b 54 24 18 48 8d 8b b0 00 00 00 49 89 e8 48 81 c6 b0 00 00 00 48 c7 c7 b0 97 fa 9f c6 05 8a cc 1f 02 01 e8 35 b3 fd ff <0f> 0b e9 10 ff ff ff 80 3d 78 cc 1f 02 00 75 94 e9 46 ff ff ff 90 [ +0.000002] RSP: 0018:ffffbd294976bcf8 EFLAGS: 00010282 [ +0.000002] RAX: 0000000000000000 RBX: ffff94d4c483c000 RCX: 0000000000000027 [ +0.000001] RDX: ffff94d47f620bc8 RSI: 0000000000000001 RDI: ffff94d47f620bc0 [ +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffff7fff [ +0.000001] R10: ffffbd294976bb98 R11: ffffffffa0be65e8 R12: ffff94c5451ea180 [ +0.000001] R13: ffff94c5ab5e8000 R14: ffff94c5c20b6e05 R15: ffff94c5f1330ab0 [ +0.000001] FS: 0000000000000000(0000) GS:ffff94d47f600000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000001] CR2: 00007f9e6f1fca70 CR3: 0000000038e20004 CR4: 00000000007706f0 [ +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.000001] PKRU: 55555554 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000002] ? __warn+0x80/0x130 [ +0.000003] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? report_bug+0x195/0x1a0 [ +0.000005] ? handle_bug+0x3c/0x70 [ +0.000003] ? exc_invalid_op+0x14/0x70 [ +0.000002] ? asm_exc_invalid_op+0x16/0x20 [ +0.000006] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? check_flush_dependency+0x10b/0x120 [ +0.000002] __flush_workqueue+0x126/0x3f0 [ +0.000015] ib_cache_cleanup_one+0x1c/0xe0 [ib_core] [ +0.000056] __ib_unregister_device+0x6a/0xb0 [ib_core] [ +0.000023] ib_unregister_device_and_put+0x34/0x50 [ib_core] [ +0.000020] i40iw_close+0x4b/0x90 [irdma] [ +0.000022] i40e_notify_client_of_netdev_close+0x54/0xc0 [i40e] [ +0.000035] i40e_service_task+0x126/0x190 [i40e] [ +0.000024] process_one_work+0x174/0x340 [ +0.000003] worker_thread+0x27e/0x390 [ +0.000001] ? __pfx_worker_thread+0x10/0x10 [ +0.000002] kthread+0xdf/0x110 [ +0.000002] ? __pfx_kthread+0x10/0x10 [ +0.000002] ret_from_fork+0x2d/0x50 [ +0.000003] ? __pfx_kthread+0x10/0x10 [ +0.000001] ret_from_fork_asm+0x1b/0x30 [ +0.000004] </TASK> [ +0.000001] ---[ end trace 0000000000000000 ]--- Fixes: 4d5957c ("i40e: remove WQ_UNBOUND and the task limit of our workqueue") Signed-off-by: Sindhu Devale <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Mateusz Polchlopek <[email protected]> Signed-off-by: Aleksandr Loktionov <[email protected]> Tested-by: Robert Ganzynkowicz <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

9f74a3d ("ice: Fix VF Reset paths when interface in a failed over aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf(). The commit placed this lock acquisition just prior to the acquisition of the VF configuration lock. If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK flag, this could deadlock with ice_vc_cfg_qs_msg() because it always acquires the locks in the order of the VF configuration lock and then the LAG mutex. Lockdep reports this violation almost immediately on creating and then removing 2 VF: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-rc6 torvalds#54 Tainted: G W O ------------------------------------------------------ kworker/60:3/6771 is trying to acquire lock: ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice] but task is already holding lock: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&pf->lag_mutex){+.+.}-{3:3}: __lock_acquire+0x4f8/0xb40 lock_acquire+0xd4/0x2d0 __mutex_lock+0x9b/0xbf0 ice_vc_cfg_qs_msg+0x45/0x690 [ice] ice_vc_process_vf_msg+0x4f5/0x870 [ice] __ice_clean_ctrlq+0x2b5/0x600 [ice] ice_service_task+0x2c9/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 kthread+0x104/0x140 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 -> #0 (&vf->cfg_lock){+.+.}-{3:3}: check_prev_add+0xe2/0xc50 validate_chain+0x558/0x800 __lock_acquire+0x4f8/0xb40 lock_acquire+0xd4/0x2d0 __mutex_lock+0x9b/0xbf0 ice_reset_vf+0x22f/0x4d0 [ice] ice_process_vflr_event+0x98/0xd0 [ice] ice_service_task+0x1cc/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 kthread+0x104/0x140 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&pf->lag_mutex); lock(&vf->cfg_lock); lock(&pf->lag_mutex); lock(&vf->cfg_lock); *** DEADLOCK *** 4 locks held by kworker/60:3/6771: #0: ff40d43e05428b38 ((wq_completion)ice){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0 #1: ff50d06e05197e58 ((work_completion)(&pf->serv_task)){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0 #2: ff40d43ea1960e50 (&pf->vfs.table_lock){+.+.}-{3:3}, at: ice_process_vflr_event+0x48/0xd0 [ice] #3: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice] stack backtrace: CPU: 60 PID: 6771 Comm: kworker/60:3 Tainted: G W O 6.8.0-rc6 torvalds#54 Hardware name: Workqueue: ice ice_service_task [ice] Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x12d/0x150 check_prev_add+0xe2/0xc50 ? save_trace+0x59/0x230 ? add_chain_cache+0x109/0x450 validate_chain+0x558/0x800 __lock_acquire+0x4f8/0xb40 ? lockdep_hardirqs_on+0x7d/0x100 lock_acquire+0xd4/0x2d0 ? ice_reset_vf+0x22f/0x4d0 [ice] ? lock_is_held_type+0xc7/0x120 __mutex_lock+0x9b/0xbf0 ? ice_reset_vf+0x22f/0x4d0 [ice] ? ice_reset_vf+0x22f/0x4d0 [ice] ? rcu_is_watching+0x11/0x50 ? ice_reset_vf+0x22f/0x4d0 [ice] ice_reset_vf+0x22f/0x4d0 [ice] ? process_one_work+0x176/0x4d0 ice_process_vflr_event+0x98/0xd0 [ice] ice_service_task+0x1cc/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x104/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> To avoid deadlock, we must acquire the LAG mutex only after acquiring the VF configuration lock. Fix the ice_reset_vf() to acquire the LAG mutex only after we either acquire or check that the VF configuration lock is held. Fixes: 9f74a3d ("ice: Fix VF Reset paths when interface in a failed over aggregate") Signed-off-by: Jacob Keller <[email protected]> Reviewed-by: Dave Ertman <[email protected]> Reviewed-by: Mateusz Polchlopek <[email protected]> Tested-by: Przemek Kitszel <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

…nix_gc(). syzbot reported a lockdep splat regarding unix_gc_lock and unix_state_lock(). One is called from recvmsg() for a connected socket, and another is called from GC for TCP_LISTEN socket. So, the splat is false-positive. Let's add a dedicated lock class for the latter to suppress the splat. Note that this change is not necessary for net-next.git as the issue is only applied to the old GC impl. [0]: WARNING: possible circular locking dependency detected 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted ----------------------------------------------------- kworker/u8:1/11 is trying to acquire lock: ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 but task is already holding lock: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (unix_gc_lock){+.+.}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] unix_notinflight+0x13d/0x390 net/unix/garbage.c:140 unix_detach_fds net/unix/af_unix.c:1819 [inline] unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188 skb_release_all net/core/skbuff.c:1200 [inline] __kfree_skb net/core/skbuff.c:1216 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252 kfree_skb include/linux/skbuff.h:1262 [inline] manage_oob net/unix/af_unix.c:2672 [inline] unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749 unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981 do_splice_read fs/splice.c:985 [inline] splice_file_to_pipe+0x299/0x500 fs/splice.c:1295 do_splice+0xf2d/0x1880 fs/splice.c:1379 __do_splice fs/splice.c:1436 [inline] __do_sys_splice fs/splice.c:1652 [inline] __se_sys_splice+0x331/0x4a0 fs/splice.c:1634 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&u->lock){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(unix_gc_lock); lock(&u->lock); lock(unix_gc_lock); lock(&u->lock); *** DEADLOCK *** 3 locks held by kworker/u8:1/11: #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline] #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline] #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 stack backtrace: CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Fixes: 47d8ac0 ("af_unix: Fix garbage collector racing against connect()") Reported-and-tested-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307 Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

…git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains two Netfilter/IPVS fixes for net: Patch #1 fixes SCTP checksumming for IPVS with gso packets, from Ismael Luceno. Patch #2 honor dormant flag from netdev event path to fix a possible double hook unregistration. * tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: honor table dormant flag from netdev release event path ipvs: Fix checksumming on GSO of SCTP packets ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

LoongArch inclusion category: feature -------------------------------- The old memory should be reserved after efi_runtime_init() to avoid destroying the EFI space and causing failure when executing svam(). Fix the following problems when executing kdump: [ 0.000000] The BIOS Version: Loongson-UDK2018-V2.0.04082-beta7 [ 0.000000] CPU 0 Unable to handle kernel paging request at virtual address 00000000fdeb0e7c, era == 00000000fdeb0e7c, ra == 90000000dae6585c [ 0.000000] Oops[#1]: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.137+ torvalds#86 [ 0.000000] Hardware name: Loongson Loongson-3A5000-7A1000-1w-A2101/Loongson-LS3A5000-7A1000-1w-A2101, BIOS vUDK2018-LoongArch-V2.0.pre-beta8 06/15/2022 [ 0.000000] $ 0 : 0000000000000000 90000000dae6585c 90000000db200000 90000000db203840 [ 0.000000] $ 4 : 0000000000000078 0000000000000028 0000000000000001 00000000db203860 [ 0.000000] $ 8 : 0000000000000000 0000000000000040 90000000db203680 0000000000000000 [ 0.000000] $12 : 00000000fdeb0e7c ffffffffffffffc0 00000000fbffffff 0000000020000000 [ 0.000000] $16 : 000000000003e780 0000000020000000 90000000dad8c348 0000000000003fff [ 0.000000] $20 : 0000000000000018 90000000dad8bdd0 90000000db203850 0000000000000040 [ 0.000000] $24 : 000000000000000f 90000000db21a570 90000000daeb07a0 90000000db217000 [ 0.000000] $28 : 90000000db203858 0000000001ffffff 90000000db2171b0 0000000000000040 [ 0.000000] era : 00000000fdeb0e7c 0xfdeb0e7c [ 0.000000] ra : 90000000dae6585c set_virtual_map.isra.0+0x23c/0x394 [ 0.000000] CSR crmd: 90000000db21a570 [ 0.000000] CSR prmd: 00000000 [ 0.000000] CSR euen: 00000000 [ 0.000000] CSR ecfg: 90000000db203850 [ 0.000000] CSR estat: 90000000dae65800 [ 0.000000] ExcCode : 26 (SubCode 16b) [ 0.000000] PrId : 0014c012 (Loongson-64bit) [ 0.000000] Modules linked in: [ 0.000000] Process swapper (pid: 0, threadinfo=(____ptrval____), task=(____ptrval____)) [ 0.000000] Stack : 0000000000000001 00000000fdeb0e7c 0000000000036780 000000000003e780 [ 0.000000] 0000000000000006 0000000010000000 8000000010000000 0000000000010000 [ 0.000000] 8000000000000001 0000000000000005 00000000fde40000 90000000fde40000 [ 0.000000] 0000000000000100 800000000000000f 0000000000000006 00000000fdf40000 [ 0.000000] 90000000fdf40000 0000000000000300 800000000000000f 00000000000000b0 [ 0.000000] 0000000000000001 90000000da094cf0 0000000000000000 ffffffffffffffea [ 0.000000] 90000000db2039b8 ffff0a1000000609 0000000000000035 0000000000000030 [ 0.000000] 90000000dad7b258 0000000000000400 00000000000000b0 ffff0a1000000609 [ 0.000000] 90000000db2039a8 90000000db095730 000000007fffffff ffff0a1000000609 [ 0.000000] 90000000db203a90 90000000db203a30 90000000db2039d8 90000000db09570b [ 0.000000] ... [ 0.000000] Call Trace: [ 0.000000] [ 0.000000] Code: (Bad address in era) [ 0.000000] [ 0.000000] Signed-off-by: Youling Tang <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]>

MingcongBai · 2024-04-30T08:15:00Z

Picked on the rc6 branch, thank you!

Lockdep detects a possible deadlock as listed below. This is because it detects the IA55 interrupt controller .irq_eoi() API is called from interrupt context while configuration-specific API (e.g., .irq_enable()) could be called from process context on resume path (by calling rzg2l_gpio_irq_restore()). To avoid this, protect the call of rzg2l_gpio_irq_enable() with spin_lock_irqsave()/spin_unlock_irqrestore(). With this the same approach that is available in __setup_irq() is mimicked to pinctrl IRQ resume function. Below is the lockdep report: WARNING: inconsistent lock state 6.8.0-rc5-next-20240219-arm64-renesas-00030-gb17a289abf1f torvalds#90 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. str_rwdt_t_001./159 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff00000b001d70 (&rzg2l_irqc_data->lock){?...}-{2:2}, at: rzg2l_irqc_irq_enable+0x60/0xa4 {IN-HARDIRQ-W} state was registered at: lock_acquire+0x1e0/0x310 _raw_spin_lock+0x44/0x58 rzg2l_irqc_eoi+0x2c/0x130 irq_chip_eoi_parent+0x18/0x20 rzg2l_gpio_irqc_eoi+0xc/0x14 handle_fasteoi_irq+0x134/0x230 generic_handle_domain_irq+0x28/0x3c gic_handle_irq+0x4c/0xbc call_on_irq_stack+0x24/0x34 do_interrupt_handler+0x78/0x7c el1_interrupt+0x30/0x5c el1h_64_irq_handler+0x14/0x1c el1h_64_irq+0x64/0x68 _raw_spin_unlock_irqrestore+0x34/0x70 __setup_irq+0x4d4/0x6b8 request_threaded_irq+0xe8/0x1a0 request_any_context_irq+0x60/0xb8 devm_request_any_context_irq+0x74/0x104 gpio_keys_probe+0x374/0xb08 platform_probe+0x64/0xcc really_probe+0x140/0x2ac __driver_probe_device+0x74/0x124 driver_probe_device+0x3c/0x15c __driver_attach+0xec/0x1c4 bus_for_each_dev+0x70/0xcc driver_attach+0x20/0x28 bus_add_driver+0xdc/0x1d0 driver_register+0x5c/0x118 __platform_driver_register+0x24/0x2c gpio_keys_init+0x18/0x20 do_one_initcall+0x70/0x290 kernel_init_freeable+0x294/0x504 kernel_init+0x20/0x1cc ret_from_fork+0x10/0x20 irq event stamp: 69071 hardirqs last enabled at (69071): [<ffff800080e0dafc>] _raw_spin_unlock_irqrestore+0x6c/0x70 hardirqs last disabled at (69070): [<ffff800080e0cfec>] _raw_spin_lock_irqsave+0x7c/0x80 softirqs last enabled at (67654): [<ffff800080010614>] __do_softirq+0x494/0x4dc softirqs last disabled at (67645): [<ffff800080015238>] ____do_softirq+0xc/0x14 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&rzg2l_irqc_data->lock); <Interrupt> lock(&rzg2l_irqc_data->lock); *** DEADLOCK *** 4 locks held by str_rwdt_t_001./159: #0: ffff00000b10f3f0 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x1a4/0x35c #1: ffff00000e43ba88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe8/0x1a8 #2: ffff00000aa21dc8 (kn->active#40){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1a8 #3: ffff80008179d970 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x9c/0x278 stack backtrace: CPU: 0 PID: 159 Comm: str_rwdt_t_001. Not tainted 6.8.0-rc5-next-20240219-arm64-renesas-00030-gb17a289abf1f torvalds#90 Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT) Call trace: dump_backtrace+0x94/0xe8 show_stack+0x14/0x1c dump_stack_lvl+0x88/0xc4 dump_stack+0x14/0x1c print_usage_bug.part.0+0x294/0x348 mark_lock+0x6b0/0x948 __lock_acquire+0x750/0x20b0 lock_acquire+0x1e0/0x310 _raw_spin_lock+0x44/0x58 rzg2l_irqc_irq_enable+0x60/0xa4 irq_chip_enable_parent+0x1c/0x34 rzg2l_gpio_irq_enable+0xc4/0xd8 rzg2l_pinctrl_resume_noirq+0x4cc/0x520 pm_generic_resume_noirq+0x28/0x3c genpd_finish_resume+0xc0/0xdc genpd_resume_noirq+0x14/0x1c dpm_run_callback+0x34/0x90 device_resume_noirq+0xa8/0x268 dpm_noirq_resume_devices+0x13c/0x160 dpm_resume_noirq+0xc/0x1c suspend_devices_and_enter+0x2c8/0x570 pm_suspend+0x1ac/0x278 state_store+0x88/0x124 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x48/0x6c kernfs_fop_write_iter+0x118/0x1a8 vfs_write+0x270/0x35c ksys_write+0x64/0xec __arm64_sys_write+0x18/0x20 invoke_syscall+0x44/0x108 el0_svc_common.constprop.0+0xb4/0xd4 do_el0_svc+0x18/0x20 el0_svc+0x3c/0xb8 el0t_64_sync_handler+0xb8/0xbc el0t_64_sync+0x14c/0x150 Fixes: 254203f ("pinctrl: renesas: rzg2l: Add suspend/resume support") Signed-off-by: Claudiu Beznea <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Geert Uytterhoeven <[email protected]>

Commit 1548036 ("nfs: make the rpc_stat per net namespace") added functionality to specify rpc_stats function but missed adding it to the TCP TLS functionality. As the result, mounting with xprtsec=tls lead to the following kernel oops. [ 128.984192] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c [ 128.985058] Mem abort info: [ 128.985372] ESR = 0x0000000096000004 [ 128.985709] EC = 0x25: DABT (current EL), IL = 32 bits [ 128.986176] SET = 0, FnV = 0 [ 128.986521] EA = 0, S1PTW = 0 [ 128.986804] FSC = 0x04: level 0 translation fault [ 128.987229] Data abort info: [ 128.987597] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 128.988169] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 128.988811] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 128.989302] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106c84000 [ 128.990048] [000000000000001c] pgd=0000000000000000, p4d=0000000000000000 [ 128.990736] Internal error: Oops: 0000000096000004 [#1] SMP [ 128.991168] Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs uinput dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common mc vmw_vmci xfs libcrc32c e1000e crct10dif_ce ghash_ce sha2_ce vmwgfx nvme sha256_arm64 nvme_core sr_mod cdrom sha1_ce drm_ttm_helper ttm drm_kms_helper drm sg fuse [ 128.996466] CPU: 0 PID: 179 Comm: kworker/u4:26 Kdump: loaded Not tainted 6.8.0-rc6+ #12 [ 128.997226] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023 [ 128.998084] Workqueue: xprtiod xs_tcp_tls_setup_socket [sunrpc] [ 128.998701] pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 128.999384] pc : call_start+0x74/0x138 [sunrpc] [ 128.999809] lr : __rpc_execute+0xb8/0x3e0 [sunrpc] [ 129.000244] sp : ffff8000832b3a00 [ 129.000508] x29: ffff8000832b3a00 x28: ffff800081ac79c0 x27: ffff800081ac7000 [ 129.001111] x26: 0000000004248060 x25: 0000000000000000 x24: ffff800081596008 [ 129.001757] x23: ffff80007b087240 x22: ffff00009a509d30 x21: 0000000000000000 [ 129.002345] x20: ffff000090075600 x19: ffff00009a509d00 x18: ffffffffffffffff [ 129.002912] x17: 733d4d4554535953 x16: 42555300312d746e x15: ffff8000832b3a88 [ 129.003464] x14: ffffffffffffffff x13: ffff8000832b3a7d x12: 0000000000000008 [ 129.004021] x11: 0101010101010101 x10: ffff8000150cb560 x9 : ffff80007b087c00 [ 129.004577] x8 : ffff00009a509de0 x7 : 0000000000000000 x6 : 00000000be8c4ee3 [ 129.005026] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff000094d56680 [ 129.005425] x2 : ffff80007b0637f8 x1 : ffff000090075600 x0 : ffff00009a509d00 [ 129.005824] Call trace: [ 129.005967] call_start+0x74/0x138 [sunrpc] [ 129.006233] __rpc_execute+0xb8/0x3e0 [sunrpc] [ 129.006506] rpc_execute+0x160/0x1d8 [sunrpc] [ 129.006778] rpc_run_task+0x148/0x1f8 [sunrpc] [ 129.007204] tls_probe+0x80/0xd0 [sunrpc] [ 129.007460] rpc_ping+0x28/0x80 [sunrpc] [ 129.007715] rpc_create_xprt+0x134/0x1a0 [sunrpc] [ 129.007999] rpc_create+0x128/0x2a0 [sunrpc] [ 129.008264] xs_tcp_tls_setup_socket+0xdc/0x508 [sunrpc] [ 129.008583] process_one_work+0x174/0x3c8 [ 129.008813] worker_thread+0x2c8/0x3e0 [ 129.009033] kthread+0x100/0x110 [ 129.009225] ret_from_fork+0x10/0x20 [ 129.009432] Code: f0ffffc2 911fe042 aa1403e1 aa1303e0 (b9401c83) Fixes: 1548036 ("nfs: make the rpc_stat per net namespace") Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>

At the time of LPAR boot up, partition firmware provides Open Firmware property ibm,dma-window for the PE. This property is provided on the PCI bus the PE is attached to. There are execptions where the partition firmware might not provide this property for the PE at the time of LPAR boot up. One of the scenario is where the firmware has frozen the PE due to some error condition. This PE is frozen for 24 hours or unless the whole system is reinitialized. Within this time frame, if the LPAR is booted, the frozen PE will be presented to the LPAR but ibm,dma-window property could be missing. Today, under these circumstances, the LPAR oopses with NULL pointer dereference, when configuring the PCI bus the PE is attached to. BUG: Kernel NULL pointer dereference on read at 0x000000c8 Faulting instruction address: 0xc0000000001024c0 Oops: Kernel access of bad area, sig: 7 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: Supported: Yes CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1 Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries NIP: c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450 REGS: c0000000037db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000822 XER: 00000000 CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0 ... NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0 LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 Call Trace: pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable) pcibios_setup_bus_self+0x1c0/0x370 __of_scan_bus+0x2f8/0x330 pcibios_scan_phb+0x280/0x3d0 pcibios_init+0x88/0x12c do_one_initcall+0x60/0x320 kernel_init_freeable+0x344/0x3e4 kernel_init+0x34/0x1d0 ret_from_kernel_user_thread+0x14/0x1c Fixes: b1fc44e ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Gaurav Batra <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]

…active The default nna (node_nr_active) is used when the pool isn't tied to a specific NUMA node. This can happen in the following cases: 1. On NUMA, if per-node pwq init failure and the fallback pwq is used. 2. On NUMA, if a pool is configured to span multiple nodes. 3. On single node setups. 5797b1c ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") set the default nna->max to min_active because only #1 was being considered. For #2 and #3, using min_active means that the max concurrency in normal operation is pushed down to min_active which is currently 8, which can obviously lead to performance issues. exact value nna->max is set to doesn't really matter. #2 can only happen if the workqueue is intentionally configured to ignore NUMA boundaries and there's no good way to distribute max_active in this case. #3 is the default behavior on single node machines. Let's set it the default nna->max to max_active. This fixes the artificially lowered concurrency problem on single node machines and shouldn't hurt anything for other cases. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Shinichiro Kawasaki <[email protected]> Fixes: 5797b1c ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") Link: https://lore.kernel.org/dm-devel/[email protected]/ Signed-off-by: Tejun Heo <[email protected]>

…ocation" When this change was introduced between v6.10.4 and v6.10.5, the Broadcom Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'', Mid 2010) would throw many rcu stall errors during boot up, causing peripherals such as the wireless card to misbehave. [ 24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/. [ 24.166938] rcu: blocking rcu_node structures (internal RCU debug): [ 24.177800] Sending NMI from CPU 3 to CPUs 2: [ 24.183113] NMI backtrace for cpu 2 [ 24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1 [ 24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS MBP61.88Z.005D.B00.1804100943 04/10/18 [ 24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3] [ 24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90 [ 24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082 [ 24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000 [ 24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00 [ 24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000 [ 24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216 [ 24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40 [ 24.183151] FS: 00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000 [ 24.183153] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0 [ 24.183157] Call Trace: [ 24.183162] <NMI> [ 24.183167] ? nmi_cpu_backtrace+0xbf/0x140 [ 24.183175] ? nmi_cpu_backtrace_handler+0x11/0x20 [ 24.183181] ? nmi_handle+0x61/0x160 [ 24.183186] ? default_do_nmi+0x42/0x110 [ 24.183191] ? exc_nmi+0x1bd/0x290 [ 24.183194] ? end_repeat_nmi+0xf/0x53 [ 24.183203] ? __this_module+0x2d3d1/0x4f310 [tg3] [ 24.183207] ? __this_module+0x2d3d1/0x4f310 [tg3] [ 24.183210] ? __this_module+0x2d3d1/0x4f310 [tg3] [ 24.183213] </NMI> [ 24.183214] <TASK> [ 24.183215] __this_module+0x31828/0x4f310 [tg3] [ 24.183218] ? __this_module+0x2d390/0x4f310 [tg3] [ 24.183221] __this_module+0x398e6/0x4f310 [tg3] [ 24.183225] __this_module+0x3baf8/0x4f310 [tg3] [ 24.183229] __this_module+0x4733f/0x4f310 [tg3] [ 24.183233] ? _raw_spin_unlock_irqrestore+0x25/0x70 [ 24.183237] ? __this_module+0x398e6/0x4f310 [tg3] [ 24.183241] __this_module+0x4b943/0x4f310 [tg3] [ 24.183244] ? delay_tsc+0x89/0xf0 [ 24.183249] ? preempt_count_sub+0x51/0x60 [ 24.183254] __this_module+0x4be4b/0x4f310 [tg3] [ 24.183258] __dev_open+0x103/0x1c0 [ 24.183265] __dev_change_flags+0x1bd/0x230 [ 24.183269] ? rtnl_getlink+0x362/0x400 [ 24.183276] dev_change_flags+0x26/0x70 [ 24.183280] do_setlink+0xe16/0x11f0 [ 24.183286] ? __nla_validate_parse+0x61/0xd40 [ 24.183295] __rtnl_newlink+0x63d/0x9f0 [ 24.183301] ? kmem_cache_alloc_node_noprof+0x12b/0x360 [ 24.183308] ? kmalloc_trace_noprof+0x11e/0x350 [ 24.183312] ? rtnl_newlink+0x2e/0x70 [ 24.183316] rtnl_newlink+0x47/0x70 [ 24.183320] rtnetlink_rcv_msg+0x152/0x400 [ 24.183324] ? __netlink_sendskb+0x68/0x90 [ 24.183329] ? netlink_unicast+0x237/0x290 [ 24.183333] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 24.183336] netlink_rcv_skb+0x5b/0x110 [ 24.183343] netlink_unicast+0x1a4/0x290 [ 24.183347] netlink_sendmsg+0x222/0x4a0 [ 24.183350] ? proc_get_long.constprop.0+0x116/0x210 [ 24.183358] ____sys_sendmsg+0x379/0x3b0 [ 24.183363] ? copy_msghdr_from_user+0x6d/0xb0 [ 24.183368] ___sys_sendmsg+0x86/0xe0 [ 24.183372] ? addrconf_sysctl_forward+0xf3/0x270 [ 24.183378] ? _copy_from_iter+0x8b/0x570 [ 24.183384] ? __pfx_addrconf_sysctl_forward+0x10/0x10 [ 24.183388] ? _raw_spin_unlock+0x19/0x50 [ 24.183392] ? proc_sys_call_handler+0xf3/0x2f0 [ 24.183397] ? trace_hardirqs_on+0x29/0x90 [ 24.183401] ? __fdget+0xc2/0xf0 [ 24.183405] __sys_sendmsg+0x5b/0xc0 [ 24.183410] ? syscall_trace_enter+0x110/0x1b0 [ 24.183416] do_syscall_64+0x64/0x150 [ 24.183423] entry_SYSCALL_64_after_hwframe+0x76/0x7e I have bisected the error to this commit. Reverting it caused no new or perceivable issues on both the MacBook and a Zen4-based laptop. Revert this commit as a workaround. This reverts commit aa162aa. Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390 Signed-off-by: Mingcong Bai <[email protected]> Bug: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kexy Biscuit <[email protected]>

Avoid below overlapping mappings by using a contiguous non-cacheable buffer. [ 4.077708] DMA-API: stm32_fmc2_nfc 48810000.nand-controller: cacheline tracking EEXIST, overlapping mappings aren't supported [ 4.089103] WARNING: CPU: 1 PID: 44 at kernel/dma/debug.c:568 add_dma_entry+0x23c/0x300 [ 4.097071] Modules linked in: [ 4.100101] CPU: 1 PID: 44 Comm: kworker/u4:2 Not tainted 6.1.82 #1 [ 4.106346] Hardware name: STMicroelectronics STM32MP257F VALID1 SNOR / MB1704 (LPDDR4 Power discrete) + MB1703 + MB1708 (SNOR MB1730) (DT) [ 4.118824] Workqueue: events_unbound deferred_probe_work_func [ 4.124674] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4.131624] pc : add_dma_entry+0x23c/0x300 [ 4.135658] lr : add_dma_entry+0x23c/0x300 [ 4.139792] sp : ffff800009dbb490 [ 4.143016] x29: ffff800009dbb4a0 x28: 0000000004008022 x27: ffff8000098a6000 [ 4.150174] x26: 0000000000000000 x25: ffff8000099e7000 x24: ffff8000099e7de8 [ 4.157231] x23: 00000000ffffffff x22: 0000000000000000 x21: ffff8000098a6a20 [ 4.164388] x20: ffff000080964180 x19: ffff800009819ba0 x18: 0000000000000006 [ 4.171545] x17: 6361727420656e69 x16: 6c6568636163203a x15: 72656c6c6f72746e [ 4.178602] x14: 6f632d646e616e2e x13: ffff800009832f58 x12: 00000000000004ec [ 4.185759] x11: 00000000000001a4 x10: ffff80000988af58 x9 : ffff800009832f58 [ 4.192916] x8 : 00000000ffffefff x7 : ffff80000988af58 x6 : 80000000fffff000 [ 4.199972] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000 [ 4.207128] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000812d2c40 [ 4.214185] Call trace: [ 4.216605] add_dma_entry+0x23c/0x300 [ 4.220338] debug_dma_map_sg+0x198/0x350 [ 4.224373] __dma_map_sg_attrs+0xa0/0x110 [ 4.228411] dma_map_sg_attrs+0x10/0x2c [ 4.232247] stm32_fmc2_nfc_xfer.isra.0+0x1c8/0x3fc [ 4.237088] stm32_fmc2_nfc_seq_read_page+0xc8/0x174 [ 4.242127] nand_read_oob+0x1d4/0x8e0 [ 4.245861] mtd_read_oob_std+0x58/0x84 [ 4.249596] mtd_read_oob+0x90/0x150 [ 4.253231] mtd_read+0x68/0xac Signed-off-by: Christophe Kerello <[email protected]> Cc: [email protected] Fixes: 2cd457f ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver") Signed-off-by: Miquel Raynal <[email protected]>

Problem description =================== Lockdep reports a possible circular locking dependency (AB/BA) between &pl->state_mutex and &phy->lock, as follows. phylink_resolve() // acquires &pl->state_mutex -> phylink_major_config() -> phy_config_inband() // acquires &pl->phydev->lock whereas all the other call sites where &pl->state_mutex and &pl->phydev->lock have the locking scheme reversed. Everywhere else, &pl->phydev->lock is acquired at the top level, and &pl->state_mutex at the lower level. A clear example is phylink_bringup_phy(). The outlier is the newly introduced phy_config_inband() and the existing lock order is the correct one. To understand why it cannot be the other way around, it is sufficient to consider phylink_phy_change(), phylink's callback from the PHY device's phy->phy_link_change() virtual method, invoked by the PHY state machine. phy_link_up() and phy_link_down(), the (indirect) callers of phylink_phy_change(), are called with &phydev->lock acquired. Then phylink_phy_change() acquires its own &pl->state_mutex, to serialize changes made to its pl->phy_state and pl->link_config. So all other instances of &pl->state_mutex and &phydev->lock must be consistent with this order. Problem impact ============== I think the kernel runs a serious deadlock risk if an existing phylink_resolve() thread, which results in a phy_config_inband() call, is concurrent with a phy_link_up() or phy_link_down() call, which will deadlock on &pl->state_mutex in phylink_phy_change(). Practically speaking, the impact may be limited by the slow speed of the medium auto-negotiation protocol, which makes it unlikely for the current state to still be unresolved when a new one is detected, but I think the problem is there. Nonetheless, the problem was discovered using lockdep. Proposed solution ================= Practically speaking, the phy_config_inband() requirement of having phydev->lock acquired must transfer to the caller (phylink is the only caller). There, it must bubble up until immediately before &pl->state_mutex is acquired, for the cases where that takes place. Solution details, considerations, notes ======================================= This is the phy_config_inband() call graph: sfp_upstream_ops :: connect_phy() | v phylink_sfp_connect_phy() | v phylink_sfp_config_phy() | | sfp_upstream_ops :: module_insert() | | | v | phylink_sfp_module_insert() | | | | sfp_upstream_ops :: module_start() | | | | | v | | phylink_sfp_module_start() | | | | v v | phylink_sfp_config_optical() phylink_start() | | | phylink_resume() v v | | phylink_sfp_set_config() | | | v v v phylink_mac_initial_config() | phylink_resolve() | | phylink_ethtool_ksettings_set() v v v phylink_major_config() | v phy_config_inband() phylink_major_config() caller #1, phylink_mac_initial_config(), does not acquire &pl->state_mutex nor do its callers. It must acquire &pl->phydev->lock prior to calling phylink_major_config(). phylink_major_config() caller #2, phylink_resolve() acquires &pl->state_mutex, thus also needs to acquire &pl->phydev->lock. phylink_major_config() caller #3, phylink_ethtool_ksettings_set(), is completely uninteresting, because it only calls phylink_major_config() if pl->phydev is NULL (otherwise it calls phy_ethtool_ksettings_set()). We need to change nothing there. Other solutions =============== The lock inversion between &pl->state_mutex and &pl->phydev->lock has occurred at least once before, as seen in commit c718af2 ("net: phylink: fix ethtool -A with attached PHYs"). The solution there was to simply not call phy_set_asym_pause() under the &pl->state_mutex. That cannot be extended to our case though, where the phy_config_inband() call is much deeper inside the &pl->state_mutex section. Fixes: 5fd0f1a ("net: phylink: add negotiation of in-band capabilities") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

5da3d94 ("PCI: mvebu: Use for_each_of_range() iterator for parsing "ranges"") simplified code by using the for_each_of_range() iterator, but it broke PCI enumeration on Turris Omnia (and probably other mvebu targets). Issue #1: To determine range.flags, of_pci_range_parser_one() uses bus->get_flags(), which resolves to of_bus_pci_get_flags(), which already returns an IORESOURCE bit field, and NOT the original flags from the "ranges" resource. Then mvebu_get_tgt_attr() attempts the very same conversion again. Remove the misinterpretation of range.flags in mvebu_get_tgt_attr(), to restore the intended behavior. Issue #2: The driver needs target and attributes, which are encoded in the raw address values of the "/soc/pcie/ranges" resource. According to of_pci_range_parser_one(), the raw values are stored in range.bus_addr and range.parent_bus_addr, respectively. range.cpu_addr is a translated version of range.parent_bus_addr, and not relevant here. Use the correct range structure member, to extract target and attributes. This restores the intended behavior. Fixes: 5da3d94 ("PCI: mvebu: Use for_each_of_range() iterator for parsing "ranges"") Reported-by: Jan Palus <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220479 Signed-off-by: Klaus Kudielka <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Tested-by: Tony Dinh <[email protected]> Tested-by: Jan Palus <[email protected]> Link: https://patch.msgid.link/[email protected]

The function ceph_process_folio_batch() sets folio_batch entries to NULL, which is an illegal state. Before folio_batch_release() crashes due to this API violation, the function ceph_shift_unused_folios_left() is supposed to remove those NULLs from the array. However, since commit ce80b76 ("ceph: introduce ceph_process_folio_batch() method"), this shifting doesn't happen anymore because the "for" loop got moved to ceph_process_folio_batch(), and now the `i` variable that remains in ceph_writepages_start() doesn't get incremented anymore, making the shifting effectively unreachable much of the time. Later, commit 1551ec6 ("ceph: introduce ceph_submit_write() method") added more preconditions for doing the shift, replacing the `i` check (with something that is still just as broken): - if ceph_process_folio_batch() fails, shifting never happens - if ceph_move_dirty_page_in_page_array() was never called (because ceph_process_folio_batch() has returned early for some of various reasons), shifting never happens - if `processed_in_fbatch` is zero (because ceph_process_folio_batch() has returned early for some of the reasons mentioned above or because ceph_move_dirty_page_in_page_array() has failed), shifting never happens Since those two commits, any problem in ceph_process_folio_batch() could crash the kernel, e.g. this way: BUG: kernel NULL pointer dereference, address: 0000000000000034 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: Oops: 0002 [#1] SMP NOPTI CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es torvalds#714 NONE Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023 Workqueue: writeback wb_workfn (flush-ceph-1) RIP: 0010:folios_put_refs+0x85/0x140 Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 > RSP: 0018:ffffb880af8db778 EFLAGS: 00010207 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003 RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0 RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0 R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000 FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> ceph_writepages_start+0xeb9/0x1410 The crash can be reproduced easily by changing the ceph_check_page_before_write() return value to `-E2BIG`. (Interestingly, the crash happens only if `huge_zero_folio` has already been allocated; without `huge_zero_folio`, is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL entries instead of dereferencing them. That makes reproducing the bug somewhat unreliable. See https://lore.kernel.org/[email protected] for a discussion of this detail.) My suggestion is to move the ceph_shift_unused_folios_left() to right after ceph_process_folio_batch() to ensure it always gets called to fix up the illegal folio_batch state. Cc: [email protected] Fixes: ce80b76 ("ceph: introduce ceph_process_folio_batch() method") Link: https://lore.kernel.org/ceph-devel/[email protected]/ Signed-off-by: Max Kellermann <[email protected]> Reviewed-by: Viacheslav Dubeyko <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>

If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration later than the first, the error path wants to free the IRQs requested so far. However, it uses the wrong dev_id argument for free_irq(), so it does not free the IRQs correctly and instead triggers the warning: Trying to free already-free IRQ 173 WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0 Modules linked in: i40e(+) [...] CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy) Hardware name: [...] RIP: 0010:__free_irq+0x192/0x2c0 [...] Call Trace: <TASK> free_irq+0x32/0x70 i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e] i40e_vsi_request_irq+0x79/0x80 [i40e] i40e_vsi_open+0x21f/0x2f0 [i40e] i40e_open+0x63/0x130 [i40e] __dev_open+0xfc/0x210 __dev_change_flags+0x1fc/0x240 netif_change_flags+0x27/0x70 do_setlink.isra.0+0x341/0xc70 rtnl_newlink+0x468/0x860 rtnetlink_rcv_msg+0x375/0x450 netlink_rcv_skb+0x5c/0x110 netlink_unicast+0x288/0x3c0 netlink_sendmsg+0x20d/0x430 ____sys_sendmsg+0x3a2/0x3d0 ___sys_sendmsg+0x99/0xe0 __sys_sendmsg+0x8a/0xf0 do_syscall_64+0x82/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] </TASK> ---[ end trace 0000000000000000 ]--- Use the same dev_id for free_irq() as for request_irq(). I tested this with inserting code to fail intentionally. Fixes: 493fb30 ("i40e: Move q_vectors from pointer to array to array of pointers") Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Reviewed-by: Subbaraya Sundeep <[email protected]> Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>

Hangbin Liu says: ==================== hsr: fix lock warnings hsr_for_each_port is called in many places without holding the RCU read lock, this may trigger warnings on debug kernels like: [ 40.457015] [ T201] WARNING: suspicious RCU usage [ 40.457020] [ T201] 6.17.0-rc2-virtme #1 Not tainted [ 40.457025] [ T201] ----------------------------- [ 40.457029] [ T201] net/hsr/hsr_main.c:137 RCU-list traversed in non-reader section!! [ 40.457036] [ T201] other info that might help us debug this: [ 40.457040] [ T201] rcu_scheduler_active = 2, debug_locks = 1 [ 40.457045] [ T201] 2 locks held by ip/201: [ 40.457050] [ T201] #0: ffffffff93040a40 (&ops->srcu){.+.+}-{0:0}, at: rtnl_link_ops_get+0xf2/0x280 [ 40.457080] [ T201] #1: ffffffff92e7f968 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x5e1/0xb20 [ 40.457102] [ T201] stack backtrace: [ 40.457108] [ T201] CPU: 2 UID: 0 PID: 201 Comm: ip Not tainted 6.17.0-rc2-virtme #1 PREEMPT(full) [ 40.457114] [ T201] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 40.457117] [ T201] Call Trace: [ 40.457120] [ T201] <TASK> [ 40.457126] [ T201] dump_stack_lvl+0x6f/0xb0 [ 40.457136] [ T201] lockdep_rcu_suspicious.cold+0x4f/0xb1 [ 40.457148] [ T201] hsr_port_get_hsr+0xfe/0x140 [ 40.457158] [ T201] hsr_add_port+0x192/0x940 [ 40.457167] [ T201] ? __pfx_hsr_add_port+0x10/0x10 [ 40.457176] [ T201] ? lockdep_init_map_type+0x5c/0x270 [ 40.457189] [ T201] hsr_dev_finalize+0x4bc/0xbf0 [ 40.457204] [ T201] hsr_newlink+0x3c3/0x8f0 [ 40.457212] [ T201] ? __pfx_hsr_newlink+0x10/0x10 [ 40.457222] [ T201] ? rtnl_create_link+0x173/0xe40 [ 40.457233] [ T201] rtnl_newlink_create+0x2cf/0x750 [ 40.457243] [ T201] ? __pfx_rtnl_newlink_create+0x10/0x10 [ 40.457247] [ T201] ? __dev_get_by_name+0x12/0x50 [ 40.457252] [ T201] ? rtnl_dev_get+0xac/0x140 [ 40.457259] [ T201] ? __pfx_rtnl_dev_get+0x10/0x10 [ 40.457285] [ T201] __rtnl_newlink+0x22c/0xa50 [ 40.457305] [ T201] rtnl_newlink+0x637/0xb20 Adding rcu_read_lock() for all hsr_for_each_port() looks confusing. Introduce a new helper, hsr_for_each_port_rtnl(), that assumes the RTNL lock is held. This allows callers in suitable contexts to iterate ports safely without explicit RCU locking. Other code paths that rely on RCU protection continue to use hsr_for_each_port() with rcu_read_lock(). ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

…ostcopy When you run a KVM guest with vhost-net and migrate that guest to another host, and you immediately enable postcopy after starting the migration, there is a big chance that the network connection of the guest won't work anymore on the destination side after the migration. With a debug kernel v6.16.0, there is also a call trace that looks like this: FAULT_FLAG_ALLOW_RETRY missing 881 CPU: 6 UID: 0 PID: 549 Comm: kworker/6:2 Kdump: loaded Not tainted 6.16.0 torvalds#56 NONE Hardware name: IBM 3931 LA1 400 (LPAR) Workqueue: events irqfd_inject [kvm] Call Trace: [<00003173cbecc634>] dump_stack_lvl+0x104/0x168 [<00003173cca69588>] handle_userfault+0xde8/0x1310 [<00003173cc756f0c>] handle_pte_fault+0x4fc/0x760 [<00003173cc759212>] __handle_mm_fault+0x452/0xa00 [<00003173cc7599ba>] handle_mm_fault+0x1fa/0x6a0 [<00003173cc73409a>] __get_user_pages+0x4aa/0xba0 [<00003173cc7349e8>] get_user_pages_remote+0x258/0x770 [<000031734be6f052>] get_map_page+0xe2/0x190 [kvm] [<000031734be6f910>] adapter_indicators_set+0x50/0x4a0 [kvm] [<000031734be7f674>] set_adapter_int+0xc4/0x170 [kvm] [<000031734be2f268>] kvm_set_irq+0x228/0x3f0 [kvm] [<000031734be27000>] irqfd_inject+0xd0/0x150 [kvm] [<00003173cc00c9ec>] process_one_work+0x87c/0x1490 [<00003173cc00dda6>] worker_thread+0x7a6/0x1010 [<00003173cc02dc36>] kthread+0x3b6/0x710 [<00003173cbed2f0c>] __ret_from_fork+0xdc/0x7f0 [<00003173cdd737ca>] ret_from_fork+0xa/0x30 3 locks held by kworker/6:2/549: #0: 00000000800bc958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ee/0x1490 #1: 000030f3d527fbd0 ((work_completion)(&irqfd->inject)){+.+.}-{0:0}, at: process_one_work+0x81c/0x1490 #2: 00000000f99862b0 (&mm->mmap_lock){++++}-{3:3}, at: get_map_page+0xa8/0x190 [kvm] The "FAULT_FLAG_ALLOW_RETRY missing" indicates that handle_userfaultfd() saw a page fault request without ALLOW_RETRY flag set, hence userfaultfd cannot remotely resolve it (because the caller was asking for an immediate resolution, aka, FAULT_FLAG_NOWAIT, while remote faults can take time). With that, get_map_page() failed and the irq was lost. We should not be strictly in an atomic environment here and the worker should be sleepable (the call is done during an ioctl from userspace), so we can allow adapter_indicators_set() to just sleep waiting for the remote fault instead. Link: https://issues.redhat.com/browse/RHEL-42486 Signed-off-by: Peter Xu <[email protected]> [thuth: Assembled patch description and fixed some cosmetical issues] Signed-off-by: Thomas Huth <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Acked-by: Janosch Frank <[email protected]> Fixes: f654706 ("KVM: s390/interrupt: do not pin adapter interrupt pages") [frankja: Added fixes tag] Signed-off-by: Janosch Frank <[email protected]>

syzkaller has caught us red-handed once more, this time nesting regular spinlocks behind raw spinlocks: ============================= [ BUG: Invalid wait context ] 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 Not tainted ----------------------------- syz.0.29/3743 is trying to lock: a3ff80008e2e9e18 (&xa->xa_lock#20){....}-{3:3}, at: vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137 other info that might help us debug this: context-{5:5} 3 locks held by syz.0.29/3743: #0: a3ff80008e2e90a8 (&kvm->slots_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x50/0x624 arch/arm64/kvm/vgic/vgic-init.c:499 #1: a3ff80008e2e9fa0 (&kvm->arch.config_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x5c/0x624 arch/arm64/kvm/vgic/vgic-init.c:500 #2: 58f0000021be1428 (&vgic_cpu->ap_list_lock){....}-{2:2}, at: vgic_flush_pending_lpis+0x3c/0x31c arch/arm64/kvm/vgic/vgic.c:150 stack backtrace: CPU: 0 UID: 0 PID: 3743 Comm: syz.0.29 Not tainted 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 PREEMPT Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C) __dump_stack+0x30/0x40 lib/dump_stack.c:94 dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120 dump_stack+0x1c/0x28 lib/dump_stack.c:129 print_lock_invalid_wait_context kernel/locking/lockdep.c:4833 [inline] check_wait_context kernel/locking/lockdep.c:4905 [inline] __lock_acquire+0x978/0x299c kernel/locking/lockdep.c:5190 lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5871 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162 vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137 vgic_flush_pending_lpis+0x24c/0x31c arch/arm64/kvm/vgic/vgic.c:158 __kvm_vgic_vcpu_destroy+0x44/0x500 arch/arm64/kvm/vgic/vgic-init.c:455 kvm_vgic_destroy+0x100/0x624 arch/arm64/kvm/vgic/vgic-init.c:505 kvm_arch_destroy_vm+0x80/0x138 arch/arm64/kvm/arm.c:244 kvm_destroy_vm virt/kvm/kvm_main.c:1308 [inline] kvm_put_kvm+0x800/0xff8 virt/kvm/kvm_main.c:1344 kvm_vm_release+0x58/0x78 virt/kvm/kvm_main.c:1367 __fput+0x4ac/0x980 fs/file_table.c:465 ____fput+0x20/0x58 fs/file_table.c:493 task_work_run+0x1bc/0x254 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] do_notify_resume+0x1b4/0x270 arch/arm64/kernel/entry-common.c:151 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_svc+0xb4/0x160 arch/arm64/kernel/entry-common.c:768 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600 This is of course no good, but is at odds with how LPI refcounts are managed. Solve the locking mess by deferring the release of unreferenced LPIs after the ap_list_lock is released. Mark these to-be-released LPIs specially to avoid racing with vgic_put_irq() and causing a double-free. Since references can only be taken on LPIs with a nonzero refcount, extending the lifetime of freed LPIs is still safe. Reviewed-by: Marc Zyngier <[email protected]> Reported-by: [email protected] Closes: https://lore.kernel.org/kvmarm/[email protected]/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>

…own_locked() When 9770b42 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown") moved the error messages dumping so that they don't need to be issued by the callers, it missed the case where __sev_firmware_shutdown() calls __sev_platform_shutdown_locked() with a NULL argument which leads to a NULL ptr deref on the shutdown path, during suspend to disk: #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 983 Comm: hib.sh Not tainted 6.17.0-rc4+ #1 PREEMPT(voluntary) Hardware name: Supermicro Super Server/H12SSL-i, BIOS 2.5 09/08/2022 RIP: 0010:__sev_platform_shutdown_locked.cold+0x0/0x21 [ccp] That rIP is: 00000000000006fd <__sev_platform_shutdown_locked.cold>: 6fd: 8b 13 mov (%rbx),%edx 6ff: 48 8b 7d 00 mov 0x0(%rbp),%rdi 703: 89 c1 mov %eax,%ecx Code: 74 05 31 ff 41 89 3f 49 8b 3e 89 ea 48 c7 c6 a0 8e 54 a0 41 bf 92 ff ff ff e8 e5 2e 09 e1 c6 05 2a d4 38 00 01 e9 26 af ff ff <8b> 13 48 8b 7d 00 89 c1 48 c7 c6 18 90 54 a0 89 44 24 04 e8 c1 2e RSP: 0018:ffffc90005467d00 EFLAGS: 00010282 RAX: 00000000ffffff92 RBX: 0000000000000000 RCX: 0000000000000000 ^^^^^^^^^^^^^^^^ and %rbx is nice and clean. Call Trace: <TASK> __sev_firmware_shutdown.isra.0 sev_dev_destroy psp_dev_destroy sp_destroy pci_device_shutdown device_shutdown kernel_power_off hibernate.cold state_store kernfs_fop_write_iter vfs_write ksys_write do_syscall_64 entry_SYSCALL_64_after_hwframe Pass in a pointer to the function-local error var in the caller. With that addressed, suspending the ccp shows the error properly at least: ccp 0000:47:00.1: sev command 0x2 timed out, disabling PSP ccp 0000:47:00.1: SEV: failed to SHUTDOWN error 0x0, rc -110 SEV-SNP: Leaking PFN range 0x146800-0x146a00 SEV-SNP: PFN 0x146800 unassigned, dumping non-zero entries in 2M PFN region: [0x146800 - 0x146a00] ... ccp 0000:47:00.1: SEV-SNP firmware shutdown failed, rc -16, error 0x0 ACPI: PM: Preparing to enter system sleep state S5 kvm: exiting hardware virtualization reboot: Power down Btw, this driver is crying to be cleaned up to pass in a proper I/O struct which can be used to store information between the different functions, otherwise stuff like that will happen in the future again. Fixes: 9770b42 ("crypto: ccp - Move dev_info/err messages for SEV/SNP init and shutdown") Signed-off-by: Borislav Petkov (AMD) <[email protected]> Cc: <[email protected]> Reviewed-by: Ashish Kalra <[email protected]> Acked-by: Tom Lendacky <[email protected]> Signed-off-by: Herbert Xu <[email protected]>

…PAIR A NULL pointer dereference can occur in tcp_ao_finish_connect() during a connect() system call on a socket with a TCP-AO key added and TCP_REPAIR enabled. The function is called with skb being NULL and attempts to dereference it on tcp_hdr(skb)->seq without a prior skb validation. Fix this by checking if skb is NULL before dereferencing it. The commentary is taken from bpf_skops_established(), which is also called in the same flow. Unlike the function being patched, bpf_skops_established() validates the skb before dereferencing it. int main(void){ struct sockaddr_in sockaddr; struct tcp_ao_add tcp_ao; int sk; int one = 1; memset(&sockaddr,'\0',sizeof(sockaddr)); memset(&tcp_ao,'\0',sizeof(tcp_ao)); sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); sockaddr.sin_family = AF_INET; memcpy(tcp_ao.alg_name,"cmac(aes128)",12); memcpy(tcp_ao.key,"ABCDEFGHABCDEFGH",16); tcp_ao.keylen = 16; memcpy(&tcp_ao.addr,&sockaddr,sizeof(sockaddr)); setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &tcp_ao, sizeof(tcp_ao)); setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &one, sizeof(one)); sockaddr.sin_family = AF_INET; sockaddr.sin_port = htobe16(123); inet_aton("127.0.0.1", &sockaddr.sin_addr); connect(sk,(struct sockaddr *)&sockaddr,sizeof(sockaddr)); return 0; } $ gcc tcp-ao-nullptr.c -o tcp-ao-nullptr -Wall $ unshare -Urn BUG: kernel NULL pointer dereference, address: 00000000000000b6 PGD 1f648d067 P4D 1f648d067 PUD 1982e8067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:tcp_ao_finish_connect (net/ipv4/tcp_ao.c:1182) Fixes: 7c2ffaf ("net/tcp: Calculate TCP-AO traffic keys") Signed-off-by: Anderson Nascimento <[email protected]> Reviewed-by: Dmitry Safonov <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

When igc_led_setup() fails, igc_probe() fails and triggers kernel panic in free_netdev() since unregister_netdev() is not called. [1] This behavior can be tested using fault-injection framework, especially the failslab feature. [2] Since LED support is not mandatory, treat LED setup failures as non-fatal and continue probe with a warning message, consequently avoiding the kernel panic. [1] kernel BUG at net/core/dev.c:12047! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 937 Comm: repro-igc-led-e Not tainted 6.17.0-rc4-enjuk-tnguy-00865-gc4940196ab02 torvalds#64 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:free_netdev+0x278/0x2b0 [...] Call Trace: <TASK> igc_probe+0x370/0x910 local_pci_probe+0x3a/0x80 pci_device_probe+0xd1/0x200 [...] [2] #!/bin/bash -ex FAILSLAB_PATH=/sys/kernel/debug/failslab/ DEVICE=0000:00:05.0 START_ADDR=$(grep " igc_led_setup" /proc/kallsyms \ | awk '{printf("0x%s", $1)}') END_ADDR=$(printf "0x%x" $((START_ADDR + 0x100))) echo $START_ADDR > $FAILSLAB_PATH/require-start echo $END_ADDR > $FAILSLAB_PATH/require-end echo 1 > $FAILSLAB_PATH/times echo 100 > $FAILSLAB_PATH/probability echo N > $FAILSLAB_PATH/ignore-gfp-wait echo $DEVICE > /sys/bus/pci/drivers/igc/bind Fixes: ea57870 ("igc: Add support for LEDs on i225/i226") Signed-off-by: Kohei Enju <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Reviewed-by: Vitaly Lifshits <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Tested-by: Mor Bar-Gabay <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>

Check in snd_intel_dsp_check_soundwire() that the pointer returned by ACPI_HANDLE() is not NULL, before passing it on to other functions. The original code assumed a non-NULL return, but if it was unexpectedly NULL it would end up passed to acpi_walk_namespace() as the start point, and would result in [ 3.219028] BUG: kernel NULL pointer dereference, address: 0000000000000018 [ 3.219029] #PF: supervisor read access in kernel mode [ 3.219030] #PF: error_code(0x0000) - not-present page [ 3.219031] PGD 0 P4D 0 [ 3.219032] Oops: Oops: 0000 [#1] SMP NOPTI [ 3.219035] CPU: 2 UID: 0 PID: 476 Comm: (udev-worker) Tainted: G S AW E 6.17.0-rc5-test #1 PREEMPT(voluntary) [ 3.219038] Tainted: [S]=CPU_OUT_OF_SPEC, [A]=OVERRIDDEN_ACPI_TABLE, [W]=WARN, [E]=UNSIGNED_MODULE [ 3.219040] RIP: 0010:acpi_ns_walk_namespace+0xb5/0x480 This problem was triggered by a bugged DSDT that the kernel couldn't parse. But it shouldn't be possible to SEGFAULT the kernel just because of some bugs in ACPI. Fixes: 0650857 ("ALSA: hda: add autodetection for SoundWire") Signed-off-by: Richard Fitzgerald <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>

…rnal() A crash was observed with the following output: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 2899 Comm: syz.2.399 Not tainted 6.17.0-rc5+ #5 PREEMPT(none) RIP: 0010:trace_kprobe_create_internal+0x3fc/0x1440 kernel/trace/trace_kprobe.c:911 Call Trace: <TASK> trace_kprobe_create_cb+0xa2/0xf0 kernel/trace/trace_kprobe.c:1089 trace_probe_create+0xf1/0x110 kernel/trace/trace_probe.c:2246 dyn_event_create+0x45/0x70 kernel/trace/trace_dynevent.c:128 create_or_delete_trace_kprobe+0x5e/0xc0 kernel/trace/trace_kprobe.c:1107 trace_parse_run_command+0x1a5/0x330 kernel/trace/trace.c:10785 vfs_write+0x2b6/0xd00 fs/read_write.c:684 ksys_write+0x129/0x240 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x5d/0x2d0 arch/x86/entry/syscall_64.c:94 </TASK> Function kmemdup() may return NULL in trace_kprobe_create_internal(), add check for it's return value. Link: https://lore.kernel.org/all/[email protected]/ Fixes: 33b4e38 ("tracing: kprobe-event: Allocate string buffers from heap") Signed-off-by: Wang Liang <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

[ Upstream commit c1628c0 ] A crash was observed with the following output: BUG: kernel NULL pointer dereference, address: 0000000000000010 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 2 UID: 0 PID: 92 Comm: osnoise_cpus Not tainted 6.17.0-rc4-00201-gd69eb204c255 torvalds#138 PREEMPT(voluntary) RIP: 0010:bitmap_parselist+0x53/0x3e0 Call Trace: <TASK> osnoise_cpus_write+0x7a/0x190 vfs_write+0xf8/0x410 ? do_sys_openat2+0x88/0xd0 ksys_write+0x60/0xd0 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> This issue can be reproduced by below code: fd=open("/sys/kernel/debug/tracing/osnoise/cpus", O_WRONLY); write(fd, "0-2", 0); When user pass 'count=0' to osnoise_cpus_write(), kmalloc() will return ZERO_SIZE_PTR (16) and cpulist_parse() treat it as a normal value, which trigger the null pointer dereference. Add check for the parameter 'count'. Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 17f8910 ("tracing/osnoise: Allow arbitrarily long CPU string") Signed-off-by: Wang Liang <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 513c40e upstream. Avoid below overlapping mappings by using a contiguous non-cacheable buffer. [ 4.077708] DMA-API: stm32_fmc2_nfc 48810000.nand-controller: cacheline tracking EEXIST, overlapping mappings aren't supported [ 4.089103] WARNING: CPU: 1 PID: 44 at kernel/dma/debug.c:568 add_dma_entry+0x23c/0x300 [ 4.097071] Modules linked in: [ 4.100101] CPU: 1 PID: 44 Comm: kworker/u4:2 Not tainted 6.1.82 #1 [ 4.106346] Hardware name: STMicroelectronics STM32MP257F VALID1 SNOR / MB1704 (LPDDR4 Power discrete) + MB1703 + MB1708 (SNOR MB1730) (DT) [ 4.118824] Workqueue: events_unbound deferred_probe_work_func [ 4.124674] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4.131624] pc : add_dma_entry+0x23c/0x300 [ 4.135658] lr : add_dma_entry+0x23c/0x300 [ 4.139792] sp : ffff800009dbb490 [ 4.143016] x29: ffff800009dbb4a0 x28: 0000000004008022 x27: ffff8000098a6000 [ 4.150174] x26: 0000000000000000 x25: ffff8000099e7000 x24: ffff8000099e7de8 [ 4.157231] x23: 00000000ffffffff x22: 0000000000000000 x21: ffff8000098a6a20 [ 4.164388] x20: ffff000080964180 x19: ffff800009819ba0 x18: 0000000000000006 [ 4.171545] x17: 6361727420656e69 x16: 6c6568636163203a x15: 72656c6c6f72746e [ 4.178602] x14: 6f632d646e616e2e x13: ffff800009832f58 x12: 00000000000004ec [ 4.185759] x11: 00000000000001a4 x10: ffff80000988af58 x9 : ffff800009832f58 [ 4.192916] x8 : 00000000ffffefff x7 : ffff80000988af58 x6 : 80000000fffff000 [ 4.199972] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000 [ 4.207128] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000812d2c40 [ 4.214185] Call trace: [ 4.216605] add_dma_entry+0x23c/0x300 [ 4.220338] debug_dma_map_sg+0x198/0x350 [ 4.224373] __dma_map_sg_attrs+0xa0/0x110 [ 4.228411] dma_map_sg_attrs+0x10/0x2c [ 4.232247] stm32_fmc2_nfc_xfer.isra.0+0x1c8/0x3fc [ 4.237088] stm32_fmc2_nfc_seq_read_page+0xc8/0x174 [ 4.242127] nand_read_oob+0x1d4/0x8e0 [ 4.245861] mtd_read_oob_std+0x58/0x84 [ 4.249596] mtd_read_oob+0x90/0x150 [ 4.253231] mtd_read+0x68/0xac Signed-off-by: Christophe Kerello <[email protected]> Cc: [email protected] Fixes: 2cd457f ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver") Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…on memory commit d613f53 upstream. When I did memory failure tests, below panic occurs: page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page)) kernel BUG at include/linux/page-flags.h:616! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 720 Comm: bash Not tainted 6.10.0-rc1-00195-g148743902568 torvalds#40 RIP: 0010:unpoison_memory+0x2f3/0x590 RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246 RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0 RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000 R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe FS: 00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0 Call Trace: <TASK> unpoison_memory+0x2f3/0x590 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110 debugfs_attr_write+0x42/0x60 full_proxy_write+0x5b/0x80 vfs_write+0xd5/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xb9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f08f0314887 RSP: 002b:00007ffece710078 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f08f0314887 RDX: 0000000000000009 RSI: 0000564787a30410 RDI: 0000000000000001 RBP: 0000564787a30410 R08: 000000000000fefe R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009 R13: 00007f08f041b780 R14: 00007f08f0417600 R15: 00007f08f0416a00 </TASK> Modules linked in: hwpoison_inject ---[ end trace 0000000000000000 ]--- RIP: 0010:unpoison_memory+0x2f3/0x590 RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246 RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0 RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000 R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe FS: 00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x31c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]--- The root cause is that unpoison_memory() tries to check the PG_HWPoison flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is triggered. This can be reproduced by below steps: 1.Offline memory block: echo offline > /sys/devices/system/memory/memory12/state 2.Get offlined memory pfn: page-types -b n -rlN 3.Write pfn to unpoison-pfn echo <pfn> > /sys/kernel/debug/hwpoison/unpoison-pfn This scenario can be identified by pfn_to_online_page() returning NULL. And ZONE_DEVICE pages are never expected, so we can simply fail if pfn_to_online_page() == NULL to fix the bug. Link: https://lkml.kernel.org/r/[email protected] Fixes: f1dd2cd ("mm, memory_hotplug: do not associate hotadded memory to zones until online") Signed-off-by: Miaohe Lin <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit cce7c15 upstream. The function ceph_process_folio_batch() sets folio_batch entries to NULL, which is an illegal state. Before folio_batch_release() crashes due to this API violation, the function ceph_shift_unused_folios_left() is supposed to remove those NULLs from the array. However, since commit ce80b76 ("ceph: introduce ceph_process_folio_batch() method"), this shifting doesn't happen anymore because the "for" loop got moved to ceph_process_folio_batch(), and now the `i` variable that remains in ceph_writepages_start() doesn't get incremented anymore, making the shifting effectively unreachable much of the time. Later, commit 1551ec6 ("ceph: introduce ceph_submit_write() method") added more preconditions for doing the shift, replacing the `i` check (with something that is still just as broken): - if ceph_process_folio_batch() fails, shifting never happens - if ceph_move_dirty_page_in_page_array() was never called (because ceph_process_folio_batch() has returned early for some of various reasons), shifting never happens - if `processed_in_fbatch` is zero (because ceph_process_folio_batch() has returned early for some of the reasons mentioned above or because ceph_move_dirty_page_in_page_array() has failed), shifting never happens Since those two commits, any problem in ceph_process_folio_batch() could crash the kernel, e.g. this way: BUG: kernel NULL pointer dereference, address: 0000000000000034 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: Oops: 0002 [#1] SMP NOPTI CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es torvalds#714 NONE Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023 Workqueue: writeback wb_workfn (flush-ceph-1) RIP: 0010:folios_put_refs+0x85/0x140 Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 > RSP: 0018:ffffb880af8db778 EFLAGS: 00010207 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003 RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0 RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0 R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000 FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> ceph_writepages_start+0xeb9/0x1410 The crash can be reproduced easily by changing the ceph_check_page_before_write() return value to `-E2BIG`. (Interestingly, the crash happens only if `huge_zero_folio` has already been allocated; without `huge_zero_folio`, is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL entries instead of dereferencing them. That makes reproducing the bug somewhat unreliable. See https://lore.kernel.org/[email protected] for a discussion of this detail.) My suggestion is to move the ceph_shift_unused_folios_left() to right after ceph_process_folio_batch() to ensure it always gets called to fix up the illegal folio_batch state. Cc: [email protected] Fixes: ce80b76 ("ceph: introduce ceph_process_folio_batch() method") Link: https://lore.kernel.org/ceph-devel/[email protected]/ Signed-off-by: Max Kellermann <[email protected]> Reviewed-by: Viacheslav Dubeyko <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit e2a10da ] Problem description =================== Lockdep reports a possible circular locking dependency (AB/BA) between &pl->state_mutex and &phy->lock, as follows. phylink_resolve() // acquires &pl->state_mutex -> phylink_major_config() -> phy_config_inband() // acquires &pl->phydev->lock whereas all the other call sites where &pl->state_mutex and &pl->phydev->lock have the locking scheme reversed. Everywhere else, &pl->phydev->lock is acquired at the top level, and &pl->state_mutex at the lower level. A clear example is phylink_bringup_phy(). The outlier is the newly introduced phy_config_inband() and the existing lock order is the correct one. To understand why it cannot be the other way around, it is sufficient to consider phylink_phy_change(), phylink's callback from the PHY device's phy->phy_link_change() virtual method, invoked by the PHY state machine. phy_link_up() and phy_link_down(), the (indirect) callers of phylink_phy_change(), are called with &phydev->lock acquired. Then phylink_phy_change() acquires its own &pl->state_mutex, to serialize changes made to its pl->phy_state and pl->link_config. So all other instances of &pl->state_mutex and &phydev->lock must be consistent with this order. Problem impact ============== I think the kernel runs a serious deadlock risk if an existing phylink_resolve() thread, which results in a phy_config_inband() call, is concurrent with a phy_link_up() or phy_link_down() call, which will deadlock on &pl->state_mutex in phylink_phy_change(). Practically speaking, the impact may be limited by the slow speed of the medium auto-negotiation protocol, which makes it unlikely for the current state to still be unresolved when a new one is detected, but I think the problem is there. Nonetheless, the problem was discovered using lockdep. Proposed solution ================= Practically speaking, the phy_config_inband() requirement of having phydev->lock acquired must transfer to the caller (phylink is the only caller). There, it must bubble up until immediately before &pl->state_mutex is acquired, for the cases where that takes place. Solution details, considerations, notes ======================================= This is the phy_config_inband() call graph: sfp_upstream_ops :: connect_phy() | v phylink_sfp_connect_phy() | v phylink_sfp_config_phy() | | sfp_upstream_ops :: module_insert() | | | v | phylink_sfp_module_insert() | | | | sfp_upstream_ops :: module_start() | | | | | v | | phylink_sfp_module_start() | | | | v v | phylink_sfp_config_optical() phylink_start() | | | phylink_resume() v v | | phylink_sfp_set_config() | | | v v v phylink_mac_initial_config() | phylink_resolve() | | phylink_ethtool_ksettings_set() v v v phylink_major_config() | v phy_config_inband() phylink_major_config() caller #1, phylink_mac_initial_config(), does not acquire &pl->state_mutex nor do its callers. It must acquire &pl->phydev->lock prior to calling phylink_major_config(). phylink_major_config() caller #2, phylink_resolve() acquires &pl->state_mutex, thus also needs to acquire &pl->phydev->lock. phylink_major_config() caller #3, phylink_ethtool_ksettings_set(), is completely uninteresting, because it only calls phylink_major_config() if pl->phydev is NULL (otherwise it calls phy_ethtool_ksettings_set()). We need to change nothing there. Other solutions =============== The lock inversion between &pl->state_mutex and &pl->phydev->lock has occurred at least once before, as seen in commit c718af2 ("net: phylink: fix ethtool -A with attached PHYs"). The solution there was to simply not call phy_set_asym_pause() under the &pl->state_mutex. That cannot be extended to our case though, where the phy_config_inband() call is much deeper inside the &pl->state_mutex section. Fixes: 5fd0f1a ("net: phylink: add negotiation of in-band capabilities") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit b816265 ] 5da3d94 ("PCI: mvebu: Use for_each_of_range() iterator for parsing "ranges"") simplified code by using the for_each_of_range() iterator, but it broke PCI enumeration on Turris Omnia (and probably other mvebu targets). Issue #1: To determine range.flags, of_pci_range_parser_one() uses bus->get_flags(), which resolves to of_bus_pci_get_flags(), which already returns an IORESOURCE bit field, and NOT the original flags from the "ranges" resource. Then mvebu_get_tgt_attr() attempts the very same conversion again. Remove the misinterpretation of range.flags in mvebu_get_tgt_attr(), to restore the intended behavior. Issue #2: The driver needs target and attributes, which are encoded in the raw address values of the "/soc/pcie/ranges" resource. According to of_pci_range_parser_one(), the raw values are stored in range.bus_addr and range.parent_bus_addr, respectively. range.cpu_addr is a translated version of range.parent_bus_addr, and not relevant here. Use the correct range structure member, to extract target and attributes. This restores the intended behavior. Fixes: 5da3d94 ("PCI: mvebu: Use for_each_of_range() iterator for parsing "ranges"") Reported-by: Jan Palus <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220479 Signed-off-by: Klaus Kudielka <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Tested-by: Tony Dinh <[email protected]> Tested-by: Jan Palus <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 915470e ] If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration later than the first, the error path wants to free the IRQs requested so far. However, it uses the wrong dev_id argument for free_irq(), so it does not free the IRQs correctly and instead triggers the warning: Trying to free already-free IRQ 173 WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0 Modules linked in: i40e(+) [...] CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy) Hardware name: [...] RIP: 0010:__free_irq+0x192/0x2c0 [...] Call Trace: <TASK> free_irq+0x32/0x70 i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e] i40e_vsi_request_irq+0x79/0x80 [i40e] i40e_vsi_open+0x21f/0x2f0 [i40e] i40e_open+0x63/0x130 [i40e] __dev_open+0xfc/0x210 __dev_change_flags+0x1fc/0x240 netif_change_flags+0x27/0x70 do_setlink.isra.0+0x341/0xc70 rtnl_newlink+0x468/0x860 rtnetlink_rcv_msg+0x375/0x450 netlink_rcv_skb+0x5c/0x110 netlink_unicast+0x288/0x3c0 netlink_sendmsg+0x20d/0x430 ____sys_sendmsg+0x3a2/0x3d0 ___sys_sendmsg+0x99/0xe0 __sys_sendmsg+0x8a/0xf0 do_syscall_64+0x82/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] </TASK> ---[ end trace 0000000000000000 ]--- Use the same dev_id for free_irq() as for request_irq(). I tested this with inserting code to fail intentionally. Fixes: 493fb30 ("i40e: Move q_vectors from pointer to array to array of pointers") Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Reviewed-by: Subbaraya Sundeep <[email protected]> Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non- 4KiB pages - for instance, 16KiB is the most commonly used kernel page size used on Loongson devices (of the LoongArch architecture). Per our testing, Intel Xe/Alchemist/Battlemage families of GPUs works on Loongson platforms so long as "Above 4G Decoding" was enabled and "Resizable BAR" was set to auto in the UEFI firmware settings. Without this fix, the kernel will hang at a kernel BUG(): [ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]--- Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error. Cc: [email protected] Fixes: 4e03b58 ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai <[email protected]> Tested-by: Wenbin Fang <[email protected]> Tested-by: Haien Liang <[email protected]> Tested-by: Jianfeng Liu <[email protected]> Tested-by: Shirong Liu <[email protected]> Tested-by: Haofeng Wu <[email protected]> Link: FanFansfan@22c55ab Link: https://t.me/c/1109254909/768552 Co-developed-by: Shang Yatsen <[email protected]> Signed-off-by: Shang Yatsen <[email protected]> Signed-off-by: Mingcong Bai <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kexy Biscuit <[email protected]>

…on 3C6000 series steppings Older steppings of the Loongson 3C6000 series incorrectly report the supported link speeds on their PCIe bridges (device IDs 3c19, 3c29) as only 2.5 GT/s, despite the upstream bus supporting speeds from 2.5 GT/s up to 16 GT/s. As a result, certain PCIe devices would be incorrectly probed as a Gen1- only, even if higher link speeds are supported, harming performance and prevents dynamic link speed functionality from being enabled in drivers such as amdgpu. Manually override the `supported_speeds` field for affected PCIe bridges with those found on the upstream bus to correctly reflect the supported link speeds. This patch is found from AOSC OS[1]. Link: #2 #1 Tested-by: Lain Fearyncess Yang <[email protected]> Tested-by: Mingcong Bai <[email protected]> Tested-by: Ayden Meng <[email protected]> Signed-off-by: Ayden Meng <[email protected]> Signed-off-by: Ziyao <[email protected]> Link: https://lore.kernel.org/loongarch/[email protected]/ Signed-off-by: Mingcong Bai <[email protected]>

While testing my ROCm port for LoongArch and AArch64 (patches pending) on the following platforms: - LoongArch ... - Loongson AC612A0_V1.1 (Loongson 3C6000/S) + AMD Radeon RX 6800 - AArch64 ... - FD30M51 (Phytium FT-D3000) + AMD Radeon RX 7600 - Huawei D920S10 (Huawei Kunpeng 920) + AMD Radeon RX 7600 When HSA_AMD_SVM is enabled, amdgpu would fail to initialise at all on LoongArch (no output): amdgpu 0000:0d:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0 CPU 0 Unable to handle kernel paging request at virtual address ffffffffff800034, era == 9000000001058044, ra == 9000000001058660 Oops[#1]: CPU: 0 UID: 0 PID: 202 Comm: kworker/0:3 Not tainted 6.16.0+ torvalds#103 PREEMPT(full) Hardware name: To be filled by O.E.M.To be fill To be filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS Loongson-UDK2018-V4.0. Workqueue: events work_for_cpu_fn pc 9000000001058044 ra 9000000001058660 tp 9000000101500000 sp 9000000101503aa0 a0 ffffffffff800000 a1 0000000ffffe0000 a2 0000000000000000 a3 90000001207c58e0 a4 9000000001a4c310 a5 0000000000000001 a6 0000000000000000 a7 0000000000000001 t0 000003ffff800000 t1 0000000000000001 t2 0000040000000000 t3 03ffff0000002000 t4 0000000000000000 t5 0001010101010101 t6 ffff800000000000 t7 0001000000000000 t8 000000000000002f u0 0000000000800000 s9 9000000002026000 s0 90000001207c58e0 s1 0000000000000001 s2 9000000001935c40 s3 0000001000000000 s4 0000000000000001 s5 0000000ffffe0000 s6 0000000000000040 s7 0001000000000001 s8 0001000000000000 ra: 9000000001058660 memmap_init_zone_device+0x120/0x1b0 ERA: 9000000001058044 __init_zone_device_page.constprop.0+0x4/0x1a0 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) PRMD: 00000004 (PPLV0 +PIE -PWE) EUEN: 00000000 (-FPE -SXE -ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 00020000 [PIS] (IS= ECode=2 EsubCode=0) BADV: ffffffffff800034 PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S) Modules linked in: amdgpu(+) vfat fat cfg80211 rfkill 8021q garp stp mrp llc snd_hda_codec_atihdmi snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic drm_client_lib drm_ttm_helper syscopyarea ttm sysfillrect sysimgblt fb_sys_fops drm_panel_backlight_quirks video drm_exec drm_suballoc_helper amdxcp mfd_core drm_buddy gpu_sched drm_display_helper drm_kms_helper cec snd_hda_intel ipmi_ssif snd_intel_dspcfg snd_hda_codec snd_hda_core acpi_ipmi snd_hwdep snd_pcm fb loongson3_cpufreq lcd igc snd_timer ipmi_si spi_loongson_pci spi_loongson_core snd ipmi_devintf soundcore ipmi_msghandler binfmt_misc fuse drm drm_panel_orientation_quirks backlight dm_mod dax nfnetlink Process kworker/0:3 (pid: 202, threadinfo=00000000eb7cd5d6, task=000000004ca22b1b) Stack : 0000000000001440 0000000000000000 ffffffffff800000 0000000000000001 90000000020b5978 9000000101503b38 0000000000000001 0000000000000001 0000000000000000 90000000020b5978 90000000020b3f48 0000000000001440 0000000000000000 90000001207c58e0 90000001207c5970 9000000000575e20 90000000010e2e00 90000000020b3f48 900000000205c238 0000000000000000 00000000000001d3 90000001207c58e0 9000000001958f28 9000000120790848 90000001207b3510 0000000000000000 9000000120780000 9000000120780010 90000001207d6000 90000001207c58e0 90000001015660c8 9000000120780000 0000000000000000 90000000005763a8 90000001207c58e0 00000003ff000000 9000000120780000 ffff80000296b820 900000012078f968 90000001207c6000 ... Call Trace: [<9000000001058044>] __init_zone_device_page.constprop.0+0x4/0x1a0 [<900000000105865c>] memmap_init_zone_device+0x11c/0x1b0 [<9000000000575e1c>] memremap_pages+0x24c/0x7b0 [<90000000005763a4>] devm_memremap_pages+0x24/0x80 [<ffff80000296b81c>] kgd2kfd_init_zone_device+0x11c/0x220 [amdgpu] [<ffff80000265d09c>] amdgpu_device_init+0x27dc/0x2bf0 [amdgpu] [<ffff80000265ece8>] amdgpu_driver_load_kms+0x18/0x90 [amdgpu] [<ffff800002651fbc>] amdgpu_pci_probe+0x22c/0x890 [amdgpu] [<9000000000916adc>] local_pci_probe+0x3c/0xb0 [<90000000002976c8>] work_for_cpu_fn+0x18/0x30 [<900000000029aeb4>] process_one_work+0x164/0x320 [<900000000029b96c>] worker_thread+0x37c/0x4a0 [<90000000002a695c>] kthread+0x12c/0x220 [<9000000001055b64>] ret_from_kernel_thread+0x24/0xc0 [<9000000000237524>] ret_from_kernel_thread_asm+0xc/0x88 Code: 00000000 00000000 0280040d <2980d08d> 02bffc0e 2980c08e 02c0208d 29c0208d 1400004f ---[ end trace 0000000000000000 ]--- Or lock up and/or driver reset during computate tasks, such as when running llama.cpp over ROCm, at which point the compute process must be killed before the reset could complete: amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1202 amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 3 amdgpu 0000:0a:00.0: amdgpu: GPU reset begin! amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1004 amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 2 amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 1 amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 0 amdgpu: Failed to quiesce KFD amdgpu 0000:0a:00.0: amdgpu: Dumping IP State amdgpu 0000:0a:00.0: amdgpu: Dumping IP State Completed amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MODE1 reset amdgpu 0000:0a:00.0: amdgpu: GPU mode1 reset amdgpu 0000:0a:00.0: amdgpu: GPU smu mode1 reset amdgpu 0000:0a:00.0: amdgpu: GPU reset succeeded, trying to resume Disabling the aforementioned option makes the issue go away, though it is unclear whether this is a platform-specific issue or one that lies within the amdkfd code. This patch has been tested on all the aforementioned platform combinations, and sent as an RFC to encourage discussion. Signed-off-by: Zhang Yuhao <[email protected]> Signed-off-by: Mingcong Bai <[email protected]> Tested-by: Mingcong Bai <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Mingcong Bai <[email protected]>

…ocation" When this change was introduced between v6.10.4 and v6.10.5, the Broadcom Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'', Mid 2010) would throw many rcu stall errors during boot up, causing peripherals such as the wireless card to misbehave. [ 24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/. [ 24.166938] rcu: blocking rcu_node structures (internal RCU debug): [ 24.177800] Sending NMI from CPU 3 to CPUs 2: [ 24.183113] NMI backtrace for cpu 2 [ 24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1 [ 24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS MBP61.88Z.005D.B00.1804100943 04/10/18 [ 24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3] [ 24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90 [ 24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082 [ 24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000 [ 24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00 [ 24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000 [ 24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216 [ 24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40 [ 24.183151] FS: 00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000 [ 24.183153] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0 [ 24.183157] Call Trace: [ 24.183162] <NMI> [ 24.183167] ? nmi_cpu_backtrace+0xbf/0x140 [ 24.183175] ? nmi_cpu_backtrace_handler+0x11/0x20 [ 24.183181] ? nmi_handle+0x61/0x160 [ 24.183186] ? default_do_nmi+0x42/0x110 [ 24.183191] ? exc_nmi+0x1bd/0x290 [ 24.183194] ? end_repeat_nmi+0xf/0x53 [ 24.183203] ? __this_module+0x2d3d1/0x4f310 [tg3] [ 24.183207] ? __this_module+0x2d3d1/0x4f310 [tg3] [ 24.183210] ? __this_module+0x2d3d1/0x4f310 [tg3] [ 24.183213] </NMI> [ 24.183214] <TASK> [ 24.183215] __this_module+0x31828/0x4f310 [tg3] [ 24.183218] ? __this_module+0x2d390/0x4f310 [tg3] [ 24.183221] __this_module+0x398e6/0x4f310 [tg3] [ 24.183225] __this_module+0x3baf8/0x4f310 [tg3] [ 24.183229] __this_module+0x4733f/0x4f310 [tg3] [ 24.183233] ? _raw_spin_unlock_irqrestore+0x25/0x70 [ 24.183237] ? __this_module+0x398e6/0x4f310 [tg3] [ 24.183241] __this_module+0x4b943/0x4f310 [tg3] [ 24.183244] ? delay_tsc+0x89/0xf0 [ 24.183249] ? preempt_count_sub+0x51/0x60 [ 24.183254] __this_module+0x4be4b/0x4f310 [tg3] [ 24.183258] __dev_open+0x103/0x1c0 [ 24.183265] __dev_change_flags+0x1bd/0x230 [ 24.183269] ? rtnl_getlink+0x362/0x400 [ 24.183276] dev_change_flags+0x26/0x70 [ 24.183280] do_setlink+0xe16/0x11f0 [ 24.183286] ? __nla_validate_parse+0x61/0xd40 [ 24.183295] __rtnl_newlink+0x63d/0x9f0 [ 24.183301] ? kmem_cache_alloc_node_noprof+0x12b/0x360 [ 24.183308] ? kmalloc_trace_noprof+0x11e/0x350 [ 24.183312] ? rtnl_newlink+0x2e/0x70 [ 24.183316] rtnl_newlink+0x47/0x70 [ 24.183320] rtnetlink_rcv_msg+0x152/0x400 [ 24.183324] ? __netlink_sendskb+0x68/0x90 [ 24.183329] ? netlink_unicast+0x237/0x290 [ 24.183333] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 24.183336] netlink_rcv_skb+0x5b/0x110 [ 24.183343] netlink_unicast+0x1a4/0x290 [ 24.183347] netlink_sendmsg+0x222/0x4a0 [ 24.183350] ? proc_get_long.constprop.0+0x116/0x210 [ 24.183358] ____sys_sendmsg+0x379/0x3b0 [ 24.183363] ? copy_msghdr_from_user+0x6d/0xb0 [ 24.183368] ___sys_sendmsg+0x86/0xe0 [ 24.183372] ? addrconf_sysctl_forward+0xf3/0x270 [ 24.183378] ? _copy_from_iter+0x8b/0x570 [ 24.183384] ? __pfx_addrconf_sysctl_forward+0x10/0x10 [ 24.183388] ? _raw_spin_unlock+0x19/0x50 [ 24.183392] ? proc_sys_call_handler+0xf3/0x2f0 [ 24.183397] ? trace_hardirqs_on+0x29/0x90 [ 24.183401] ? __fdget+0xc2/0xf0 [ 24.183405] __sys_sendmsg+0x5b/0xc0 [ 24.183410] ? syscall_trace_enter+0x110/0x1b0 [ 24.183416] do_syscall_64+0x64/0x150 [ 24.183423] entry_SYSCALL_64_after_hwframe+0x76/0x7e I have bisected the error to this commit. Reverting it caused no new or perceivable issues on both the MacBook and a Zen4-based laptop. Revert this commit as a workaround. This reverts commit aa162aa. Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390 Signed-off-by: Mingcong Bai <[email protected]> Bug: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kexy Biscuit <[email protected]>

…sizes The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages. Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU. Without this fix, the kernel will hang at a kernel BUG(): [ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]--- Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error. Cc: <[email protected]> Fixes: 4e03b58 ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai <[email protected]> Tested-by: Haien Liang <[email protected]> Tested-by: Shirong Liu <[email protected]> Tested-by: Haofeng Wu <[email protected]> Link: FanFansfan@22c55ab Co-developed-by: Shang Yatsen <[email protected]> Signed-off-by: Shang Yatsen <[email protected]> Signed-off-by: Mingcong Bai <[email protected]> [Mingcong Bai: Resolved a minor merge conflict post-6.16 in drivers/gpu/drm/xe/xe_bo.c] Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Mingcong Bai <[email protected]>

…on 3C6000 series steppings Older steppings of the Loongson 3C6000 series incorrectly report the supported link speeds on their PCIe bridges (device IDs 3c19, 3c29) as only 2.5 GT/s, despite the upstream bus supporting speeds from 2.5 GT/s up to 16 GT/s. As a result, certain PCIe devices would be incorrectly probed as a Gen1- only, even if higher link speeds are supported, harming performance and prevents dynamic link speed functionality from being enabled in drivers such as amdgpu. Manually override the `supported_speeds` field for affected PCIe bridges with those found on the upstream bus to correctly reflect the supported link speeds. This patch is found from AOSC OS[1]. Link: #2 #1 Tested-by: Lain Fearyncess Yang <[email protected]> Tested-by: Mingcong Bai <[email protected]> Tested-by: Ayden Meng <[email protected]> Signed-off-by: Ayden Meng <[email protected]> Signed-off-by: Ziyao <[email protected]> Link: https://lore.kernel.org/loongarch/[email protected]/ Signed-off-by: Mingcong Bai <[email protected]>

While testing my ROCm port for LoongArch and AArch64 (patches pending) on the following platforms: - LoongArch ... - Loongson AC612A0_V1.1 (Loongson 3C6000/S) + AMD Radeon RX 6800 - AArch64 ... - FD30M51 (Phytium FT-D3000) + AMD Radeon RX 7600 - Huawei D920S10 (Huawei Kunpeng 920) + AMD Radeon RX 7600 When HSA_AMD_SVM is enabled, amdgpu would fail to initialise at all on LoongArch (no output): amdgpu 0000:0d:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0 CPU 0 Unable to handle kernel paging request at virtual address ffffffffff800034, era == 9000000001058044, ra == 9000000001058660 Oops[#1]: CPU: 0 UID: 0 PID: 202 Comm: kworker/0:3 Not tainted 6.16.0+ torvalds#103 PREEMPT(full) Hardware name: To be filled by O.E.M.To be fill To be filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS Loongson-UDK2018-V4.0. Workqueue: events work_for_cpu_fn pc 9000000001058044 ra 9000000001058660 tp 9000000101500000 sp 9000000101503aa0 a0 ffffffffff800000 a1 0000000ffffe0000 a2 0000000000000000 a3 90000001207c58e0 a4 9000000001a4c310 a5 0000000000000001 a6 0000000000000000 a7 0000000000000001 t0 000003ffff800000 t1 0000000000000001 t2 0000040000000000 t3 03ffff0000002000 t4 0000000000000000 t5 0001010101010101 t6 ffff800000000000 t7 0001000000000000 t8 000000000000002f u0 0000000000800000 s9 9000000002026000 s0 90000001207c58e0 s1 0000000000000001 s2 9000000001935c40 s3 0000001000000000 s4 0000000000000001 s5 0000000ffffe0000 s6 0000000000000040 s7 0001000000000001 s8 0001000000000000 ra: 9000000001058660 memmap_init_zone_device+0x120/0x1b0 ERA: 9000000001058044 __init_zone_device_page.constprop.0+0x4/0x1a0 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) PRMD: 00000004 (PPLV0 +PIE -PWE) EUEN: 00000000 (-FPE -SXE -ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 00020000 [PIS] (IS= ECode=2 EsubCode=0) BADV: ffffffffff800034 PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S) Modules linked in: amdgpu(+) vfat fat cfg80211 rfkill 8021q garp stp mrp llc snd_hda_codec_atihdmi snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic drm_client_lib drm_ttm_helper syscopyarea ttm sysfillrect sysimgblt fb_sys_fops drm_panel_backlight_quirks video drm_exec drm_suballoc_helper amdxcp mfd_core drm_buddy gpu_sched drm_display_helper drm_kms_helper cec snd_hda_intel ipmi_ssif snd_intel_dspcfg snd_hda_codec snd_hda_core acpi_ipmi snd_hwdep snd_pcm fb loongson3_cpufreq lcd igc snd_timer ipmi_si spi_loongson_pci spi_loongson_core snd ipmi_devintf soundcore ipmi_msghandler binfmt_misc fuse drm drm_panel_orientation_quirks backlight dm_mod dax nfnetlink Process kworker/0:3 (pid: 202, threadinfo=00000000eb7cd5d6, task=000000004ca22b1b) Stack : 0000000000001440 0000000000000000 ffffffffff800000 0000000000000001 90000000020b5978 9000000101503b38 0000000000000001 0000000000000001 0000000000000000 90000000020b5978 90000000020b3f48 0000000000001440 0000000000000000 90000001207c58e0 90000001207c5970 9000000000575e20 90000000010e2e00 90000000020b3f48 900000000205c238 0000000000000000 00000000000001d3 90000001207c58e0 9000000001958f28 9000000120790848 90000001207b3510 0000000000000000 9000000120780000 9000000120780010 90000001207d6000 90000001207c58e0 90000001015660c8 9000000120780000 0000000000000000 90000000005763a8 90000001207c58e0 00000003ff000000 9000000120780000 ffff80000296b820 900000012078f968 90000001207c6000 ... Call Trace: [<9000000001058044>] __init_zone_device_page.constprop.0+0x4/0x1a0 [<900000000105865c>] memmap_init_zone_device+0x11c/0x1b0 [<9000000000575e1c>] memremap_pages+0x24c/0x7b0 [<90000000005763a4>] devm_memremap_pages+0x24/0x80 [<ffff80000296b81c>] kgd2kfd_init_zone_device+0x11c/0x220 [amdgpu] [<ffff80000265d09c>] amdgpu_device_init+0x27dc/0x2bf0 [amdgpu] [<ffff80000265ece8>] amdgpu_driver_load_kms+0x18/0x90 [amdgpu] [<ffff800002651fbc>] amdgpu_pci_probe+0x22c/0x890 [amdgpu] [<9000000000916adc>] local_pci_probe+0x3c/0xb0 [<90000000002976c8>] work_for_cpu_fn+0x18/0x30 [<900000000029aeb4>] process_one_work+0x164/0x320 [<900000000029b96c>] worker_thread+0x37c/0x4a0 [<90000000002a695c>] kthread+0x12c/0x220 [<9000000001055b64>] ret_from_kernel_thread+0x24/0xc0 [<9000000000237524>] ret_from_kernel_thread_asm+0xc/0x88 Code: 00000000 00000000 0280040d <2980d08d> 02bffc0e 2980c08e 02c0208d 29c0208d 1400004f ---[ end trace 0000000000000000 ]--- Or lock up and/or driver reset during computate tasks, such as when running llama.cpp over ROCm, at which point the compute process must be killed before the reset could complete: amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1202 amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 3 amdgpu 0000:0a:00.0: amdgpu: GPU reset begin! amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE amdgpu 0000:0a:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1004 amdgpu 0000:0a:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 2 amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 1 amdgpu 0000:0a:00.0: amdgpu: Failed to evict queue 0 amdgpu: Failed to quiesce KFD amdgpu 0000:0a:00.0: amdgpu: Dumping IP State amdgpu 0000:0a:00.0: amdgpu: Dumping IP State Completed amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue amdgpu 0000:0a:00.0: amdgpu: MODE1 reset amdgpu 0000:0a:00.0: amdgpu: GPU mode1 reset amdgpu 0000:0a:00.0: amdgpu: GPU smu mode1 reset amdgpu 0000:0a:00.0: amdgpu: GPU reset succeeded, trying to resume Disabling the aforementioned option makes the issue go away, though it is unclear whether this is a platform-specific issue or one that lies within the amdkfd code. This patch has been tested on all the aforementioned platform combinations, and sent as an RFC to encourage discussion. Signed-off-by: Zhang Yuhao <[email protected]> Signed-off-by: Mingcong Bai <[email protected]> Tested-by: Mingcong Bai <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Mingcong Bai <[email protected]>

…ocation" When this change was introduced between v6.10.4 and v6.10.5, the Broadcom Tigon3 Ethernet interface (tg3) found on Apple MacBook Pro (15'', Mid 2010) would throw many rcu stall errors during boot up, causing peripherals such as the wireless card to misbehave. [ 24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/. [ 24.166938] rcu: blocking rcu_node structures (internal RCU debug): [ 24.177800] Sending NMI from CPU 3 to CPUs 2: [ 24.183113] NMI backtrace for cpu 2 [ 24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1 [ 24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS MBP61.88Z.005D.B00.1804100943 04/10/18 [ 24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3] [ 24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90 [ 24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082 [ 24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000 [ 24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00 [ 24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000 [ 24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216 [ 24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40 [ 24.183151] FS: 00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000 [ 24.183153] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0 [ 24.183157] Call Trace: [ 24.183162] <NMI> [ 24.183167] ? nmi_cpu_backtrace+0xbf/0x140 [ 24.183175] ? nmi_cpu_backtrace_handler+0x11/0x20 [ 24.183181] ? nmi_handle+0x61/0x160 [ 24.183186] ? default_do_nmi+0x42/0x110 [ 24.183191] ? exc_nmi+0x1bd/0x290 [ 24.183194] ? end_repeat_nmi+0xf/0x53 [ 24.183203] ? __this_module+0x2d3d1/0x4f310 [tg3] [ 24.183207] ? __this_module+0x2d3d1/0x4f310 [tg3] [ 24.183210] ? __this_module+0x2d3d1/0x4f310 [tg3] [ 24.183213] </NMI> [ 24.183214] <TASK> [ 24.183215] __this_module+0x31828/0x4f310 [tg3] [ 24.183218] ? __this_module+0x2d390/0x4f310 [tg3] [ 24.183221] __this_module+0x398e6/0x4f310 [tg3] [ 24.183225] __this_module+0x3baf8/0x4f310 [tg3] [ 24.183229] __this_module+0x4733f/0x4f310 [tg3] [ 24.183233] ? _raw_spin_unlock_irqrestore+0x25/0x70 [ 24.183237] ? __this_module+0x398e6/0x4f310 [tg3] [ 24.183241] __this_module+0x4b943/0x4f310 [tg3] [ 24.183244] ? delay_tsc+0x89/0xf0 [ 24.183249] ? preempt_count_sub+0x51/0x60 [ 24.183254] __this_module+0x4be4b/0x4f310 [tg3] [ 24.183258] __dev_open+0x103/0x1c0 [ 24.183265] __dev_change_flags+0x1bd/0x230 [ 24.183269] ? rtnl_getlink+0x362/0x400 [ 24.183276] dev_change_flags+0x26/0x70 [ 24.183280] do_setlink+0xe16/0x11f0 [ 24.183286] ? __nla_validate_parse+0x61/0xd40 [ 24.183295] __rtnl_newlink+0x63d/0x9f0 [ 24.183301] ? kmem_cache_alloc_node_noprof+0x12b/0x360 [ 24.183308] ? kmalloc_trace_noprof+0x11e/0x350 [ 24.183312] ? rtnl_newlink+0x2e/0x70 [ 24.183316] rtnl_newlink+0x47/0x70 [ 24.183320] rtnetlink_rcv_msg+0x152/0x400 [ 24.183324] ? __netlink_sendskb+0x68/0x90 [ 24.183329] ? netlink_unicast+0x237/0x290 [ 24.183333] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 24.183336] netlink_rcv_skb+0x5b/0x110 [ 24.183343] netlink_unicast+0x1a4/0x290 [ 24.183347] netlink_sendmsg+0x222/0x4a0 [ 24.183350] ? proc_get_long.constprop.0+0x116/0x210 [ 24.183358] ____sys_sendmsg+0x379/0x3b0 [ 24.183363] ? copy_msghdr_from_user+0x6d/0xb0 [ 24.183368] ___sys_sendmsg+0x86/0xe0 [ 24.183372] ? addrconf_sysctl_forward+0xf3/0x270 [ 24.183378] ? _copy_from_iter+0x8b/0x570 [ 24.183384] ? __pfx_addrconf_sysctl_forward+0x10/0x10 [ 24.183388] ? _raw_spin_unlock+0x19/0x50 [ 24.183392] ? proc_sys_call_handler+0xf3/0x2f0 [ 24.183397] ? trace_hardirqs_on+0x29/0x90 [ 24.183401] ? __fdget+0xc2/0xf0 [ 24.183405] __sys_sendmsg+0x5b/0xc0 [ 24.183410] ? syscall_trace_enter+0x110/0x1b0 [ 24.183416] do_syscall_64+0x64/0x150 [ 24.183423] entry_SYSCALL_64_after_hwframe+0x76/0x7e I have bisected the error to this commit. Reverting it caused no new or perceivable issues on both the MacBook and a Zen4-based laptop. Revert this commit as a workaround. This reverts commit aa162aa. Upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=219390 Signed-off-by: Mingcong Bai <[email protected]> Bug: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kexy Biscuit <[email protected]>

fix: ls2k500sfb build error, gpio_device_find_by_label should be used

909009a

Zeno-sole force-pushed the 6.9-openkylin branch from 2b35f01 to 909009a Compare April 26, 2024 03:26

MingcongBai closed this Apr 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: ls2k500sfb build error, gpio_device_find_by_label should be used #1

fix: ls2k500sfb build error, gpio_device_find_by_label should be used #1

Uh oh!

Zeno-sole commented Apr 25, 2024

Uh oh!

MingcongBai commented Apr 30, 2024

Uh oh!

Uh oh!

fix: ls2k500sfb build error, gpio_device_find_by_label should be used #1

fix: ls2k500sfb build error, gpio_device_find_by_label should be used #1

Uh oh!

Conversation

Zeno-sole commented Apr 25, 2024

Uh oh!

MingcongBai commented Apr 30, 2024

Uh oh!

Uh oh!