ARC: rwlock: remove unused variable #2

kolerov · 2015-08-04T11:40:20Z

No description provided.

Signed-off-by: Yuriy Kolerov <[email protected]>

vineetgarc · 2015-08-04T11:47:25Z

Thx for the fix Yuriy - change pushed .
BTW what configuration are you trying to build - since the affected code would have enabled LLSC but !ARC_STAR_9000923308

kolerov · 2015-08-04T11:59:38Z

Thanks! I try to build allyesconfig for little endian ARC700.

In dev_queue_xmit() net_cls protected with rcu-bh. [ 270.730026] =============================== [ 270.730029] [ INFO: suspicious RCU usage. ] [ 270.730033] 4.2.0-rc3+ foss-for-synopsys-dwc-arc-processors#2 Not tainted [ 270.730036] ------------------------------- [ 270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage! [ 270.730041] other info that might help us debug this: [ 270.730043] rcu_scheduler_active = 1, debug_locks = 1 [ 270.730045] 2 locks held by dhclient/748: [ 270.730046] #0: (rcu_read_lock_bh){......}, at: [<ffffffff81682b70>] __dev_queue_xmit+0x50/0x960 [ 270.730085] foss-for-synopsys-dwc-arc-processors#1: (&qdisc_tx_lock){+.....}, at: [<ffffffff81682d60>] __dev_queue_xmit+0x240/0x960 [ 270.730090] stack backtrace: [ 270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ foss-for-synopsys-dwc-arc-processors#2 [ 270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011 [ 270.730100] 0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007 [ 270.730103] ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00 [ 270.730105] ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8 [ 270.730108] Call Trace: [ 270.730121] [<ffffffff817ad487>] dump_stack+0x4c/0x65 [ 270.730148] [<ffffffff810ca4f2>] lockdep_rcu_suspicious+0xe2/0x120 [ 270.730153] [<ffffffff816a62d2>] task_cls_state+0x92/0xa0 [ 270.730158] [<ffffffffa00b534f>] cls_cgroup_classify+0x4f/0x120 [cls_cgroup] [ 270.730164] [<ffffffff816aac74>] tc_classify_compat+0x74/0xc0 [ 270.730166] [<ffffffff816ab573>] tc_classify+0x33/0x90 [ 270.730170] [<ffffffffa00bcb0a>] htb_enqueue+0xaa/0x4a0 [sch_htb] [ 270.730172] [<ffffffff81682e26>] __dev_queue_xmit+0x306/0x960 [ 270.730174] [<ffffffff81682b70>] ? __dev_queue_xmit+0x50/0x960 [ 270.730176] [<ffffffff816834a3>] dev_queue_xmit_sk+0x13/0x20 [ 270.730185] [<ffffffff81787770>] dev_queue_xmit+0x10/0x20 [ 270.730187] [<ffffffff8178b91c>] packet_snd.isra.62+0x54c/0x760 [ 270.730190] [<ffffffff8178be25>] packet_sendmsg+0x2f5/0x3f0 [ 270.730203] [<ffffffff81665245>] ? sock_def_readable+0x5/0x190 [ 270.730210] [<ffffffff817b64bb>] ? _raw_spin_unlock+0x2b/0x40 [ 270.730216] [<ffffffff8173bcbc>] ? unix_dgram_sendmsg+0x5cc/0x640 [ 270.730219] [<ffffffff8165f367>] sock_sendmsg+0x47/0x50 [ 270.730221] [<ffffffff8165f42f>] sock_write_iter+0x7f/0xd0 [ 270.730232] [<ffffffff811fd4c7>] __vfs_write+0xa7/0xf0 [ 270.730234] [<ffffffff811fe5b8>] vfs_write+0xb8/0x190 [ 270.730236] [<ffffffff811fe8c2>] SyS_write+0x52/0xb0 [ 270.730239] [<ffffffff817b6bae>] entry_SYSCALL_64_fastpath+0x12/0x76 Signed-off-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: David S. Miller <[email protected]>

The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback is called, __inc_wb_stat() updates i_wb's stat. So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to any writebacking pages. This patch should resolve the following kernel panic reported by Andreas Reis. https://bugzilla.kernel.org/show_bug.cgi?id=101801 --- Comment foss-for-synopsys-dwc-arc-processors#2 from Andreas Reis <[email protected]> --- BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 PGD 2951ff067 PUD 2df43f067 PMD 0 Oops: 0000 [foss-for-synopsys-dwc-arc-processors#1] PREEMPT SMP Modules linked in: CPU: 7 PID: 10356 Comm: gcc Tainted: G W 4.2.0-1-cu foss-for-synopsys-dwc-arc-processors#1 Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS T01 02/03/2015 task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000 RIP: 0010:[<ffffffff8149deea>] [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 RSP: 0018:ffff880295143ac8 EFLAGS: 00010082 RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001 RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088 RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30 R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088 R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0 FS: 00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40 Call Trace: [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca 65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a RIP [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 RSP <ffff880295143ac8> CR2: 00000000000000a8 ---[ end trace 5132449a58ed93a3 ]--- note: gcc[10356] exited with preempt_count 2 Signed-off-by: Jaegeuk Kim <[email protected]>

Hayes Wang says: ==================== r8152: issues fix v2: Replace patch foss-for-synopsys-dwc-arc-processors#2 with "r8152: fix wakeup settings". v1: These patches are used to fix issues. ==================== Signed-off-by: David S. Miller <[email protected]>

Hayes Wang says: ==================== r8152: device reset v3: For patch foss-for-synopsys-dwc-arc-processors#2, remove cancel_delayed_work(). v2: For patch foss-for-synopsys-dwc-arc-processors#1, remove usb_autopm_get_interface(), usb_autopm_put_interface(), and the checking of intf->condition. For patch foss-for-synopsys-dwc-arc-processors#2, replace the original method with usb_queue_reset_device() to reset the device. v1: Although the driver works normally, we find the device may get all 0xff data when transmitting packets on certain platforms. It would break the device and no packet could be transmitted. The reset is necessary to recover the hw for this situation. ==================== Signed-off-by: David S. Miller <[email protected]>

Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace: PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync" #0 __schedule at ffffffff815ab152 foss-for-synopsys-dwc-arc-processors#1 schedule at ffffffff815ab76e foss-for-synopsys-dwc-arc-processors#2 schedule_timeout at ffffffff815ae5e5 foss-for-synopsys-dwc-arc-processors#3 io_schedule_timeout at ffffffff815aad6a foss-for-synopsys-dwc-arc-processors#4 bit_wait_io at ffffffff815abfc6 foss-for-synopsys-dwc-arc-processors#5 __wait_on_bit at ffffffff815abda5 torvalds#6 wait_on_page_bit at ffffffff8111fd4f foss-for-synopsys-dwc-arc-processors#7 shrink_page_list at ffffffff81135445 foss-for-synopsys-dwc-arc-processors#8 shrink_inactive_list at ffffffff81135845 foss-for-synopsys-dwc-arc-processors#9 shrink_lruvec at ffffffff81135ead foss-for-synopsys-dwc-arc-processors#10 shrink_zone at ffffffff811360c3 foss-for-synopsys-dwc-arc-processors#11 shrink_zones at ffffffff81136eff foss-for-synopsys-dwc-arc-processors#12 do_try_to_free_pages at ffffffff8113712f foss-for-synopsys-dwc-arc-processors#13 try_to_free_mem_cgroup_pages at ffffffff811372be foss-for-synopsys-dwc-arc-processors#14 try_charge at ffffffff81189423 foss-for-synopsys-dwc-arc-processors#15 mem_cgroup_try_charge at ffffffff8118c6f5 foss-for-synopsys-dwc-arc-processors#16 __add_to_page_cache_locked at ffffffff8112137d foss-for-synopsys-dwc-arc-processors#17 add_to_page_cache_lru at ffffffff81121618 foss-for-synopsys-dwc-arc-processors#18 pagecache_get_page at ffffffff8112170b foss-for-synopsys-dwc-arc-processors#19 grow_dev_page at ffffffff811c8297 foss-for-synopsys-dwc-arc-processors#20 __getblk_slow at ffffffff811c91d6 foss-for-synopsys-dwc-arc-processors#21 __getblk_gfp at ffffffff811c92c1 foss-for-synopsys-dwc-arc-processors#22 ext4_ext_grow_indepth at ffffffff8124565c foss-for-synopsys-dwc-arc-processors#23 ext4_ext_create_new_leaf at ffffffff81246ca8 foss-for-synopsys-dwc-arc-processors#24 ext4_ext_insert_extent at ffffffff81246f09 torvalds#25 ext4_ext_map_blocks at ffffffff8124a848 foss-for-synopsys-dwc-arc-processors#26 ext4_map_blocks at ffffffff8121a5b7 torvalds#27 mpage_map_one_extent at ffffffff8121b1fa torvalds#28 mpage_map_and_submit_extent at ffffffff8121f07b foss-for-synopsys-dwc-arc-processors#29 ext4_writepages at ffffffff8121f6d5 foss-for-synopsys-dwc-arc-processors#30 do_writepages at ffffffff8112c490 foss-for-synopsys-dwc-arc-processors#31 __filemap_fdatawrite_range at ffffffff81120199 foss-for-synopsys-dwc-arc-processors#32 filemap_flush at ffffffff8112041c torvalds#33 ext4_alloc_da_blocks at ffffffff81219da1 foss-for-synopsys-dwc-arc-processors#34 ext4_rename at ffffffff81229b91 torvalds#35 ext4_rename2 at ffffffff81229e32 foss-for-synopsys-dwc-arc-processors#36 vfs_rename at ffffffff811a08a5 foss-for-synopsys-dwc-arc-processors#37 SYSC_renameat2 at ffffffff811a3ffc foss-for-synopsys-dwc-arc-processors#38 sys_renameat2 at ffffffff811a408e foss-for-synopsys-dwc-arc-processors#39 sys_rename at ffffffff8119e51e foss-for-synopsys-dwc-arc-processors#40 system_call_fastpath at ffffffff815afa89 Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away. The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away. ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes. Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic. As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes: : For example: IO completion might require unwritten extent conversion : which executes filesystem transactions and GFP_NOFS allocations. The : writeback flag on the pages can not be cleared until unwritten : extent conversion completes. Hence memory reclaim cannot wait on : page writeback to complete in GFP_NOFS context because it is not : safe to do so, memcg reclaim or otherwise. Cc: [email protected] # 3.9+ [[email protected]: corrected the control flow] Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages") Reported-by: Nikolay Borisov <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

The shm implementation internally uses shmem or hugetlbfs inodes for shm segments. As these inodes are never directly exposed to userspace and only accessed through the shm operations which are already hooked by security modules, mark the inodes with the S_PRIVATE flag so that inode security initialization and permission checking is skipped. This was motivated by the following lockdep warning: ====================================================== [ INFO: possible circular locking dependency detected ] 4.2.0-0.rc3.git0.1.fc24.x86_64+debug foss-for-synopsys-dwc-arc-processors#1 Tainted: G W ------------------------------------------------------- httpd/1597 is trying to acquire lock: (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130 but task is already holding lock: (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> foss-for-synopsys-dwc-arc-processors#3 (&mm->mmap_sem){++++++}: lock_acquire+0xc7/0x270 __might_fault+0x7a/0xa0 filldir+0x9e/0x130 xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs] xfs_readdir+0x1b4/0x330 [xfs] xfs_file_readdir+0x2b/0x30 [xfs] iterate_dir+0x97/0x130 SyS_getdents+0x91/0x120 entry_SYSCALL_64_fastpath+0x12/0x76 -> foss-for-synopsys-dwc-arc-processors#2 (&xfs_dir_ilock_class){++++.+}: lock_acquire+0xc7/0x270 down_read_nested+0x57/0xa0 xfs_ilock+0x167/0x350 [xfs] xfs_ilock_attr_map_shared+0x38/0x50 [xfs] xfs_attr_get+0xbd/0x190 [xfs] xfs_xattr_get+0x3d/0x70 [xfs] generic_getxattr+0x4f/0x70 inode_doinit_with_dentry+0x162/0x670 sb_finish_set_opts+0xd9/0x230 selinux_set_mnt_opts+0x35c/0x660 superblock_doinit+0x77/0xf0 delayed_superblock_init+0x10/0x20 iterate_supers+0xb3/0x110 selinux_complete_init+0x2f/0x40 security_load_policy+0x103/0x600 sel_write_load+0xc1/0x750 __vfs_write+0x37/0x100 vfs_write+0xa9/0x1a0 SyS_write+0x58/0xd0 entry_SYSCALL_64_fastpath+0x12/0x76 ... Signed-off-by: Stephen Smalley <[email protected]> Reported-by: Morten Stevens <[email protected]> Acked-by: Hugh Dickins <[email protected]> Acked-by: Paul Moore <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Eric Paris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

It turns out that a PV domU also requires the "Xen PV" APIC driver. Otherwise, the flat driver is used and we get stuck in busy loops that never exit, such as in this stack trace: (gdb) target remote localhost:9999 Remote debugging using localhost:9999 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY) (gdb) bt #0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 foss-for-synopsys-dwc-arc-processors#1 __default_send_IPI_shortcut (shortcut=<optimized out>, dest=<optimized out>, vector=<optimized out>) at ./arch/x86/include/asm/ipi.h:75 foss-for-synopsys-dwc-arc-processors#2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54 foss-for-synopsys-dwc-arc-processors#3 0xffffffff81011336 in arch_irq_work_raise () at arch/x86/kernel/irq_work.c:47 foss-for-synopsys-dwc-arc-processors#4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at kernel/irq_work.c:100 foss-for-synopsys-dwc-arc-processors#5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633 torvalds#6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:1778 foss-for-synopsys-dwc-arc-processors#7 0xffffffff816010c8 in printk (fmt=<optimized out>) at kernel/printk/printk.c:1868 foss-for-synopsys-dwc-arc-processors#8 0xffffffffc00013ea in ?? () foss-for-synopsys-dwc-arc-processors#9 0x0000000000000000 in ?? () Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755 Signed-off-by: Jason A. Donenfeld <[email protected]> Cc: <[email protected]> Signed-off-by: David Vrabel <[email protected]>

This patch is based on the upstream commit 5ac1c4b and amended for v4.2 to make sure it works as intended. Repeated calls to begin_crtc_commit can cause warnings like this: [ 169.127746] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:616 [ 169.127835] in_atomic(): 0, irqs_disabled(): 1, pid: 1947, name: kms_flip [ 169.127840] 3 locks held by kms_flip/1947: [ 169.127843] #0: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffff814774bc>] __drm_modeset_lock_all+0x9c/0x130 [ 169.127860] foss-for-synopsys-dwc-arc-processors#1: (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffff814774cd>] __drm_modeset_lock_all+0xad/0x130 [ 169.127870] foss-for-synopsys-dwc-arc-processors#2: (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffff81477178>] drm_modeset_lock+0x38/0x110 [ 169.127879] irq event stamp: 665690 [ 169.127882] hardirqs last enabled at (665689): [<ffffffff817ffdb5>] _raw_spin_unlock_irqrestore+0x55/0x70 [ 169.127889] hardirqs last disabled at (665690): [<ffffffffc0197a23>] intel_pipe_update_start+0x113/0x5c0 [i915] [ 169.127936] softirqs last enabled at (665470): [<ffffffff8108a766>] __do_softirq+0x236/0x650 [ 169.127942] softirqs last disabled at (665465): [<ffffffff8108ae75>] irq_exit+0xc5/0xd0 [ 169.127951] CPU: 1 PID: 1947 Comm: kms_flip Not tainted 4.1.0-rc4-patser+ #4039 [ 169.127954] Hardware name: LENOVO 2349AV8/2349AV8, BIOS G1ETA5WW (2.65 ) 04/15/2014 [ 169.127957] ffff8800c49036f0 ffff8800cde5fa28 ffffffff817f6907 0000000080000001 [ 169.127964] 0000000000000000 ffff8800cde5fa58 ffffffff810aebed 0000000000000046 [ 169.127970] ffffffff81c5d518 0000000000000268 0000000000000000 ffff8800cde5fa88 [ 169.127981] Call Trace: [ 169.127992] [<ffffffff817f6907>] dump_stack+0x4f/0x7b [ 169.128001] [<ffffffff810aebed>] ___might_sleep+0x16d/0x270 [ 169.128008] [<ffffffff810aed38>] __might_sleep+0x48/0x90 [ 169.128017] [<ffffffff817fc359>] mutex_lock_nested+0x29/0x410 [ 169.128073] [<ffffffffc01635f0>] ? vgpu_write64+0x220/0x220 [i915] [ 169.128138] [<ffffffffc017fddf>] ? ironlake_update_primary_plane+0x2ff/0x410 [i915] [ 169.128198] [<ffffffffc0190e75>] intel_frontbuffer_flush+0x25/0x70 [i915] [ 169.128253] [<ffffffffc01831ac>] intel_finish_crtc_commit+0x4c/0x180 [i915] [ 169.128279] [<ffffffffc00784ac>] drm_atomic_helper_commit_planes+0x12c/0x240 [drm_kms_helper] [ 169.128338] [<ffffffffc0184264>] __intel_set_mode+0x684/0x830 [i915] [ 169.128378] [<ffffffffc018a84a>] intel_crtc_set_config+0x49a/0x620 [i915] [ 169.128385] [<ffffffff817fdd39>] ? mutex_unlock+0x9/0x10 [ 169.128391] [<ffffffff81467b69>] drm_mode_set_config_internal+0x69/0x120 [ 169.128398] [<ffffffff8119b547>] ? might_fault+0x57/0xb0 [ 169.128403] [<ffffffff8146bf93>] drm_mode_setcrtc+0x253/0x620 [ 169.128409] [<ffffffff8145c600>] drm_ioctl+0x1a0/0x6a0 [ 169.128415] [<ffffffff810b3b41>] ? get_parent_ip+0x11/0x50 [ 169.128424] [<ffffffff811e9ab8>] do_vfs_ioctl+0x2f8/0x530 [ 169.128429] [<ffffffff810d0fcd>] ? trace_hardirqs_on+0xd/0x10 [ 169.128435] [<ffffffff812e7676>] ? selinux_file_ioctl+0x56/0x100 [ 169.128439] [<ffffffff811e9d71>] SyS_ioctl+0x81/0xa0 [ 169.128445] [<ffffffff81800697>] system_call_fastpath+0x12/0x6f Solve it by using the newly introduced drm_atomic_helper_commit_planes_on_crtc. The problem here was that the drm_atomic_helper_commit_planes() helper we were using was basically designed to do begin_crtc_commit(crtc foss-for-synopsys-dwc-arc-processors#1) begin_crtc_commit(crtc foss-for-synopsys-dwc-arc-processors#2) ... commit all planes finish_crtc_commit(crtc foss-for-synopsys-dwc-arc-processors#1) finish_crtc_commit(crtc foss-for-synopsys-dwc-arc-processors#2) The problem here is that since our hardware relies on vblank evasion, our CRTC 'begin' function waits until we're out of the danger zone in which register writes might wind up straddling the vblank, then disables interrupts; our 'finish' function re-enables interrupts after the registers have been written. The expectation is that the operations between 'begin' and 'end' must be performed without sleeping (since interrupts are disabled) and should happen as quickly as possible. By clumping all of the 'begin' calls together, we introducing a couple problems: * Subsequent 'begin' invocations might sleep (which is illegal) * The first 'begin' ensured that we were far enough from the vblank that we could write our registers safely and ensure they all fell within the same frame. Adding extra delay waiting for subsequent CRTC's wasn't accounted for and could put us back into the 'danger zone' for CRTC foss-for-synopsys-dwc-arc-processors#1. This commit solves the problem by using a new helper that allows an order of operations like: for each crtc { begin_crtc_commit(crtc) // sleep (maybe), then disable interrupts commit planes for this specific CRTC end_crtc_commit(crtc) // reenable interrupts } so that sleeps will only be performed while interrupts are enabled and we can be sure that registers for a CRTC will be written immediately once we know we're in the safe zone. The crtc->config->base.crtc update may seem unrelated, but the helper will use it to obtain the crtc for the state. Without the update it will dereference NULL and crash. Changes since v1: - Use Matt Roper's commit message. Signed-off-by: Maarten Lankhorst <[email protected]> Reviewed-by: Matt Roper <[email protected]> References: https://bugs.freedesktop.org/show_bug.cgi?id=90398 Reviewed-by: Ander Conselvan de Oliveira <[email protected]> Signed-off-by: Jani Nikula <[email protected]>

A recent change to the cpu_cooling code introduced a AB-BA deadlock scenario between the cpufreq_policy_notifier_list rwsem and the cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held before the registration/removal of the notifier block (an operation which takes the rwsem), and the notifier code itself which takes the locks in the reverse order: ====================================================== [ INFO: possible circular locking dependency detected ] 3.18.0+ #1453 Not tainted ------------------------------------------------------- rc.local/770 is trying to acquire lock: (cooling_cpufreq_lock){+.+.+.}, at: [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc but task is already holding lock: ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> foss-for-synopsys-dwc-arc-processors#1 ((cpufreq_policy_notifier_list).rwsem){++++.+}: [<c06bc3b0>] down_write+0x44/0x9c [<c0043444>] blocking_notifier_chain_register+0x28/0xd8 [<c04ad610>] cpufreq_register_notifier+0x68/0x90 [<c04abe4c>] __cpufreq_cooling_register.part.1+0x120/0x180 [<c04abf44>] __cpufreq_cooling_register+0x98/0xa4 [<c04abf8c>] cpufreq_cooling_register+0x18/0x1c [<bf0046f8>] imx_thermal_probe+0x1c0/0x470 [imx_thermal] [<c037cef8>] platform_drv_probe+0x50/0xac [<c037b710>] driver_probe_device+0x114/0x234 [<c037b8cc>] __driver_attach+0x9c/0xa0 [<c0379d68>] bus_for_each_dev+0x5c/0x90 [<c037b204>] driver_attach+0x24/0x28 [<c037ae7c>] bus_add_driver+0xe0/0x1d8 [<c037c0cc>] driver_register+0x80/0xfc [<c037cd80>] __platform_driver_register+0x50/0x64 [<bf007018>] 0xbf007018 [<c0008a5c>] do_one_initcall+0x88/0x1d8 [<c0095da4>] load_module+0x1768/0x1ef8 [<c0096614>] SyS_init_module+0xe0/0xf4 [<c000ec00>] ret_fast_syscall+0x0/0x48 -> #0 (cooling_cpufreq_lock){+.+.+.}: [<c00619f8>] lock_acquire+0xb0/0x124 [<c06ba3b4>] mutex_lock_nested+0x5c/0x3d8 [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc [<c0042bf4>] notifier_call_chain+0x4c/0x8c [<c0042f20>] __blocking_notifier_call_chain+0x50/0x68 [<c0042f58>] blocking_notifier_call_chain+0x20/0x28 [<c04ae62c>] cpufreq_set_policy+0x7c/0x1d0 [<c04af3cc>] store_scaling_governor+0x74/0x9c [<c04ad418>] store+0x90/0xc0 [<c0175384>] sysfs_kf_write+0x54/0x58 [<c01746b4>] kernfs_fop_write+0xdc/0x190 [<c010dcc0>] vfs_write+0xac/0x1b4 [<c010dfec>] SyS_write+0x44/0x90 [<c000ec00>] ret_fast_syscall+0x0/0x48 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((cpufreq_policy_notifier_list).rwsem); lock(cooling_cpufreq_lock); lock((cpufreq_policy_notifier_list).rwsem); lock(cooling_cpufreq_lock); *** DEADLOCK *** 7 locks held by rc.local/770: #0: (sb_writers#6){.+.+.+}, at: [<c010dda0>] vfs_write+0x18c/0x1b4 foss-for-synopsys-dwc-arc-processors#1: (&of->mutex){+.+.+.}, at: [<c0174678>] kernfs_fop_write+0xa0/0x190 foss-for-synopsys-dwc-arc-processors#2: (s_active#52){.+.+.+}, at: [<c0174680>] kernfs_fop_write+0xa8/0x190 foss-for-synopsys-dwc-arc-processors#3: (cpu_hotplug.lock){++++++}, at: [<c0026a60>] get_online_cpus+0x34/0x90 foss-for-synopsys-dwc-arc-processors#4: (cpufreq_rwsem){.+.+.+}, at: [<c04ad3e0>] store+0x58/0xc0 foss-for-synopsys-dwc-arc-processors#5: (&policy->rwsem){+.+.+.}, at: [<c04ad3f8>] store+0x70/0xc0 torvalds#6: ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68 stack backtrace: CPU: 0 PID: 770 Comm: rc.local Not tainted 3.18.0+ #1453 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Backtrace: [<c00121c8>] (dump_backtrace) from [<c0012360>] (show_stack+0x18/0x1c) r6:c0b85a80 r5:c0b75630 r4:00000000 r3:00000000 [<c0012348>] (show_stack) from [<c06b6c48>] (dump_stack+0x7c/0x98) [<c06b6bcc>] (dump_stack) from [<c06b42a4>] (print_circular_bug+0x28c/0x2d8) r4:c0b85a80 r3:d0071d40 [<c06b4018>] (print_circular_bug) from [<c00613b0>] (__lock_acquire+0x1acc/0x1bb0) r10:c0b50660 r8:c09e6d80 r7:d0071d40 r6:c11d0f0c r5:00000007 r4:d0072240 [<c005f8e4>] (__lock_acquire) from [<c00619f8>] (lock_acquire+0xb0/0x124) r10:00000000 r9:c04abfc4 r8:00000000 r7:00000000 r6:00000000 r5:c0a06f0c r4:00000000 [<c0061948>] (lock_acquire) from [<c06ba3b4>] (mutex_lock_nested+0x5c/0x3d8) r10:ec853800 r9:c0a06ed4 r8:d0071d40 r7:c0a06ed4 r6:c11d0f0c r5:00000000 r4:c04abfc4 [<c06ba358>] (mutex_lock_nested) from [<c04abfc4>] (cpufreq_thermal_notifier+0x34/0xfc) r10:ec853800 r9:ec85380c r8:d00d7d3c r7:c0a06ed4 r6:d00d7d3c r5:00000000 r4:fffffffe [<c04abf90>] (cpufreq_thermal_notifier) from [<c0042bf4>] (notifier_call_chain+0x4c/0x8c) r7:00000000 r6:00000000 r5:00000000 r4:fffffffe [<c0042ba8>] (notifier_call_chain) from [<c0042f20>] (__blocking_notifier_call_chain+0x50/0x68) r8:c0a072a4 r7:00000000 r6:d00d7d3c r5:ffffffff r4:c0a06fc8 r3:ffffffff [<c0042ed0>] (__blocking_notifier_call_chain) from [<c0042f58>] (blocking_notifier_call_chain+0x20/0x28) r7:ec98b540 r6:c13ebc80 r5:ed76e600 r4:d00d7d3c [<c0042f38>] (blocking_notifier_call_chain) from [<c04ae62c>] (cpufreq_set_policy+0x7c/0x1d0) [<c04ae5b0>] (cpufreq_set_policy) from [<c04af3cc>] (store_scaling_governor+0x74/0x9c) r7:ec98b540 r6:0000000c r5:ec98b540 r4:ed76e600 [<c04af358>] (store_scaling_governor) from [<c04ad418>] (store+0x90/0xc0) r6:0000000c r5:ed76e6d4 r4:ed76e600 [<c04ad388>] (store) from [<c0175384>] (sysfs_kf_write+0x54/0x58) r8:0000000c r7:d00d7f78 r6:ec98b540 r5:0000000c r4:ec853800 r3:0000000c [<c0175330>] (sysfs_kf_write) from [<c01746b4>] (kernfs_fop_write+0xdc/0x190) r6:ec98b540 r5:00000000 r4:00000000 r3:c0175330 [<c01745d8>] (kernfs_fop_write) from [<c010dcc0>] (vfs_write+0xac/0x1b4) r10:0162aa70 r9:d00d6000 r8:0000000c r7:d00d7f78 r6:0162aa70 r5:0000000c r4:eccde500 [<c010dc14>] (vfs_write) from [<c010dfec>] (SyS_write+0x44/0x90) r10:0162aa70 r8:0000000c r7:eccde500 r6:eccde500 r5:00000000 r4:00000000 [<c010dfa8>] (SyS_write) from [<c000ec00>] (ret_fast_syscall+0x0/0x48) r10:00000000 r8:c000edc4 r7:00000004 r6:000216cc r5:0000000c r4:0162aa70 Solve this by moving to finer grained locking - use one mutex to protect the cpufreq_dev_list as a whole, and a separate lock to ensure correct ordering of cpufreq notifier registration and removal. cooling_list_lock is taken within cooling_cpufreq_lock on (un)registration to preserve the behavior of the code, i.e. to atomically add/remove to the list and (un)register the notifier. Fixes: 2dcd851 ("thermal: cpu_cooling: Update always cpufreq policy with Reviewed-by: Viresh Kumar <[email protected]> Signed-off-by: Russell King <[email protected]> Signed-off-by: Viresh Kumar <[email protected]> Signed-off-by: Eduardo Valentin <[email protected]>

My colleague ran into a program stall on a x86_64 server, where n_tty_read() was waiting for data even if there was data in the buffer in the pty. kernel stack for the stuck process looks like below. #0 [ffff88303d107b58] __schedule at ffffffff815c4b20 foss-for-synopsys-dwc-arc-processors#1 [ffff88303d107bd0] schedule at ffffffff815c513e foss-for-synopsys-dwc-arc-processors#2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818 foss-for-synopsys-dwc-arc-processors#3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2 foss-for-synopsys-dwc-arc-processors#4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23 foss-for-synopsys-dwc-arc-processors#5 [ffff88303d107dd0] tty_read at ffffffff81368013 torvalds#6 [ffff88303d107e20] __vfs_read at ffffffff811a3704 foss-for-synopsys-dwc-arc-processors#7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57 foss-for-synopsys-dwc-arc-processors#8 [ffff88303d107f00] sys_read at ffffffff811a4306 foss-for-synopsys-dwc-arc-processors#9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7 There seems to be two problems causing this issue. First, in drivers/tty/n_tty.c, __receive_buf() stores the data and updates ldata->commit_head using smp_store_release() and then checks the wait queue using waitqueue_active(). However, since there is no memory barrier, __receive_buf() could return without calling wake_up_interactive_poll(), and at the same time, n_tty_read() could start to wait in wait_woken() as in the following chart. __receive_buf() n_tty_read() ------------------------------------------------------------------------ if (waitqueue_active(&tty->read_wait)) /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ add_wait_queue(&tty->read_wait, &wait); ... if (!input_available_p(tty, 0)) { smp_store_release(&ldata->commit_head, ldata->read_head); ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ The second problem is that n_tty_read() also lacks a memory barrier call and could also cause __receive_buf() to return without calling wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken() as in the chart below. __receive_buf() n_tty_read() ------------------------------------------------------------------------ spin_lock_irqsave(&q->lock, flags); /* from add_wait_queue() */ ... if (!input_available_p(tty, 0)) { /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ smp_store_release(&ldata->commit_head, ldata->read_head); if (waitqueue_active(&tty->read_wait)) __add_wait_queue(q, wait); spin_unlock_irqrestore(&q->lock,flags); /* from add_wait_queue() */ ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ There are also other places in drivers/tty/n_tty.c which have similar calls to waitqueue_active(), so instead of adding many memory barrier calls, this patch simply removes the call to waitqueue_active(), leaving just wake_up*() behind. This fixes both problems because, even though the memory access before or after the spinlocks in both wake_up*() and add_wait_queue() can sneak into the critical section, it cannot go past it and the critical section assures that they will be serialized (please see "INTER-CPU ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a better explanation). Moreover, the resulting code is much simpler. Latency measurement using a ping-pong test over a pty doesn't show any visible performance drop. Signed-off-by: Kosuke Tatsukawa <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

When running kprobe test on arm64 rt kernel, it reports the below warning: root@qemu7:~# modprobe kprobe_example BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 in_atomic(): 0, irqs_disabled(): 128, pid: 484, name: modprobe CPU: 0 PID: 484 Comm: modprobe Not tainted 4.1.6-rt5 foss-for-synopsys-dwc-arc-processors#2 Hardware name: linux,dummy-virt (DT) Call trace: [<ffffffc0000891b8>] dump_backtrace+0x0/0x128 [<ffffffc000089300>] show_stack+0x20/0x30 [<ffffffc00061dae8>] dump_stack+0x1c/0x28 [<ffffffc0000bbad0>] ___might_sleep+0x120/0x198 [<ffffffc0006223e8>] rt_spin_lock+0x28/0x40 [<ffffffc000622b30>] __aarch64_insn_write+0x28/0x78 [<ffffffc000622e48>] aarch64_insn_patch_text_nosync+0x18/0x48 [<ffffffc000622ee8>] aarch64_insn_patch_text_cb+0x70/0xa0 [<ffffffc000622f40>] aarch64_insn_patch_text_sync+0x28/0x48 [<ffffffc0006236e0>] arch_arm_kprobe+0x38/0x48 [<ffffffc00010e6f4>] arm_kprobe+0x34/0x50 [<ffffffc000110374>] register_kprobe+0x4cc/0x5b8 [<ffffffbffc002038>] kprobe_init+0x38/0x7c [kprobe_example] [<ffffffc000084240>] do_one_initcall+0x90/0x1b0 [<ffffffc00061c498>] do_init_module+0x6c/0x1cc [<ffffffc0000fd0c0>] load_module+0x17f8/0x1db0 [<ffffffc0000fd8cc>] SyS_finit_module+0xb4/0xc8 Convert patch_lock to raw loc kto avoid this issue. Although the problem is found on rt kernel, the fix should be applicable to mainline kernel too. Acked-by: Steven Rostedt <[email protected]> Signed-off-by: Yang Shi <[email protected]> Signed-off-by: Will Deacon <[email protected]>

When freqdomain_cpus attribute is read from an offlined cpu, it will cause crash. This change prevents calling cpufreq_show_cpus when policy driver_data is NULL. Crash info: [ 170.814949] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 170.814990] IP: [<ffffffff813b2490>] _find_next_bit.part.0+0x10/0x70 [ 170.815021] PGD 227d30067 PUD 229e56067 PMD 0 [ 170.815043] Oops: 0000 [foss-for-synopsys-dwc-arc-processors#2] SMP [ 170.816022] CPU: 3 PID: 3121 Comm: cat Tainted: G D OE 4.3.0-rc3+ torvalds#33 ... ... [ 170.816657] Call Trace: [ 170.816672] [<ffffffff813b2505>] ? find_next_bit+0x15/0x20 [ 170.816696] [<ffffffff8160e47c>] cpufreq_show_cpus+0x5c/0xd0 [ 170.816722] [<ffffffffa031a409>] show_freqdomain_cpus+0x19/0x20 [acpi_cpufreq] [ 170.816749] [<ffffffff8160e65b>] show+0x3b/0x60 [ 170.816769] [<ffffffff8129b31c>] sysfs_kf_seq_show+0xbc/0x130 [ 170.816793] [<ffffffff81299be3>] kernfs_seq_show+0x23/0x30 [ 170.816816] [<ffffffff81240f2c>] seq_read+0xec/0x390 [ 170.816837] [<ffffffff8129a64a>] kernfs_fop_read+0x10a/0x160 [ 170.816861] [<ffffffff8121d9b7>] __vfs_read+0x37/0x100 [ 170.816883] [<ffffffff813217c0>] ? security_file_permission+0xa0/0xc0 [ 170.816909] [<ffffffff8121e2e3>] vfs_read+0x83/0x130 [ 170.816930] [<ffffffff8121f035>] SyS_read+0x55/0xc0 ... ... [ 170.817185] ---[ end trace bc6eadf82b2b965a ]--- Signed-off-by: Srinivas Pandruvada <[email protected]> Acked-by: Viresh Kumar <[email protected]> Cc: 4.2+ <[email protected]> # 4.2+ Signed-off-by: Rafael J. Wysocki <[email protected]>

…cessors#2 Explicit'ify that all memory added so far is low memory Nothing semantical Signed-off-by: Vineet Gupta <[email protected]>

Saeed Mahameed says: ==================== mlx5 improved flow steering management First two patches fixes some minor issues in recently introduced SRIOV code. The other seven patches modifies the driver's code that manages flow steering rules with Connectx-4 devices. Basic introduction: The flow steering device specification model is composed of the following entities: Destination (either a TIR/Flow table/vport), where TIR is RSS end-point, vport is the VF eSwitch port in SRIOV. Flow table entry (FTE) - the values used by the flow specification Flow table group (FG) - the masks used by the flow specification Flow table (FT) - groups several FGs and can serve as destination The flow steering software entities: In addition to the device objects, the software have two more objects: Priorities - group several FTs. Handles order of packet matching. Namespaces - group several priorities. Namespace are used in order to isolate different usages of steering (for example, add two separate namespaces, one for the NIC driver and one for E-Switch FDB). The base data structure for the flow steering management is a tree and all the flow steering objects such as (Namespace/Flow table/Flow Group/FTE/etc.) are represented as a node in the tree, e.g.: Priority-0 -> FT1 -> FG -> FTE -> TIR (destination) Priority-1 -> FT2 -> FG-> FTE -> TIR (destination) Matching begins in FT1 flow rules and if there is a miss on all the FTEs then matching continues on the FTEs in FT2. The new implementation solves/improves the following issues in the current code: 1) The new impl. supports multiple destinations, the search for existing rule with the same matching value is performed by the flow steering management. In the current impl. the E-switch FDB management code needs to search for existing rules before calling to the add rule function. 2) The new impl. manages the flow table level, in the current implementation the consumer states the flow table level when new flow table is created without any knowledge about the levels of other flow tables. 3) In the current impl. the consumer can't create or destroy flow groups dynamically, the flow groups are passed as argument to the create flow table API. The new impl. exposes API for create/destroy flow group. The series is built as follows: Patch #1 add flow steering API firmware commands. Patch #2 add tree operation of the flow steering tree: add/remove node, initialize node and take reference count on a node. Patch #3 add essential algorithms for managing the flow steering. Patch #4 Initialize the flow steering tree, flow steering initialization is based on static tree which illustrates the flow steering tree when the driver is loaded. Patch #5 is the main patch of the series. It introduce the flow steering API. Patch torvalds#6 Expose the new flow steering API and remove the old one. The Ethernet flow steering follows the existing implementation, but uses the new steering API. Patch #7 Rename en_flow_table.c to en_fs.c in order to be aligned with the new flow steering files. ==================== Signed-off-by: David S. Miller <[email protected]>

This was the second perf intr issue perf sampling on multicore requires intr to be enabled on all cores. ARC perf probe code used helper arc_request_percpu_irq() which calls - request_percpu_irq() on core0 - enable_percpu_irq() on all all cores (including core0) genirq requires that request be made ahead of enable call. However if perf probe happened on non core0 (observed on a 3.18 kernel), enable would get called ahead of request, failing obviously and rendering perf intr disabled on all such cores [ 11.120000] 1 ARC perf : 8 counters (48 bits), 113 conditions, [overflow IRQ support] [ 11.130000] 1 -----> enable_percpu_irq() IRQ 20 failed [ 11.140000] 3 -----> enable_percpu_irq() IRQ 20 failed [ 11.140000] 2 -----> enable_percpu_irq() IRQ 20 failed [ 11.140000] 0 =====> request_percpu_irq() IRQ 20 [ 11.140000] 0 -----> enable_percpu_irq() IRQ 20 Fix this fragility, by calling request_percpu_irq() on whatever core calls probe (there is no requirement on which core calls this anyways) and then calling enable on each cores. Interestingly this started as invesigation of STAR 9000838902: "sporadically IRQs enabled on perf prob" which was about occassional boot spew as request_percpu_irq got called non-locally (from an IPI), and re-enabled interrupts in following path proc_mkdir -> spin_unlock_irq() which the irq work code didn't like. | ARC perf : 8 counters (48 bits), 113 conditions, [overflow IRQ support] | | BUG: failure at ../kernel/irq_work.c:135/irq_work_run_list()! | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.10-01127-g285efb8e66d1 #2 | | Stack Trace: | arc_unwind_core.constprop.1+0x94/0x104 | dump_stack+0x62/0x98 | irq_work_run_list+0xb0/0xb4 | irq_work_run+0x22/0x3c | do_IPI+0x74/0x9c | handle_irq_event_percpu+0x34/0x164 | handle_percpu_irq+0x58/0x78 | generic_handle_irq+0x1e/0x2c | arch_do_IRQ+0x3c/0x60 | ret_from_exception+0x0/0x8 Cc: Marc Zyngier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Alexey Brodkin <[email protected]> Cc: <[email protected]> #4.2+ Signed-off-by: Vineet Gupta <[email protected]>

When using the Promise TX2+ SATA controller on PA-RISC, the system often crashes with kernel panic, for example just writing data with the dd utility will make it crash. Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2 Backtrace: [<000000004021497c>] show_stack+0x14/0x20 [<0000000040410bf0>] dump_stack+0x88/0x100 [<000000004023978c>] panic+0x124/0x360 [<0000000040452c18>] sba_alloc_range+0x698/0x6a0 [<0000000040453150>] sba_map_sg+0x260/0x5b8 [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata] [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata] [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata] [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130 [<000000004049da34>] scsi_request_fn+0x6e4/0x970 [<00000000403e95a8>] __blk_run_queue+0x40/0x60 [<00000000403e9d8c>] blk_run_queue+0x3c/0x68 [<000000004049a534>] scsi_run_queue+0x2a4/0x360 [<000000004049be68>] scsi_end_request+0x1a8/0x238 [<000000004049de84>] scsi_io_completion+0xfc/0x688 [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0 The cause of the crash is not exhaustion of the IOMMU space, there is plenty of free pages. The function sba_alloc_range is called with size 0x11000, thus the pages_needed variable is 0x11. The function sba_search_bitmap is called with bits_wanted 0x11 and boundary size is 0x10 (because dma_get_seg_boundary(dev) returns 0xffff). The function sba_search_bitmap attempts to allocate 17 pages that must not cross 16-page boundary - it can't satisfy this requirement (iommu_is_span_boundary always returns true) and fails even if there are many free entries in the IOMMU space. How did it happen that we try to allocate 17 pages that don't cross 16-page boundary? The cause is in the function iommu_coalesce_chunks. This function tries to coalesce adjacent entries in the scatterlist. The function does several checks if it may coalesce one entry with the next, one of those checks is this: if (startsg->length + dma_len > max_seg_size) break; When it finishes coalescing adjacent entries, it allocates the mapping: sg_dma_len(contig_sg) = dma_len; dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); sg_dma_address(contig_sg) = PIDE_FLAG | (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT) | dma_offset; It is possible that (startsg->length + dma_len > max_seg_size) is false (we are just near the 0x10000 max_seg_size boundary), so the funcion decides to coalesce this entry with the next entry. When the coalescing succeeds, the function performs dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); And now, because of non-zero dma_offset, dma_len is greater than 0x10000. iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts to allocate 17 pages for a device that must not cross 16-page boundary. To fix the bug, we must make sure that dma_len after addition of dma_offset and alignment doesn't cross the segment boundary. I.e. change if (startsg->length + dma_len > max_seg_size) break; to if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size) break; This patch makes this change (it precalculates max_seg_boundary at the beginning of the function iommu_coalesce_chunks). I also added a check that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is not needed for Promise TX2+ SATA, but it may be needed for other devices that have dma_get_seg_boundary lower than dma_get_max_seg_size). Signed-off-by: Mikulas Patocka <[email protected]> Cc: [email protected] Signed-off-by: Helge Deller <[email protected]>

Drivers like vxlan use the recently introduced udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the packet, updates the struct stats using the usual u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats). udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch tstats, so drivers like vxlan, immediately after, call iptunnel_xmit_stats, which does the same thing - calls u64_stats_update_begin/end on this_cpu_ptr(dev->tstats). While vxlan is probably fine (I don't know?), calling a similar function from, say, an unbound workqueue, on a fully preemptable kernel causes real issues: [ 188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6 [ 188.435579] caller is debug_smp_processor_id+0x17/0x20 [ 188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 foss-for-synopsys-dwc-arc-processors#2 [ 188.435607] Call Trace: [ 188.435611] [<ffffffff8234e936>] dump_stack+0x4f/0x7b [ 188.435615] [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0 [ 188.435619] [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20 The solution would be to protect the whole this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with disabling preemption and then reenabling it. Signed-off-by: Jason A. Donenfeld <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>

We try to convert the old way of of specifying fb tiling (obj->tiling) into the new fb modifiers. We store the result in the passed in mode_cmd structure. But that structure comes directly from the addfb2 ioctl, and gets copied back out to userspace, which means we're clobbering the modifiers that the user provided (all 0 since the DRM_MODE_FB_MODIFIERS flag wasn't even set by the user). Hence if the user reuses the struct for another addfb2, the ioctl will be rejected since it's now asking for some modifiers w/o the flag set. Fix the problem by making a copy of the user provided structure. We can play any games we want with the copy. IGT-Version: 1.12-git (x86_64) (Linux: 4.4.0-rc1-stereo+ x86_64) ... Subtest basic-X-tiled: SUCCESS (0.001s) Test assertion failure function pitch_tests, file kms_addfb_basic.c:167: Failed assertion: drmIoctl(fd, DRM_IOCTL_MODE_ADDFB2, &f) == 0 Last errno: 22, Invalid argument Stack trace: #0 [__igt_fail_assert+0x101] foss-for-synopsys-dwc-arc-processors#1 [pitch_tests+0x619] foss-for-synopsys-dwc-arc-processors#2 [__real_main426+0x2f] foss-for-synopsys-dwc-arc-processors#3 [main+0x23] foss-for-synopsys-dwc-arc-processors#4 [__libc_start_main+0xf0] foss-for-synopsys-dwc-arc-processors#5 [_start+0x29] torvalds#6 [<unknown>+0x29] Subtest framebuffer-vs-set-tiling failed. **** DEBUG **** Test assertion failure function pitch_tests, file kms_addfb_basic.c:167: Failed assertion: drmIoctl(fd, DRM_IOCTL_MODE_ADDFB2, &f) == 0 Last errno: 22, Invalid argument **** END **** Subtest framebuffer-vs-set-tiling: FAIL (0.003s) ... IGT-Version: 1.12-git (x86_64) (Linux: 4.4.0-rc1-stereo+ x86_64) Subtest framebuffer-vs-set-tiling: SUCCESS (0.000s) Cc: [email protected] # v4.1+ Cc: Daniel Vetter <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Fixes: 2a80ead ("drm/i915: Add fb format modifier support") Testcase: igt/kms_addfb_basic/clobbered-modifier Signed-off-by: Ville Syrjälä <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Signed-off-by: Jani Nikula <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

The current code tries to allocate memory with GFP_KERNEL at interrupt context, it would show below warning during the enumeration when I test it with chipidea hardware, change GFP flag as GFP_ATOMIC can fix this issue. [ 40.438237] zero gadget: high-speed config foss-for-synopsys-dwc-arc-processors#2: loopback [ 40.444924] ------------[ cut here ]------------ [ 40.449609] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0x108/0x128() [ 40.461715] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [ 40.467130] Modules linked in: [ 40.470216] usb_f_ss_lb g_zero libcomposite evbug [ 40.473822] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-rc5-00168-gb730aaf torvalds#604 [ 40.481496] Hardware name: Freescale i.MX6 SoloX (Device Tree) [ 40.487345] Backtrace: [ 40.489857] [<80014e94>] (dump_backtrace) from [<80015088>] (show_stack+0x18/0x1c) [ 40.497445] r6:80b67a80 r5:00000000 r4:00000000 r3:00000000 [ 40.503234] [<80015070>] (show_stack) from [<802e27b4>] (dump_stack+0x8c/0xa4) [ 40.510503] [<802e2728>] (dump_stack) from [<8002cfe8>] (warn_slowpath_common+0x80/0xbc) [ 40.518612] r6:8007510c r5:00000009 r4:80b49c88 r3:00000001 [ 40.524396] [<8002cf68>] (warn_slowpath_common) from [<8002d05c>] (warn_slowpath_fmt+0x38/0x40) [ 40.533109] r8:bcfdef80 r7:bdb705cc r6:000080d0 r5:be001e80 r4:809cc278 [ 40.539965] [<8002d028>] (warn_slowpath_fmt) from [<8007510c>] (lockdep_trace_alloc+0x108/0x128) [ 40.548766] r3:809d0128 r2:809cc278 [ 40.552401] r4:600b0193 [ 40.554990] [<80075004>] (lockdep_trace_alloc) from [<801093d4>] (kmem_cache_alloc+0x28/0x15c) [ 40.563618] r4:000080d0 r3:80b4aa8c [ 40.567270] [<801093ac>] (kmem_cache_alloc) from [<804d95e4>] (ep_alloc_request+0x58/0x68) [ 40.575550] r10:7f01f104 r9:00000001 r8:bcfdef80 r7:bdb705cc r6:bc178700 r5:00000000 [ 40.583512] r4:bcfdef80 r3:813c0a38 [ 40.587183] [<804d958c>] (ep_alloc_request) from [<7f01f7ec>] (loopback_set_alt+0x114/0x21c [usb_f_ss_lb]) [ 40.596929] [<7f01f6d8>] (loopback_set_alt [usb_f_ss_lb]) from [<7f006910>] (composite_setup+0xbd0/0x17e8 [libcomposite]) [ 40.607902] r10:bd3a2c0c r9:00000000 r8:bcfdef80 r7:bc178700 r6:bdb702d0 r5:bcfdefdc [ 40.615866] r4:7f0199b4 r3:00000002 [ 40.619542] [<7f005d40>] (composite_setup [libcomposite]) from [<804dae88>] (udc_irq+0x784/0xd1c) [ 40.628431] r10:80bb5619 r9:c0876140 r8:00012001 r7:bdb71010 r6:bdb70568 r5:00010001 [ 40.636392] r4:bdb70014 [ 40.638985] [<804da704>] (udc_irq) from [<804d64f8>] (ci_irq+0x5c/0x118) [ 40.645702] r10:80bb5619 r9:be11e000 r8:00000117 r7:00000000 r6:bdb71010 r5:be11e060 [ 40.653666] r4:bdb70010 [ 40.656261] [<804d649c>] (ci_irq) from [<8007f638>] (handle_irq_event_percpu+0x7c/0x13c) [ 40.664367] r6:00000000 r5:be11e060 r4:bdb05cc0 r3:804d649c [ 40.670149] [<8007f5bc>] (handle_irq_event_percpu) from [<8007f740>] (handle_irq_event+0x48/0x6c) [ 40.679036] r10:00000000 r9:be008000 r8:00000001 r7:00000000 r6:bdb05cc0 r5:be11e060 [ 40.686998] r4:be11e000 [ 40.689581] [<8007f6f8>] (handle_irq_event) from [<80082850>] (handle_fasteoi_irq+0xd4/0x1b0) [ 40.698120] r6:80b56a30 r5:be11e060 r4:be11e000 r3:00000000 [ 40.703898] [<8008277c>] (handle_fasteoi_irq) from [<8007ec04>] (generic_handle_irq+0x28/0x3c) [ 40.712524] r7:00000000 r6:80b4aaf4 r5:00000117 r4:80b445fc [ 40.718304] [<8007ebdc>] (generic_handle_irq) from [<8007ef20>] (__handle_domain_irq+0x6c/0xe8) [ 40.727033] [<8007eeb4>] (__handle_domain_irq) from [<800095d4>] (gic_handle_irq+0x48/0x94) [ 40.735402] r9:c080f100 r8:80b4ac6c r7:c080e100 r6:80b67d40 r5:80b49f00 r4:c080e10c [ 40.743290] [<8000958c>] (gic_handle_irq) from [<80015d38>] (__irq_svc+0x58/0x78) [ 40.750791] Exception stack(0x80b49f00 to 0x80b49f48) [ 40.755873] 9f00: 00000001 00000001 00000000 80024320 80b48000 80b4a9d0 80b4a984 80b433e4 [ 40.764078] 9f20: 00000001 807f4680 00000000 80b49f5c 80b49f20 80b49f50 80071ca4 800113fc [ 40.772272] 9f40: 200b0013 ffffffff [ 40.775776] r9:807f4680 r8:00000001 r7:80b49f34 r6:ffffffff r5:200b0013 r4:800113fc [ 40.783677] [<800113d4>] (arch_cpu_idle) from [<8006c5bc>] (default_idle_call+0x28/0x38) [ 40.791798] [<8006c594>] (default_idle_call) from [<8006c6dc>] (cpu_startup_entry+0x110/0x1b0) [ 40.800445] [<8006c5cc>] (cpu_startup_entry) from [<807e95dc>] (rest_init+0x12c/0x168) [ 40.808376] r7:80b4a8c0 r3:807f4b7c [ 40.812030] [<807e94b0>] (rest_init) from [<80ad7cc0>] (start_kernel+0x360/0x3d4) [ 40.819528] r5:80bcb000 r4:80bcb050 [ 40.823171] [<80ad7960>] (start_kernel) from [<8000807c>] (0x8000807c) It fixes commit 91c42b0 ("usb: gadget: loopback: Fix looping back logic implementation"). Cc: <[email protected]> # v3.18+ Signed-off-by: Peter Chen <[email protected]> Reviewed-by: Krzysztof Opasiak <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>

When a passive TCP is created, we eventually call tcp_md5_do_add() with sk pointing to the child. It is not owner by the user yet (we will add this socket into listener accept queue a bit later anyway) But we do own the spinlock, so amend the lockdep annotation to avoid following splat : [ 8451.090932] net/ipv4/tcp_ipv4.c:923 suspicious rcu_dereference_protected() usage! [ 8451.090932] [ 8451.090932] other info that might help us debug this: [ 8451.090932] [ 8451.090934] [ 8451.090934] rcu_scheduler_active = 1, debug_locks = 1 [ 8451.090936] 3 locks held by socket_sockopt_/214795: [ 8451.090936] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff855c6ac1>] __netif_receive_skb_core+0x151/0xe90 [ 8451.090947] foss-for-synopsys-dwc-arc-processors#1: (rcu_read_lock){.+.+..}, at: [<ffffffff85618143>] ip_local_deliver_finish+0x43/0x2b0 [ 8451.090952] foss-for-synopsys-dwc-arc-processors#2: (slock-AF_INET){+.-...}, at: [<ffffffff855acda5>] sk_clone_lock+0x1c5/0x500 [ 8451.090958] [ 8451.090958] stack backtrace: [ 8451.090960] CPU: 7 PID: 214795 Comm: socket_sockopt_ [ 8451.091215] Call Trace: [ 8451.091216] <IRQ> [<ffffffff856fb29c>] dump_stack+0x55/0x76 [ 8451.091229] [<ffffffff85123b5b>] lockdep_rcu_suspicious+0xeb/0x110 [ 8451.091235] [<ffffffff8564544f>] tcp_md5_do_add+0x1bf/0x1e0 [ 8451.091239] [<ffffffff85645751>] tcp_v4_syn_recv_sock+0x1f1/0x4c0 [ 8451.091242] [<ffffffff85642b27>] ? tcp_v4_md5_hash_skb+0x167/0x190 [ 8451.091246] [<ffffffff85647c78>] tcp_check_req+0x3c8/0x500 [ 8451.091249] [<ffffffff856451ae>] ? tcp_v4_inbound_md5_hash+0x11e/0x190 [ 8451.091253] [<ffffffff85647170>] tcp_v4_rcv+0x3c0/0x9f0 [ 8451.091256] [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0 [ 8451.091260] [<ffffffff856181b6>] ip_local_deliver_finish+0xb6/0x2b0 [ 8451.091263] [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0 [ 8451.091267] [<ffffffff85618d38>] ip_local_deliver+0x48/0x80 [ 8451.091270] [<ffffffff85618510>] ip_rcv_finish+0x160/0x700 [ 8451.091273] [<ffffffff8561900e>] ip_rcv+0x29e/0x3d0 [ 8451.091277] [<ffffffff855c74b7>] __netif_receive_skb_core+0xb47/0xe90 Fixes: a8afca0 ("tcp: md5: protects md5sig_info with RCU") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Liu reported that running certain parts of xfstests threw the following error: BUG: sleeping function called from invalid context at mm/page_alloc.c:3190 in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u16:0 3 locks held by kworker/u16:0/6: #0: ("writeback"){++++.+}, at: [<ffffffff8107f083>] process_one_work+0x173/0x730 foss-for-synopsys-dwc-arc-processors#1: ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff8107f083>] process_one_work+0x173/0x730 foss-for-synopsys-dwc-arc-processors#2: (&type->s_umount_key#44){+++++.}, at: [<ffffffff811e6805>] trylock_super+0x25/0x60 CPU: 5 PID: 6 Comm: kworker/u16:0 Tainted: G OE 4.3.0+ foss-for-synopsys-dwc-arc-processors#3 Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-btrfs-108) ffffffff81a3abab ffff88042e282ba8 ffffffff8130191b ffffffff81a3abab 0000000000000c76 ffff88042e282ba8 ffff88042e27c180 ffff88042e282bd8 ffffffff8108ed95 ffff880400000004 0000000000000000 0000000000000c76 Call Trace: [<ffffffff8130191b>] dump_stack+0x4f/0x74 [<ffffffff8108ed95>] ___might_sleep+0x185/0x240 [<ffffffff8108eea2>] __might_sleep+0x52/0x90 [<ffffffff811817e8>] __alloc_pages_nodemask+0x268/0x410 [<ffffffff8109a43c>] ? sched_clock_local+0x1c/0x90 [<ffffffff8109a6d1>] ? local_clock+0x21/0x40 [<ffffffff810b9eb0>] ? __lock_release+0x420/0x510 [<ffffffff810b534c>] ? __lock_acquired+0x16c/0x3c0 [<ffffffff811ca265>] alloc_pages_current+0xc5/0x210 [<ffffffffa0577105>] ? rbio_is_full+0x55/0x70 [btrfs] [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0 [<ffffffff81666d50>] ? _raw_spin_unlock_irqrestore+0x40/0x60 [<ffffffffa0578c0a>] full_stripe_write+0x5a/0xc0 [btrfs] [<ffffffffa0578ca9>] __raid56_parity_write+0x39/0x60 [btrfs] [<ffffffffa0578deb>] run_plug+0x11b/0x140 [btrfs] [<ffffffffa0578e33>] btrfs_raid_unplug+0x23/0x70 [btrfs] [<ffffffff812d36c2>] blk_flush_plug_list+0x82/0x1f0 [<ffffffff812e0349>] blk_sq_make_request+0x1f9/0x740 [<ffffffff812ceba2>] ? generic_make_request_checks+0x222/0x7c0 [<ffffffff812cf264>] ? blk_queue_enter+0x124/0x310 [<ffffffff812cf1d2>] ? blk_queue_enter+0x92/0x310 [<ffffffff812d0ae2>] generic_make_request+0x172/0x2c0 [<ffffffff812d0ad4>] ? generic_make_request+0x164/0x2c0 [<ffffffff812d0ca0>] submit_bio+0x70/0x140 [<ffffffffa0577b29>] ? rbio_add_io_page+0x99/0x150 [btrfs] [<ffffffffa0578a89>] finish_rmw+0x4d9/0x600 [btrfs] [<ffffffffa0578c4c>] full_stripe_write+0x9c/0xc0 [btrfs] [<ffffffffa057ab7f>] raid56_parity_write+0xef/0x160 [btrfs] [<ffffffffa052bd83>] btrfs_map_bio+0xe3/0x2d0 [btrfs] [<ffffffffa04fbd6d>] btrfs_submit_bio_hook+0x8d/0x1d0 [btrfs] [<ffffffffa05173c4>] submit_one_bio+0x74/0xb0 [btrfs] [<ffffffffa0517f55>] submit_extent_page+0xe5/0x1c0 [btrfs] [<ffffffffa0519b18>] __extent_writepage_io+0x408/0x4c0 [btrfs] [<ffffffffa05179c0>] ? alloc_dummy_extent_buffer+0x140/0x140 [btrfs] [<ffffffffa051dc88>] __extent_writepage+0x218/0x3a0 [btrfs] [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0 [<ffffffffa051e2c9>] extent_write_cache_pages.clone.0+0x2f9/0x400 [btrfs] [<ffffffffa051e422>] extent_writepages+0x52/0x70 [btrfs] [<ffffffffa05001f0>] ? btrfs_set_inode_index+0x70/0x70 [btrfs] [<ffffffffa04fcc17>] btrfs_writepages+0x27/0x30 [btrfs] [<ffffffff81184df3>] do_writepages+0x23/0x40 [<ffffffff81212229>] __writeback_single_inode+0x89/0x4d0 [<ffffffff81212a60>] ? writeback_sb_inodes+0x260/0x480 [<ffffffff81212a60>] ? writeback_sb_inodes+0x260/0x480 [<ffffffff8121295f>] ? writeback_sb_inodes+0x15f/0x480 [<ffffffff81212ad2>] writeback_sb_inodes+0x2d2/0x480 [<ffffffff810b1397>] ? down_read_trylock+0x57/0x60 [<ffffffff811e6805>] ? trylock_super+0x25/0x60 [<ffffffff810d629f>] ? rcu_read_lock_sched_held+0x4f/0x90 [<ffffffff81212d0c>] __writeback_inodes_wb+0x8c/0xc0 [<ffffffff812130b5>] wb_writeback+0x2b5/0x500 [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0 [<ffffffff810660a8>] ? __local_bh_enable_ip+0x68/0xc0 [<ffffffff81213362>] ? wb_do_writeback+0x62/0x310 [<ffffffff812133c1>] wb_do_writeback+0xc1/0x310 [<ffffffff8107c3d9>] ? set_worker_desc+0x79/0x90 [<ffffffff81213842>] wb_workfn+0x92/0x330 [<ffffffff8107f133>] process_one_work+0x223/0x730 [<ffffffff8107f083>] ? process_one_work+0x173/0x730 [<ffffffff8108035f>] ? worker_thread+0x18f/0x430 [<ffffffff810802ed>] worker_thread+0x11d/0x430 [<ffffffff810801d0>] ? maybe_create_worker+0xf0/0xf0 [<ffffffff810801d0>] ? maybe_create_worker+0xf0/0xf0 [<ffffffff810858df>] kthread+0xef/0x110 [<ffffffff8108f74e>] ? schedule_tail+0x1e/0xd0 [<ffffffff810857f0>] ? __init_kthread_worker+0x70/0x70 [<ffffffff816673bf>] ret_from_fork+0x3f/0x70 [<ffffffff810857f0>] ? __init_kthread_worker+0x70/0x70 The issue is that we've got the software context pinned while calling blk_flush_plug_list(), which flushes callbacks that are allowed to sleep. btrfs and raid has such callbacks. Flip the checks around a bit, so we can enable preempt a bit earlier and flush plugs without having preempt disabled. This only affects blk-mq driven devices, and only those that register a single queue. Reported-by: Liu Bo <[email protected]> Tested-by: Liu Bo <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>

We've got a mess on our hands. 1. xfs_trans_commit() cannot cancel transactions because the mount is shut down - that causes dirty, aborted, unlogged log items to sit unpinned in memory and potentially get written to disk before the log is shut down. Hence xfs_trans_commit() can only abort transactions when xlog_is_shutdown() is true. 2. xfs_force_shutdown() is used in places to cause the current modification to be aborted via xfs_trans_commit() because it may be impractical or impossible to cancel the transaction directly, and hence xfs_trans_commit() must cancel transactions when xfs_is_shutdown() is true in this situation. But we can't do that because of #1. 3. Log IO errors cause log shutdowns by calling xfs_force_shutdown() to shut down the mount and then the log from log IO completion. 4. xfs_force_shutdown() can result in a log force being issued, which has to wait for log IO completion before it will mark the log as shut down. If #3 races with some other shutdown trigger that runs a log force, we rely on xfs_force_shutdown() silently ignoring #3 and avoiding shutting down the log until the failed log force completes. 5. To ensure #2 always works, we have to ensure that xfs_force_shutdown() does not return until the the log is shut down. But in the case of #4, this will result in a deadlock because the log Io completion will block waiting for a log force to complete which is blocked waiting for log IO to complete.... So the very first thing we have to do here to untangle this mess is dissociate log shutdown triggers from mount shutdowns. We already have xlog_forced_shutdown, which will atomically transistion to the log a shutdown state. Due to internal asserts it cannot be called multiple times, but was done simply because the only place that could call it was xfs_do_force_shutdown() (i.e. the mount shutdown!) and that could only call it once and once only. So the first thing we do is remove the asserts. We then convert all the internal log shutdown triggers to call xlog_force_shutdown() directly instead of xfs_force_shutdown(). This allows the log shutdown triggers to shut down the log without needing to care about mount based shutdown constraints. This means we shut down the log independently of the mount and the mount may not notice this until it's next attempt to read or modify metadata. At that point (e.g. xfs_trans_commit()) it will see that the log is shutdown, error out and shutdown the mount. To ensure that all the unmount behaviours and asserts track correctly as a result of a log shutdown, propagate the shutdown up to the mount if it is not already set. This keeps the mount and log state in sync, and saves a huge amount of hassle where code fails because of a log shutdown but only checks for mount shutdowns and hence ends up doing the wrong thing. Cleaning up that mess is an exercise for another day. This enables us to address the other problems noted above in followup patches. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>

When calling smb2_ioctl_query_info() with invalid smb_query_info::flags, a NULL ptr dereference is triggered when trying to kfree() uninitialised rqst[n].rq_iov array. This also fixes leaked paths that are created in SMB2_open_init() which required SMB2_open_free() to properly free them. Here is a small C reproducer that triggers it #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <unistd.h> #include <fcntl.h> #include <sys/ioctl.h> #define die(s) perror(s), exit(1) #define QUERY_INFO 0xc018cf07 int main(int argc, char *argv[]) { int fd; if (argc < 2) exit(1); fd = open(argv[1], O_RDONLY); if (fd == -1) die("open"); if (ioctl(fd, QUERY_INFO, (uint32_t[]) { 0, 0, 0, 4, 0, 0}) == -1) die("ioctl"); close(fd); return 0; } mount.cifs //srv/share /mnt -o ... gcc repro.c && ./a.out /mnt/f0 [ 1832.124468] CIFS: VFS: \\w22-dc.zelda.test\test Invalid passthru query flags: 0x4 [ 1832.125043] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 1832.125764] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 1832.126241] CPU: 3 PID: 1133 Comm: a.out Not tainted 5.17.0-rc8 #2 [ 1832.126630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 [ 1832.127322] RIP: 0010:smb2_ioctl_query_info+0x7a3/0xe30 [cifs] [ 1832.127749] Code: 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 6c 05 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 74 24 28 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 cb 04 00 00 49 8b 3e e8 bb fc fa ff 48 89 da 48 [ 1832.128911] RSP: 0018:ffffc90000957b08 EFLAGS: 00010256 [ 1832.129243] RAX: dffffc0000000000 RBX: ffff888117e9b850 RCX: ffffffffa020580d [ 1832.129691] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a2c0 [ 1832.130137] RBP: ffff888117e9b878 R08: 0000000000000001 R09: 0000000000000003 [ 1832.130585] R10: fffffbfff4087458 R11: 0000000000000001 R12: ffff888117e9b800 [ 1832.131037] R13: 00000000ffffffea R14: 0000000000000000 R15: ffff888117e9b8a8 [ 1832.131485] FS: 00007fcee9900740(0000) GS:ffff888151a00000(0000) knlGS:0000000000000000 [ 1832.131993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1832.132354] CR2: 00007fcee9a1ef5e CR3: 0000000114cd2000 CR4: 0000000000350ee0 [ 1832.132801] Call Trace: [ 1832.132962] <TASK> [ 1832.133104] ? smb2_query_reparse_tag+0x890/0x890 [cifs] [ 1832.133489] ? cifs_mapchar+0x460/0x460 [cifs] [ 1832.133822] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1832.134125] ? cifs_strndup_to_utf16+0x15b/0x250 [cifs] [ 1832.134502] ? lock_downgrade+0x6f0/0x6f0 [ 1832.134760] ? cifs_convert_path_to_utf16+0x198/0x220 [cifs] [ 1832.135170] ? smb2_check_message+0x1080/0x1080 [cifs] [ 1832.135545] cifs_ioctl+0x1577/0x3320 [cifs] [ 1832.135864] ? lock_downgrade+0x6f0/0x6f0 [ 1832.136125] ? cifs_readdir+0x2e60/0x2e60 [cifs] [ 1832.136468] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1832.136769] ? __rseq_handle_notify_resume+0x80b/0xbe0 [ 1832.137096] ? __up_read+0x192/0x710 [ 1832.137327] ? __ia32_sys_rseq+0xf0/0xf0 [ 1832.137578] ? __x64_sys_openat+0x11f/0x1d0 [ 1832.137850] __x64_sys_ioctl+0x127/0x190 [ 1832.138103] do_syscall_64+0x3b/0x90 [ 1832.138378] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 1832.138702] RIP: 0033:0x7fcee9a253df [ 1832.138937] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 1832.140107] RSP: 002b:00007ffeba94a8a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 1832.140606] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcee9a253df [ 1832.141058] RDX: 00007ffeba94a910 RSI: 00000000c018cf07 RDI: 0000000000000003 [ 1832.141503] RBP: 00007ffeba94a930 R08: 00007fcee9b24db0 R09: 00007fcee9b45c4e [ 1832.141948] R10: 00007fcee9918d40 R11: 0000000000000246 R12: 00007ffeba94aa48 [ 1832.142396] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007fcee9b78000 [ 1832.142851] </TASK> [ 1832.142994] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload [last unloaded: cifs] Cc: [email protected] Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Steve French <[email protected]>

As guest_irq is coming from KVM_IRQFD API call, it may trigger crash in svm_update_pi_irte() due to out-of-bounds: crash> bt PID: 22218 TASK: ffff951a6ad74980 CPU: 73 COMMAND: "vcpu8" #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51 torvalds#6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace [exception RIP: svm_update_pi_irte+227] RIP: ffffffffc0761b53 RSP: ffffb1ba6707fd08 RFLAGS: 00010086 RAX: ffffb1ba6707fd78 RBX: ffffb1ba66d91000 RCX: 0000000000000001 RDX: 00003c803f63f1c0 RSI: 000000000000019a RDI: ffffb1ba66db2ab8 RBP: 000000000000019a R8: 0000000000000040 R9: ffff94ca41b82200 R10: ffffffffffffffcf R11: 0000000000000001 R12: 0000000000000001 R13: 0000000000000001 R14: ffffffffffffffcf R15: 000000000000005f ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm] #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm] #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm] RIP: 00007f143c36488b RSP: 00007f143a4e04b8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f05780041d0 RCX: 00007f143c36488b RDX: 00007f05780041d0 RSI: 000000004008ae6a RDI: 0000000000000020 RBP: 00000000000004e8 R8: 0000000000000008 R9: 00007f05780041e0 R10: 00007f0578004560 R11: 0000000000000246 R12: 00000000000004e0 R13: 000000000000001a R14: 00007f1424001c60 R15: 00007f0578003bc0 ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b Vmx have been fix this in commit 3a8b067 (KVM: VMX: Do not BUG() on out-of-bounds guest IRQ), so we can just copy source from that to fix this. Co-developed-by: Yi Liu <[email protected]> Signed-off-by: Yi Liu <[email protected]> Signed-off-by: Yi Wang <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>

There is possible circular locking dependency detected on event_mutex (see below logs). This is due to set fail safe mode is done at dp_panel_read_sink_caps() within event_mutex scope. To break this possible circular locking, this patch move setting fail safe mode out of event_mutex scope. [ 23.958078] ====================================================== [ 23.964430] WARNING: possible circular locking dependency detected [ 23.970777] 5.17.0-rc2-lockdep-00088-g05241de1f69e #148 Not tainted [ 23.977219] ------------------------------------------------------ [ 23.983570] DrmThread/1574 is trying to acquire lock: [ 23.988763] ffffff808423aab0 (&dp->event_mutex){+.+.}-{3:3}, at: msm_dp_displ ay_enable+0x58/0x164 [ 23.997895] [ 23.997895] but task is already holding lock: [ 24.003895] ffffff808420b280 (&kms->commit_lock[i]/1){+.+.}-{3:3}, at: lock_c rtcs+0x80/0x8c [ 24.012495] [ 24.012495] which lock already depends on the new lock. [ 24.012495] [ 24.020886] [ 24.020886] the existing dependency chain (in reverse order) is: [ 24.028570] [ 24.028570] -> #5 (&kms->commit_lock[i]/1){+.+.}-{3:3}: [ 24.035472] __mutex_lock+0xc8/0x384 [ 24.039695] mutex_lock_nested+0x54/0x74 [ 24.044272] lock_crtcs+0x80/0x8c [ 24.048222] msm_atomic_commit_tail+0x1e8/0x3d0 [ 24.053413] commit_tail+0x7c/0xfc [ 24.057452] drm_atomic_helper_commit+0x158/0x15c [ 24.062826] drm_atomic_commit+0x60/0x74 [ 24.067403] drm_mode_atomic_ioctl+0x6b0/0x908 [ 24.072508] drm_ioctl_kernel+0xe8/0x168 [ 24.077086] drm_ioctl+0x320/0x370 [ 24.081123] drm_compat_ioctl+0x40/0xdc [ 24.085602] __arm64_compat_sys_ioctl+0xe0/0x150 [ 24.090895] invoke_syscall+0x80/0x114 [ 24.095294] el0_svc_common.constprop.3+0xc4/0xf8 [ 24.100668] do_el0_svc_compat+0x2c/0x54 [ 24.105242] el0_svc_compat+0x4c/0xe4 [ 24.109548] el0t_32_sync_handler+0xc4/0xf4 [ 24.114381] el0t_32_sync+0x178 [ 24.118688] [ 24.118688] -> #4 (&kms->commit_lock[i]){+.+.}-{3:3}: [ 24.125408] __mutex_lock+0xc8/0x384 [ 24.129628] mutex_lock_nested+0x54/0x74 [ 24.134204] lock_crtcs+0x80/0x8c [ 24.138155] msm_atomic_commit_tail+0x1e8/0x3d0 [ 24.143345] commit_tail+0x7c/0xfc [ 24.147382] drm_atomic_helper_commit+0x158/0x15c [ 24.152755] drm_atomic_commit+0x60/0x74 [ 24.157323] drm_atomic_helper_set_config+0x68/0x90 [ 24.162869] drm_mode_setcrtc+0x394/0x648 [ 24.167535] drm_ioctl_kernel+0xe8/0x168 [ 24.172102] drm_ioctl+0x320/0x370 [ 24.176135] drm_compat_ioctl+0x40/0xdc [ 24.180621] __arm64_compat_sys_ioctl+0xe0/0x150 [ 24.185904] invoke_syscall+0x80/0x114 [ 24.190302] el0_svc_common.constprop.3+0xc4/0xf8 [ 24.195673] do_el0_svc_compat+0x2c/0x54 [ 24.200241] el0_svc_compat+0x4c/0xe4 [ 24.204544] el0t_32_sync_handler+0xc4/0xf4 [ 24.209378] el0t_32_sync+0x174/0x178 [ 24.213680] -> #3 (crtc_ww_class_mutex){+.+.}-{3:3}: [ 24.220308] __ww_mutex_lock.constprop.20+0xe8/0x878 [ 24.225951] ww_mutex_lock+0x60/0xd0 [ 24.230166] modeset_lock+0x190/0x19c [ 24.234467] drm_modeset_lock+0x34/0x54 [ 24.238953] drmm_mode_config_init+0x550/0x764 [ 24.244065] msm_drm_bind+0x170/0x59c [ 24.248374] try_to_bring_up_master+0x244/0x294 [ 24.253572] __component_add+0xf4/0x14c [ 24.258057] component_add+0x2c/0x38 [ 24.262273] dsi_dev_attach+0x2c/0x38 [ 24.266575] dsi_host_attach+0xc4/0x120 [ 24.271060] mipi_dsi_attach+0x34/0x48 [ 24.275456] devm_mipi_dsi_attach+0x28/0x68 [ 24.280298] ti_sn_bridge_probe+0x2b4/0x2dc [ 24.285137] auxiliary_bus_probe+0x78/0x90 [ 24.289893] really_probe+0x1e4/0x3d8 [ 24.294194] __driver_probe_device+0x14c/0x164 [ 24.299298] driver_probe_device+0x54/0xf8 [ 24.304043] __device_attach_driver+0xb4/0x118 [ 24.309145] bus_for_each_drv+0xb0/0xd4 [ 24.313628] __device_attach+0xcc/0x158 [ 24.318112] device_initial_probe+0x24/0x30 [ 24.322954] bus_probe_device+0x38/0x9c [ 24.327439] deferred_probe_work_func+0xd4/0xf0 [ 24.332628] process_one_work+0x2f0/0x498 [ 24.337289] process_scheduled_works+0x44/0x48 [ 24.342391] worker_thread+0x1e4/0x26c [ 24.346788] kthread+0xe4/0xf4 [ 24.350470] ret_from_fork+0x10/0x20 [ 24.354683] [ 24.354683] [ 24.354683] -> #2 (crtc_ww_class_acquire){+.+.}-{0:0}: [ 24.361489] drm_modeset_acquire_init+0xe4/0x138 [ 24.366777] drm_helper_probe_detect_ctx+0x44/0x114 [ 24.372327] check_connector_changed+0xbc/0x198 [ 24.377517] drm_helper_hpd_irq_event+0xcc/0x11c [ 24.382804] dsi_hpd_worker+0x24/0x30 [ 24.387104] process_one_work+0x2f0/0x498 [ 24.391762] worker_thread+0x1d0/0x26c [ 24.396158] kthread+0xe4/0xf4 [ 24.399840] ret_from_fork+0x10/0x20 [ 24.404053] [ 24.404053] -> #1 (&dev->mode_config.mutex){+.+.}-{3:3}: [ 24.411032] __mutex_lock+0xc8/0x384 [ 24.415247] mutex_lock_nested+0x54/0x74 [ 24.419819] dp_panel_read_sink_caps+0x23c/0x26c [ 24.425108] dp_display_process_hpd_high+0x34/0xd4 [ 24.430570] dp_display_usbpd_configure_cb+0x30/0x3c [ 24.436205] hpd_event_thread+0x2ac/0x550 [ 24.440864] kthread+0xe4/0xf4 [ 24.444544] ret_from_fork+0x10/0x20 [ 24.448757] [ 24.448757] -> #0 (&dp->event_mutex){+.+.}-{3:3}: [ 24.455116] __lock_acquire+0xe2c/0x10d8 [ 24.459690] lock_acquire+0x1ac/0x2d0 [ 24.463988] __mutex_lock+0xc8/0x384 [ 24.468201] mutex_lock_nested+0x54/0x74 [ 24.472773] msm_dp_display_enable+0x58/0x164 [ 24.477789] dp_bridge_enable+0x24/0x30 [ 24.482273] drm_atomic_bridge_chain_enable+0x78/0x9c [ 24.488006] drm_atomic_helper_commit_modeset_enables+0x1bc/0x244 [ 24.494801] msm_atomic_commit_tail+0x248/0x3d0 [ 24.499992] commit_tail+0x7c/0xfc [ 24.504031] drm_atomic_helper_commit+0x158/0x15c [ 24.509404] drm_atomic_commit+0x60/0x74 [ 24.513976] drm_mode_atomic_ioctl+0x6b0/0x908 [ 24.519079] drm_ioctl_kernel+0xe8/0x168 [ 24.523650] drm_ioctl+0x320/0x370 [ 24.527689] drm_compat_ioctl+0x40/0xdc [ 24.532175] __arm64_compat_sys_ioctl+0xe0/0x150 [ 24.537463] invoke_syscall+0x80/0x114 [ 24.541861] el0_svc_common.constprop.3+0xc4/0xf8 [ 24.547235] do_el0_svc_compat+0x2c/0x54 [ 24.551806] el0_svc_compat+0x4c/0xe4 [ 24.556106] el0t_32_sync_handler+0xc4/0xf4 [ 24.560948] el0t_32_sync+0x174/0x178 Changes in v2: -- add circular lockiing trace Fixes: d4aca42 ("drm/msm/dp: always add fail-safe mode into connector mode list") Signed-off-by: Kuogee Hsieh <[email protected]> Reviewed-by: Dmitry Baryshkov <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/481396/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Baryshkov <[email protected]> Signed-off-by: Rob Clark <[email protected]>

The root cause is the race as follows: Thread #1 Thread #2(irq ctx) z_erofs_runqueue() struct z_erofs_decompressqueue io_A[]; submit bio A z_erofs_decompress_kickoff(,,1) z_erofs_decompressqueue_endio(bio A) z_erofs_decompress_kickoff(,,-1) spin_lock_irqsave() atomic_add_return() io_wait_event() -> pending_bios is already 0 [end of function] wake_up_locked(io_A[]) // crash Referenced backtrace in kernel 5.4: [ 10.129422] Unable to handle kernel paging request at virtual address eb0454a4 [ 10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G WC O 5.4.147-ab09225 #1 [ 11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48) [ 11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0) [ 11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c) [ 11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0) [ 11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc) [ 11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c) [ 11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc) [ 11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c) [ 11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138) [ 11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0) [ 11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4) [ 11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0) Signed-off-by: Hongyu Jin <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>

into HEAD KVM/riscv fixes for 5.18, take #2 - Remove 's' & 'u' as valid ISA extension - Do not allow disabling the base extensions 'i'/'m'/'a'/'c'

Make a KVM_REQ_APICV_UPDATE request when creating a vCPU with an in-kernel local APIC and APICv enabled at the module level. Consuming kvm_apicv_activated() and stuffing vcpu->arch.apicv_active directly can race with __kvm_set_or_clear_apicv_inhibit(), as vCPU creation happens before the vCPU is fully onlined, i.e. it won't get the request made to "all" vCPUs. If APICv is globally inhibited between setting apicv_active and onlining the vCPU, the vCPU will end up running with APICv enabled and trigger KVM's sanity check. Mark APICv as active during vCPU creation if APICv is enabled at the module level, both to be optimistic about it's final state, e.g. to avoid additional VMWRITEs on VMX, and because there are likely bugs lurking since KVM checks apicv_active in multiple vCPU creation paths. While keeping the current behavior of consuming kvm_apicv_activated() is arguably safer from a regression perspective, force apicv_active so that vCPU creation runs with deterministic state and so that if there are bugs, they are found sooner than later, i.e. not when some crazy race condition is hit. WARNING: CPU: 0 PID: 484 at arch/x86/kvm/x86.c:9877 vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877 Modules linked in: CPU: 0 PID: 484 Comm: syz-executor361 Not tainted 5.16.13 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1~cloud0 04/01/2014 RIP: 0010:vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877 Call Trace: <TASK> vcpu_run arch/x86/kvm/x86.c:10039 [inline] kvm_arch_vcpu_ioctl_run+0x337/0x15e0 arch/x86/kvm/x86.c:10234 kvm_vcpu_ioctl+0x4d2/0xc80 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3727 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x16d/0x1d0 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The bug was hit by a syzkaller spamming VM creation with 2 vCPUs and a call to KVM_SET_GUEST_DEBUG. r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x0, 0x0) r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0) ioctl$KVM_CAP_SPLIT_IRQCHIP(r1, 0x4068aea3, &(0x7f0000000000)) (async) r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0) (async) r3 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x400000000000002) ioctl$KVM_SET_GUEST_DEBUG(r3, 0x4048ae9b, &(0x7f00000000c0)={0x5dda9c14aa95f5c5}) ioctl$KVM_RUN(r2, 0xae80, 0x0) Reported-by: Gaoning Pan <[email protected]> Reported-by: Yongkang Jia <[email protected]> Fixes: 8df14af ("kvm: x86: Add support for dynamic APICv activation") Cc: [email protected] Cc: Maxim Levitsky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

With enabled CONFIG_DEBUG_PREEMPT the following stack-trace gets printed on SMP ARC platforms with performance counters interrupts: ---------------------------------------8------------------------------------- ARC perf : 8 counters (48 bits), 128 conditions, [overflow IRQ support] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is arc_pmu_device_probe+0x336/0x3c8 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.17.0 #2 Stack Trace: arc_unwind_core+0xe8/0x118 dump_stack_lvl+0x48/0x68 check_preemption_disabled+0xb4/0xb8 arc_pmu_device_probe+0x336/0x3c8 platform_probe+0x30/0x80 really_probe.part.0+0x88/0x240 driver_probe_device+0x86/0x1ec __driver_attach+0x8a/0x144 bus_for_each_dev+0x38/0x64 bus_add_driver+0x116/0x17c driver_register+0x4c/0xdc do_one_initcall+0x30/0x16c kernel_init_freeable+0x1be/0x224 workingset: timestamp_bits=14 max_order=15 bucket_order=1 io scheduler mq-deadline registered --------------------------------------8------------------------------------- That happens because of use of this_cpu_ptr() in https://elixir.bootlin.com/linux/v5.17/source/arch/arc/kernel/perf_event.c#L811 without explicitly disabled preemption, see https://www.kernel.org/doc/html/latest/core-api/this_cpu_ops.html#special-operations. In this commit we explicitly disable preemption around that code.

THe high level structure of most ARC exception handlers is 1. save regfile with EXCEPTION_PROLOGUE 2. setup r0: EFA (not part of pt_regs) 3. setup r1: pointer to pt_regs (SP) 4. drop down to pure kernel mode (from exception) 5. call the Linux "C" handler Remove the boiler plate code by moving #2, #3, #4 into #1. The exceptions to most exceptions are syscall Trap and Machine check which don't do some of above for various reasons, so call a newly introduced variant EXCEPTION_PROLOGUE_KEEP_AE (same as original EXCEPTION_PROLOGUE) Signed-off-by: Vineet Gupta <[email protected]>

Signed-off-by: Vineet Gupta <[email protected]>

… 64-bit [Fix link errors with -mcmodel=large and LINK_BASE=0xFFFF_0000_0000 #2] Current asm code handles ex_tables entries (PC markers) as 32-bits values, while they are actually 64-bits wide. | LD .tmp_vmlinux1 | arch/arc/kernel/unwind.o:(__ex_table+0x0): relocation truncated to fit: R_ARC_32 against `no symbol' | make[1]: *** [/home/vineetg/arc/v2-kernel/Makefile:1078: vmlinux] Error 1 Signed-off-by: Vineet Gupta <[email protected]>

With enabled CONFIG_DEBUG_PREEMPT the following stack-trace gets printed on SMP ARC platforms with performance counters interrupts: ---------------------------------------8------------------------------------- ARC perf : 8 counters (48 bits), 128 conditions, [overflow IRQ support] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is arc_pmu_device_probe+0x336/0x3c8 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.17.0 #2 Stack Trace: arc_unwind_core+0xe8/0x118 dump_stack_lvl+0x48/0x68 check_preemption_disabled+0xb4/0xb8 arc_pmu_device_probe+0x336/0x3c8 platform_probe+0x30/0x80 really_probe.part.0+0x88/0x240 driver_probe_device+0x86/0x1ec __driver_attach+0x8a/0x144 bus_for_each_dev+0x38/0x64 bus_add_driver+0x116/0x17c driver_register+0x4c/0xdc do_one_initcall+0x30/0x16c kernel_init_freeable+0x1be/0x224 workingset: timestamp_bits=14 max_order=15 bucket_order=1 io scheduler mq-deadline registered --------------------------------------8------------------------------------- That happens because of use of this_cpu_ptr() in https://elixir.bootlin.com/linux/v5.17/source/arch/arc/kernel/perf_event.c#L811 without explicitly disabled preemption, see https://www.kernel.org/doc/html/latest/core-api/this_cpu_ops.html#special-operations. In this commit we explicitly disable preemption around that code.

Since major opcodes for ARCv2/ARCv2/ARCv3 are slightly different between ISAs, then we need to recognize instructions sizes differently depending on the architecture.

THe high level structure of most ARC exception handlers is 1. save regfile with EXCEPTION_PROLOGUE 2. setup r0: EFA (not part of pt_regs) 3. setup r1: pointer to pt_regs (SP) 4. drop down to pure kernel mode (from exception) 5. call the Linux "C" handler Remove the boiler plate code by moving #2, #3, #4 into #1. The exceptions to most exceptions are syscall Trap and Machine check which don't do some of above for various reasons, so call a newly introduced variant EXCEPTION_PROLOGUE_KEEP_AE (same as original EXCEPTION_PROLOGUE) Signed-off-by: Vineet Gupta <[email protected]>

Signed-off-by: Vineet Gupta <[email protected]>

… 64-bit [Fix link errors with -mcmodel=large and LINK_BASE=0xFFFF_0000_0000 #2] Current asm code handles ex_tables entries (PC markers) as 32-bits values, while they are actually 64-bits wide. | LD .tmp_vmlinux1 | arch/arc/kernel/unwind.o:(__ex_table+0x0): relocation truncated to fit: R_ARC_32 against `no symbol' | make[1]: *** [/home/vineetg/arc/v2-kernel/Makefile:1078: vmlinux] Error 1 Signed-off-by: Vineet Gupta <[email protected]>

With enabled CONFIG_DEBUG_PREEMPT the following stack-trace gets printed on SMP ARC platforms with performance counters interrupts: ---------------------------------------8------------------------------------- ARC perf : 8 counters (48 bits), 128 conditions, [overflow IRQ support] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is arc_pmu_device_probe+0x336/0x3c8 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.17.0 #2 Stack Trace: arc_unwind_core+0xe8/0x118 dump_stack_lvl+0x48/0x68 check_preemption_disabled+0xb4/0xb8 arc_pmu_device_probe+0x336/0x3c8 platform_probe+0x30/0x80 really_probe.part.0+0x88/0x240 driver_probe_device+0x86/0x1ec __driver_attach+0x8a/0x144 bus_for_each_dev+0x38/0x64 bus_add_driver+0x116/0x17c driver_register+0x4c/0xdc do_one_initcall+0x30/0x16c kernel_init_freeable+0x1be/0x224 workingset: timestamp_bits=14 max_order=15 bucket_order=1 io scheduler mq-deadline registered --------------------------------------8------------------------------------- That happens because of use of this_cpu_ptr() in https://elixir.bootlin.com/linux/v5.17/source/arch/arc/kernel/perf_event.c#L811 without explicitly disabled preemption, see https://www.kernel.org/doc/html/latest/core-api/this_cpu_ops.html#special-operations. In this commit we explicitly disable preemption around that code.

ARC: rwlock: remove unused variable

d75766a

Signed-off-by: Yuriy Kolerov <[email protected]>

kolerov assigned vineetgarc Aug 4, 2015

vineetgarc closed this Aug 4, 2015

vineetgarc deleted the ykolerov-unused-var branch August 4, 2015 11:48

alexhsu-petaio mentioned this pull request Mar 9, 2023

Fail to run eBPF example (minimal) of libbpf-bootstrap on linux 6.1.15, HS38 SMP #103

Closed

xxkent mentioned this pull request May 3, 2023

ARCv3: Fix backtrace message from arc_pmu_device_probe() #123

Merged

xxkent pushed a commit that referenced this pull request Aug 4, 2023

ARCv2: unbork #2: mmap is page offset btot byte offset

92f8d74

Signed-off-by: Vineet Gupta <[email protected]>

xxkent pushed a commit that referenced this pull request Aug 4, 2023

ARC: boot log: eliminate struct cpuinfo_arc #2: cache

565794e

Signed-off-by: Vineet Gupta <[email protected]>

xxkent pushed a commit that referenced this pull request Aug 4, 2023

ARC: int vs long #2 (but check for PAE)

5a99d38

Signed-off-by: Vineet Gupta <[email protected]>

xxkent pushed a commit that referenced this pull request Aug 4, 2023

ARCv3: MMUv4 enable hacks #2: pgd and pte are 64-bits

8add0ff

Signed-off-by: Vineet Gupta <[email protected]>

xxkent added a commit that referenced this pull request Oct 17, 2023

ARC: kprobe/kgdb helpers patch #2

4fb2ecd

Since major opcodes for ARCv2/ARCv2/ARCv3 are slightly different between ISAs, then we need to recognize instructions sizes differently depending on the architecture.

xxkent pushed a commit that referenced this pull request Oct 17, 2023

ARCv2: unbork #2: mmap is page offset btot byte offset

88c84bb

Signed-off-by: Vineet Gupta <[email protected]>

xxkent pushed a commit that referenced this pull request Oct 17, 2023

ARC: int vs long #2 (but check for PAE)

da5d34e

Signed-off-by: Vineet Gupta <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ARC: rwlock: remove unused variable #2

ARC: rwlock: remove unused variable #2

Uh oh!

kolerov commented Aug 4, 2015

Uh oh!

vineetgarc commented Aug 4, 2015

Uh oh!

kolerov commented Aug 4, 2015

Uh oh!

Uh oh!

ARC: rwlock: remove unused variable #2

ARC: rwlock: remove unused variable #2

Uh oh!

Conversation

kolerov commented Aug 4, 2015

Uh oh!

vineetgarc commented Aug 4, 2015

Uh oh!

kolerov commented Aug 4, 2015

Uh oh!

Uh oh!