Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

kolerov
Copy link

@kolerov kolerov commented Aug 4, 2015

No description provided.

@vineetgarc
Copy link

Thx for the fix Yuriy - change pushed .
BTW what configuration are you trying to build - since the affected code would have enabled LLSC but !ARC_STAR_9000923308

@vineetgarc vineetgarc closed this Aug 4, 2015
@vineetgarc vineetgarc deleted the ykolerov-unused-var branch August 4, 2015 11:48
@kolerov
Copy link
Author

kolerov commented Aug 4, 2015

Thanks! I try to build allyesconfig for little endian ARC700.

noamc pushed a commit to Mellanox/linux that referenced this pull request Aug 12, 2015
In dev_queue_xmit() net_cls protected with rcu-bh.

[  270.730026] ===============================
[  270.730029] [ INFO: suspicious RCU usage. ]
[  270.730033] 4.2.0-rc3+ foss-for-synopsys-dwc-arc-processors#2 Not tainted
[  270.730036] -------------------------------
[  270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage!
[  270.730041] other info that might help us debug this:
[  270.730043] rcu_scheduler_active = 1, debug_locks = 1
[  270.730045] 2 locks held by dhclient/748:
[  270.730046]  #0:  (rcu_read_lock_bh){......}, at: [<ffffffff81682b70>] __dev_queue_xmit+0x50/0x960
[  270.730085]  foss-for-synopsys-dwc-arc-processors#1:  (&qdisc_tx_lock){+.....}, at: [<ffffffff81682d60>] __dev_queue_xmit+0x240/0x960
[  270.730090] stack backtrace:
[  270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ foss-for-synopsys-dwc-arc-processors#2
[  270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[  270.730100]  0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007
[  270.730103]  ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00
[  270.730105]  ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8
[  270.730108] Call Trace:
[  270.730121]  [<ffffffff817ad487>] dump_stack+0x4c/0x65
[  270.730148]  [<ffffffff810ca4f2>] lockdep_rcu_suspicious+0xe2/0x120
[  270.730153]  [<ffffffff816a62d2>] task_cls_state+0x92/0xa0
[  270.730158]  [<ffffffffa00b534f>] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
[  270.730164]  [<ffffffff816aac74>] tc_classify_compat+0x74/0xc0
[  270.730166]  [<ffffffff816ab573>] tc_classify+0x33/0x90
[  270.730170]  [<ffffffffa00bcb0a>] htb_enqueue+0xaa/0x4a0 [sch_htb]
[  270.730172]  [<ffffffff81682e26>] __dev_queue_xmit+0x306/0x960
[  270.730174]  [<ffffffff81682b70>] ? __dev_queue_xmit+0x50/0x960
[  270.730176]  [<ffffffff816834a3>] dev_queue_xmit_sk+0x13/0x20
[  270.730185]  [<ffffffff81787770>] dev_queue_xmit+0x10/0x20
[  270.730187]  [<ffffffff8178b91c>] packet_snd.isra.62+0x54c/0x760
[  270.730190]  [<ffffffff8178be25>] packet_sendmsg+0x2f5/0x3f0
[  270.730203]  [<ffffffff81665245>] ? sock_def_readable+0x5/0x190
[  270.730210]  [<ffffffff817b64bb>] ? _raw_spin_unlock+0x2b/0x40
[  270.730216]  [<ffffffff8173bcbc>] ? unix_dgram_sendmsg+0x5cc/0x640
[  270.730219]  [<ffffffff8165f367>] sock_sendmsg+0x47/0x50
[  270.730221]  [<ffffffff8165f42f>] sock_write_iter+0x7f/0xd0
[  270.730232]  [<ffffffff811fd4c7>] __vfs_write+0xa7/0xf0
[  270.730234]  [<ffffffff811fe5b8>] vfs_write+0xb8/0x190
[  270.730236]  [<ffffffff811fe8c2>] SyS_write+0x52/0xb0
[  270.730239]  [<ffffffff817b6bae>] entry_SYSCALL_64_fastpath+0x12/0x76

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Aug 12, 2015
The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback
is called, __inc_wb_stat() updates i_wb's stat.

So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to
any writebacking pages.

This patch should resolve the following kernel panic reported by Andreas Reis.

https://bugzilla.kernel.org/show_bug.cgi?id=101801

--- Comment foss-for-synopsys-dwc-arc-processors#2 from Andreas Reis <[email protected]> ---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
PGD 2951ff067 PUD 2df43f067 PMD 0
Oops: 0000 [foss-for-synopsys-dwc-arc-processors#1] PREEMPT SMP
Modules linked in:
CPU: 7 PID: 10356 Comm: gcc Tainted: G        W       4.2.0-1-cu foss-for-synopsys-dwc-arc-processors#1
Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS
T01 02/03/2015
task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000
RIP: 0010:[<ffffffff8149deea>]  [<ffffffff8149deea>]
__percpu_counter_add+0x1a/0x90
RSP: 0018:ffff880295143ac8  EFLAGS: 00010082
RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001
RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30
R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0
FS:  00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750
 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296
 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40
Call Trace:
 [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49
89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca
65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a
RIP  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
 RSP <ffff880295143ac8>
CR2: 00000000000000a8
---[ end trace 5132449a58ed93a3 ]---
note: gcc[10356] exited with preempt_count 2

Signed-off-by: Jaegeuk Kim <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Aug 12, 2015
Hayes Wang says:

====================
r8152: issues fix

v2:
Replace patch foss-for-synopsys-dwc-arc-processors#2 with "r8152: fix wakeup settings".

v1:
These patches are used to fix issues.
====================

Signed-off-by: David S. Miller <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Aug 12, 2015
Hayes Wang says:

====================
r8152: device reset

v3:
For patch foss-for-synopsys-dwc-arc-processors#2, remove cancel_delayed_work().

v2:
For patch foss-for-synopsys-dwc-arc-processors#1, remove usb_autopm_get_interface(), usb_autopm_put_interface(), and
the checking of intf->condition.

For patch foss-for-synopsys-dwc-arc-processors#2, replace the original method with usb_queue_reset_device() to reset
the device.

v1:
Although the driver works normally, we find the device may get all 0xff data when
transmitting packets on certain platforms. It would break the device and no packet
could be transmitted. The reset is necessary to recover the hw for this situation.
====================

Signed-off-by: David S. Miller <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Aug 12, 2015
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  foss-for-synopsys-dwc-arc-processors#1 schedule at ffffffff815ab76e
  foss-for-synopsys-dwc-arc-processors#2 schedule_timeout at ffffffff815ae5e5
  foss-for-synopsys-dwc-arc-processors#3 io_schedule_timeout at ffffffff815aad6a
  foss-for-synopsys-dwc-arc-processors#4 bit_wait_io at ffffffff815abfc6
  foss-for-synopsys-dwc-arc-processors#5 __wait_on_bit at ffffffff815abda5
  torvalds#6 wait_on_page_bit at ffffffff8111fd4f
  foss-for-synopsys-dwc-arc-processors#7 shrink_page_list at ffffffff81135445
  foss-for-synopsys-dwc-arc-processors#8 shrink_inactive_list at ffffffff81135845
  foss-for-synopsys-dwc-arc-processors#9 shrink_lruvec at ffffffff81135ead
 foss-for-synopsys-dwc-arc-processors#10 shrink_zone at ffffffff811360c3
 foss-for-synopsys-dwc-arc-processors#11 shrink_zones at ffffffff81136eff
 foss-for-synopsys-dwc-arc-processors#12 do_try_to_free_pages at ffffffff8113712f
 foss-for-synopsys-dwc-arc-processors#13 try_to_free_mem_cgroup_pages at ffffffff811372be
 foss-for-synopsys-dwc-arc-processors#14 try_charge at ffffffff81189423
 foss-for-synopsys-dwc-arc-processors#15 mem_cgroup_try_charge at ffffffff8118c6f5
 foss-for-synopsys-dwc-arc-processors#16 __add_to_page_cache_locked at ffffffff8112137d
 foss-for-synopsys-dwc-arc-processors#17 add_to_page_cache_lru at ffffffff81121618
 foss-for-synopsys-dwc-arc-processors#18 pagecache_get_page at ffffffff8112170b
 foss-for-synopsys-dwc-arc-processors#19 grow_dev_page at ffffffff811c8297
 foss-for-synopsys-dwc-arc-processors#20 __getblk_slow at ffffffff811c91d6
 foss-for-synopsys-dwc-arc-processors#21 __getblk_gfp at ffffffff811c92c1
 foss-for-synopsys-dwc-arc-processors#22 ext4_ext_grow_indepth at ffffffff8124565c
 foss-for-synopsys-dwc-arc-processors#23 ext4_ext_create_new_leaf at ffffffff81246ca8
 foss-for-synopsys-dwc-arc-processors#24 ext4_ext_insert_extent at ffffffff81246f09
 torvalds#25 ext4_ext_map_blocks at ffffffff8124a848
 foss-for-synopsys-dwc-arc-processors#26 ext4_map_blocks at ffffffff8121a5b7
 torvalds#27 mpage_map_one_extent at ffffffff8121b1fa
 torvalds#28 mpage_map_and_submit_extent at ffffffff8121f07b
 foss-for-synopsys-dwc-arc-processors#29 ext4_writepages at ffffffff8121f6d5
 foss-for-synopsys-dwc-arc-processors#30 do_writepages at ffffffff8112c490
 foss-for-synopsys-dwc-arc-processors#31 __filemap_fdatawrite_range at ffffffff81120199
 foss-for-synopsys-dwc-arc-processors#32 filemap_flush at ffffffff8112041c
 torvalds#33 ext4_alloc_da_blocks at ffffffff81219da1
 foss-for-synopsys-dwc-arc-processors#34 ext4_rename at ffffffff81229b91
 torvalds#35 ext4_rename2 at ffffffff81229e32
 foss-for-synopsys-dwc-arc-processors#36 vfs_rename at ffffffff811a08a5
 foss-for-synopsys-dwc-arc-processors#37 SYSC_renameat2 at ffffffff811a3ffc
 foss-for-synopsys-dwc-arc-processors#38 sys_renameat2 at ffffffff811a408e
 foss-for-synopsys-dwc-arc-processors#39 sys_rename at ffffffff8119e51e
 foss-for-synopsys-dwc-arc-processors#40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: [email protected] # 3.9+
[[email protected]: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Aug 12, 2015
The shm implementation internally uses shmem or hugetlbfs inodes for shm
segments.  As these inodes are never directly exposed to userspace and
only accessed through the shm operations which are already hooked by
security modules, mark the inodes with the S_PRIVATE flag so that inode
security initialization and permission checking is skipped.

This was motivated by the following lockdep warning:

  ======================================================
   [ INFO: possible circular locking dependency detected ]
   4.2.0-0.rc3.git0.1.fc24.x86_64+debug foss-for-synopsys-dwc-arc-processors#1 Tainted: G        W
  -------------------------------------------------------
   httpd/1597 is trying to acquire lock:
   (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
   but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
   which lock already depends on the new lock.
   the existing dependency chain (in reverse order) is:
   -> foss-for-synopsys-dwc-arc-processors#3 (&mm->mmap_sem){++++++}:
        lock_acquire+0xc7/0x270
        __might_fault+0x7a/0xa0
        filldir+0x9e/0x130
        xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
        xfs_readdir+0x1b4/0x330 [xfs]
        xfs_file_readdir+0x2b/0x30 [xfs]
        iterate_dir+0x97/0x130
        SyS_getdents+0x91/0x120
        entry_SYSCALL_64_fastpath+0x12/0x76
   -> foss-for-synopsys-dwc-arc-processors#2 (&xfs_dir_ilock_class){++++.+}:
        lock_acquire+0xc7/0x270
        down_read_nested+0x57/0xa0
        xfs_ilock+0x167/0x350 [xfs]
        xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
        xfs_attr_get+0xbd/0x190 [xfs]
        xfs_xattr_get+0x3d/0x70 [xfs]
        generic_getxattr+0x4f/0x70
        inode_doinit_with_dentry+0x162/0x670
        sb_finish_set_opts+0xd9/0x230
        selinux_set_mnt_opts+0x35c/0x660
        superblock_doinit+0x77/0xf0
        delayed_superblock_init+0x10/0x20
        iterate_supers+0xb3/0x110
        selinux_complete_init+0x2f/0x40
        security_load_policy+0x103/0x600
        sel_write_load+0xc1/0x750
        __vfs_write+0x37/0x100
        vfs_write+0xa9/0x1a0
        SyS_write+0x58/0xd0
        entry_SYSCALL_64_fastpath+0x12/0x76
  ...

Signed-off-by: Stephen Smalley <[email protected]>
Reported-by: Morten Stevens <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Paul Moore <[email protected]>
Cc: Manfred Spraul <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Eric Paris <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Aug 27, 2015
It turns out that a PV domU also requires the "Xen PV" APIC
driver. Otherwise, the flat driver is used and we get stuck in busy
loops that never exit, such as in this stack trace:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56              while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
 #0  __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
 foss-for-synopsys-dwc-arc-processors#1  __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
 foss-for-synopsys-dwc-arc-processors#2  apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
 foss-for-synopsys-dwc-arc-processors#3  0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
 foss-for-synopsys-dwc-arc-processors#4  0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
 foss-for-synopsys-dwc-arc-processors#5  0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
 torvalds#6  0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
    at kernel/printk/printk.c:1778
 foss-for-synopsys-dwc-arc-processors#7  0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
 foss-for-synopsys-dwc-arc-processors#8  0xffffffffc00013ea in ?? ()
 foss-for-synopsys-dwc-arc-processors#9  0x0000000000000000 in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Aug 27, 2015
This patch is based on the upstream commit 5ac1c4b and amended
for v4.2 to make sure it works as intended.

Repeated calls to begin_crtc_commit can cause warnings like this:
[  169.127746] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:616
[  169.127835] in_atomic(): 0, irqs_disabled(): 1, pid: 1947, name: kms_flip
[  169.127840] 3 locks held by kms_flip/1947:
[  169.127843]  #0:  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffff814774bc>] __drm_modeset_lock_all+0x9c/0x130
[  169.127860]  foss-for-synopsys-dwc-arc-processors#1:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffff814774cd>] __drm_modeset_lock_all+0xad/0x130
[  169.127870]  foss-for-synopsys-dwc-arc-processors#2:  (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffff81477178>] drm_modeset_lock+0x38/0x110
[  169.127879] irq event stamp: 665690
[  169.127882] hardirqs last  enabled at (665689): [<ffffffff817ffdb5>] _raw_spin_unlock_irqrestore+0x55/0x70
[  169.127889] hardirqs last disabled at (665690): [<ffffffffc0197a23>] intel_pipe_update_start+0x113/0x5c0 [i915]
[  169.127936] softirqs last  enabled at (665470): [<ffffffff8108a766>] __do_softirq+0x236/0x650
[  169.127942] softirqs last disabled at (665465): [<ffffffff8108ae75>] irq_exit+0xc5/0xd0
[  169.127951] CPU: 1 PID: 1947 Comm: kms_flip Not tainted 4.1.0-rc4-patser+ #4039
[  169.127954] Hardware name: LENOVO 2349AV8/2349AV8, BIOS G1ETA5WW (2.65 ) 04/15/2014
[  169.127957]  ffff8800c49036f0 ffff8800cde5fa28 ffffffff817f6907 0000000080000001
[  169.127964]  0000000000000000 ffff8800cde5fa58 ffffffff810aebed 0000000000000046
[  169.127970]  ffffffff81c5d518 0000000000000268 0000000000000000 ffff8800cde5fa88
[  169.127981] Call Trace:
[  169.127992]  [<ffffffff817f6907>] dump_stack+0x4f/0x7b
[  169.128001]  [<ffffffff810aebed>] ___might_sleep+0x16d/0x270
[  169.128008]  [<ffffffff810aed38>] __might_sleep+0x48/0x90
[  169.128017]  [<ffffffff817fc359>] mutex_lock_nested+0x29/0x410
[  169.128073]  [<ffffffffc01635f0>] ? vgpu_write64+0x220/0x220 [i915]
[  169.128138]  [<ffffffffc017fddf>] ? ironlake_update_primary_plane+0x2ff/0x410 [i915]
[  169.128198]  [<ffffffffc0190e75>] intel_frontbuffer_flush+0x25/0x70 [i915]
[  169.128253]  [<ffffffffc01831ac>] intel_finish_crtc_commit+0x4c/0x180 [i915]
[  169.128279]  [<ffffffffc00784ac>] drm_atomic_helper_commit_planes+0x12c/0x240 [drm_kms_helper]
[  169.128338]  [<ffffffffc0184264>] __intel_set_mode+0x684/0x830 [i915]
[  169.128378]  [<ffffffffc018a84a>] intel_crtc_set_config+0x49a/0x620 [i915]
[  169.128385]  [<ffffffff817fdd39>] ? mutex_unlock+0x9/0x10
[  169.128391]  [<ffffffff81467b69>] drm_mode_set_config_internal+0x69/0x120
[  169.128398]  [<ffffffff8119b547>] ? might_fault+0x57/0xb0
[  169.128403]  [<ffffffff8146bf93>] drm_mode_setcrtc+0x253/0x620
[  169.128409]  [<ffffffff8145c600>] drm_ioctl+0x1a0/0x6a0
[  169.128415]  [<ffffffff810b3b41>] ? get_parent_ip+0x11/0x50
[  169.128424]  [<ffffffff811e9ab8>] do_vfs_ioctl+0x2f8/0x530
[  169.128429]  [<ffffffff810d0fcd>] ? trace_hardirqs_on+0xd/0x10
[  169.128435]  [<ffffffff812e7676>] ? selinux_file_ioctl+0x56/0x100
[  169.128439]  [<ffffffff811e9d71>] SyS_ioctl+0x81/0xa0
[  169.128445]  [<ffffffff81800697>] system_call_fastpath+0x12/0x6f

Solve it by using the newly introduced drm_atomic_helper_commit_planes_on_crtc.

The problem here was that the drm_atomic_helper_commit_planes() helper
we were using was basically designed to do

    begin_crtc_commit(crtc foss-for-synopsys-dwc-arc-processors#1)
    begin_crtc_commit(crtc foss-for-synopsys-dwc-arc-processors#2)
    ...
    commit all planes
    finish_crtc_commit(crtc foss-for-synopsys-dwc-arc-processors#1)
    finish_crtc_commit(crtc foss-for-synopsys-dwc-arc-processors#2)

The problem here is that since our hardware relies on vblank evasion,
our CRTC 'begin' function waits until we're out of the danger zone in
which register writes might wind up straddling the vblank, then disables
interrupts; our 'finish' function re-enables interrupts after the
registers have been written.  The expectation is that the operations between
'begin' and 'end' must be performed without sleeping (since interrupts
are disabled) and should happen as quickly as possible.  By clumping all
of the 'begin' calls together, we introducing a couple problems:
 * Subsequent 'begin' invocations might sleep (which is illegal)
 * The first 'begin' ensured that we were far enough from the vblank that
   we could write our registers safely and ensure they all fell within
   the same frame.  Adding extra delay waiting for subsequent CRTC's
   wasn't accounted for and could put us back into the 'danger zone' for
   CRTC foss-for-synopsys-dwc-arc-processors#1.

This commit solves the problem by using a new helper that allows an
order of operations like:

   for each crtc {
        begin_crtc_commit(crtc)  // sleep (maybe), then disable interrupts
        commit planes for this specific CRTC
        end_crtc_commit(crtc)    // reenable interrupts
   }

so that sleeps will only be performed while interrupts are enabled and
we can be sure that registers for a CRTC will be written immediately
once we know we're in the safe zone.

The crtc->config->base.crtc update may seem unrelated, but the helper
will use it to obtain the crtc for the state. Without the update it
will dereference NULL and crash.

Changes since v1:
- Use Matt Roper's commit message.

Signed-off-by: Maarten Lankhorst <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
References: https://bugs.freedesktop.org/show_bug.cgi?id=90398
Reviewed-by: Ander Conselvan de Oliveira <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Aug 27, 2015
A recent change to the cpu_cooling code introduced a AB-BA deadlock
scenario between the cpufreq_policy_notifier_list rwsem and the
cooling_cpufreq_lock.  This is caused by cooling_cpufreq_lock being held
before the registration/removal of the notifier block (an operation
which takes the rwsem), and the notifier code itself which takes the
locks in the reverse order:

======================================================
[ INFO: possible circular locking dependency detected ]
3.18.0+ #1453 Not tainted
-------------------------------------------------------
rc.local/770 is trying to acquire lock:
 (cooling_cpufreq_lock){+.+.+.}, at: [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc

but task is already holding lock:
 ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>]  __blocking_notifier_call_chain+0x34/0x68

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> foss-for-synopsys-dwc-arc-processors#1 ((cpufreq_policy_notifier_list).rwsem){++++.+}:
       [<c06bc3b0>] down_write+0x44/0x9c
       [<c0043444>] blocking_notifier_chain_register+0x28/0xd8
       [<c04ad610>] cpufreq_register_notifier+0x68/0x90
       [<c04abe4c>] __cpufreq_cooling_register.part.1+0x120/0x180
       [<c04abf44>] __cpufreq_cooling_register+0x98/0xa4
       [<c04abf8c>] cpufreq_cooling_register+0x18/0x1c
       [<bf0046f8>] imx_thermal_probe+0x1c0/0x470 [imx_thermal]
       [<c037cef8>] platform_drv_probe+0x50/0xac
       [<c037b710>] driver_probe_device+0x114/0x234
       [<c037b8cc>] __driver_attach+0x9c/0xa0
       [<c0379d68>] bus_for_each_dev+0x5c/0x90
       [<c037b204>] driver_attach+0x24/0x28
       [<c037ae7c>] bus_add_driver+0xe0/0x1d8
       [<c037c0cc>] driver_register+0x80/0xfc
       [<c037cd80>] __platform_driver_register+0x50/0x64
       [<bf007018>] 0xbf007018
       [<c0008a5c>] do_one_initcall+0x88/0x1d8
       [<c0095da4>] load_module+0x1768/0x1ef8
       [<c0096614>] SyS_init_module+0xe0/0xf4
       [<c000ec00>] ret_fast_syscall+0x0/0x48

-> #0 (cooling_cpufreq_lock){+.+.+.}:
       [<c00619f8>] lock_acquire+0xb0/0x124
       [<c06ba3b4>] mutex_lock_nested+0x5c/0x3d8
       [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc
       [<c0042bf4>] notifier_call_chain+0x4c/0x8c
       [<c0042f20>] __blocking_notifier_call_chain+0x50/0x68
       [<c0042f58>] blocking_notifier_call_chain+0x20/0x28
       [<c04ae62c>] cpufreq_set_policy+0x7c/0x1d0
       [<c04af3cc>] store_scaling_governor+0x74/0x9c
       [<c04ad418>] store+0x90/0xc0
       [<c0175384>] sysfs_kf_write+0x54/0x58
       [<c01746b4>] kernfs_fop_write+0xdc/0x190
       [<c010dcc0>] vfs_write+0xac/0x1b4
       [<c010dfec>] SyS_write+0x44/0x90
       [<c000ec00>] ret_fast_syscall+0x0/0x48

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((cpufreq_policy_notifier_list).rwsem);
                               lock(cooling_cpufreq_lock);
                               lock((cpufreq_policy_notifier_list).rwsem);
  lock(cooling_cpufreq_lock);

 *** DEADLOCK ***

7 locks held by rc.local/770:
 #0:  (sb_writers#6){.+.+.+}, at: [<c010dda0>] vfs_write+0x18c/0x1b4
 foss-for-synopsys-dwc-arc-processors#1:  (&of->mutex){+.+.+.}, at: [<c0174678>] kernfs_fop_write+0xa0/0x190
 foss-for-synopsys-dwc-arc-processors#2:  (s_active#52){.+.+.+}, at: [<c0174680>] kernfs_fop_write+0xa8/0x190
 foss-for-synopsys-dwc-arc-processors#3:  (cpu_hotplug.lock){++++++}, at: [<c0026a60>] get_online_cpus+0x34/0x90
 foss-for-synopsys-dwc-arc-processors#4:  (cpufreq_rwsem){.+.+.+}, at: [<c04ad3e0>] store+0x58/0xc0
 foss-for-synopsys-dwc-arc-processors#5:  (&policy->rwsem){+.+.+.}, at: [<c04ad3f8>] store+0x70/0xc0
 torvalds#6:  ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68

stack backtrace:
CPU: 0 PID: 770 Comm: rc.local Not tainted 3.18.0+ #1453
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c00121c8>] (dump_backtrace) from [<c0012360>] (show_stack+0x18/0x1c)
 r6:c0b85a80 r5:c0b75630 r4:00000000 r3:00000000
[<c0012348>] (show_stack) from [<c06b6c48>] (dump_stack+0x7c/0x98)
[<c06b6bcc>] (dump_stack) from [<c06b42a4>] (print_circular_bug+0x28c/0x2d8)
 r4:c0b85a80 r3:d0071d40
[<c06b4018>] (print_circular_bug) from [<c00613b0>] (__lock_acquire+0x1acc/0x1bb0)
 r10:c0b50660 r8:c09e6d80 r7:d0071d40 r6:c11d0f0c r5:00000007 r4:d0072240
[<c005f8e4>] (__lock_acquire) from [<c00619f8>] (lock_acquire+0xb0/0x124)
 r10:00000000 r9:c04abfc4 r8:00000000 r7:00000000 r6:00000000 r5:c0a06f0c
 r4:00000000
[<c0061948>] (lock_acquire) from [<c06ba3b4>] (mutex_lock_nested+0x5c/0x3d8)
 r10:ec853800 r9:c0a06ed4 r8:d0071d40 r7:c0a06ed4 r6:c11d0f0c r5:00000000
 r4:c04abfc4
[<c06ba358>] (mutex_lock_nested) from [<c04abfc4>] (cpufreq_thermal_notifier+0x34/0xfc)
 r10:ec853800 r9:ec85380c r8:d00d7d3c r7:c0a06ed4 r6:d00d7d3c r5:00000000
 r4:fffffffe
[<c04abf90>] (cpufreq_thermal_notifier) from [<c0042bf4>] (notifier_call_chain+0x4c/0x8c)
 r7:00000000 r6:00000000 r5:00000000 r4:fffffffe
[<c0042ba8>] (notifier_call_chain) from [<c0042f20>] (__blocking_notifier_call_chain+0x50/0x68)
 r8:c0a072a4 r7:00000000 r6:d00d7d3c r5:ffffffff r4:c0a06fc8 r3:ffffffff
[<c0042ed0>] (__blocking_notifier_call_chain) from [<c0042f58>] (blocking_notifier_call_chain+0x20/0x28)
 r7:ec98b540 r6:c13ebc80 r5:ed76e600 r4:d00d7d3c
[<c0042f38>] (blocking_notifier_call_chain) from [<c04ae62c>] (cpufreq_set_policy+0x7c/0x1d0)
[<c04ae5b0>] (cpufreq_set_policy) from [<c04af3cc>] (store_scaling_governor+0x74/0x9c)
 r7:ec98b540 r6:0000000c r5:ec98b540 r4:ed76e600
[<c04af358>] (store_scaling_governor) from [<c04ad418>] (store+0x90/0xc0)
 r6:0000000c r5:ed76e6d4 r4:ed76e600
[<c04ad388>] (store) from [<c0175384>] (sysfs_kf_write+0x54/0x58)
 r8:0000000c r7:d00d7f78 r6:ec98b540 r5:0000000c r4:ec853800 r3:0000000c
[<c0175330>] (sysfs_kf_write) from [<c01746b4>] (kernfs_fop_write+0xdc/0x190)
 r6:ec98b540 r5:00000000 r4:00000000 r3:c0175330
[<c01745d8>] (kernfs_fop_write) from [<c010dcc0>] (vfs_write+0xac/0x1b4)
 r10:0162aa70 r9:d00d6000 r8:0000000c r7:d00d7f78 r6:0162aa70 r5:0000000c
 r4:eccde500
[<c010dc14>] (vfs_write) from [<c010dfec>] (SyS_write+0x44/0x90)
 r10:0162aa70 r8:0000000c r7:eccde500 r6:eccde500 r5:00000000 r4:00000000
[<c010dfa8>] (SyS_write) from [<c000ec00>] (ret_fast_syscall+0x0/0x48)
 r10:00000000 r8:c000edc4 r7:00000004 r6:000216cc r5:0000000c r4:0162aa70

Solve this by moving to finer grained locking - use one mutex to protect
the cpufreq_dev_list as a whole, and a separate lock to ensure correct
ordering of cpufreq notifier registration and removal.

cooling_list_lock is taken within cooling_cpufreq_lock on
(un)registration to preserve the behavior of the code, i.e. to
atomically add/remove to the list and (un)register the notifier.

Fixes: 2dcd851 ("thermal: cpu_cooling: Update always cpufreq policy with
Reviewed-by: Viresh Kumar <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Eduardo Valentin <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Oct 18, 2015
My colleague ran into a program stall on a x86_64 server, where
n_tty_read() was waiting for data even if there was data in the buffer
in the pty.  kernel stack for the stuck process looks like below.
 #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
 foss-for-synopsys-dwc-arc-processors#1 [ffff88303d107bd0] schedule at ffffffff815c513e
 foss-for-synopsys-dwc-arc-processors#2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
 foss-for-synopsys-dwc-arc-processors#3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
 foss-for-synopsys-dwc-arc-processors#4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
 foss-for-synopsys-dwc-arc-processors#5 [ffff88303d107dd0] tty_read at ffffffff81368013
 torvalds#6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
 foss-for-synopsys-dwc-arc-processors#7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
 foss-for-synopsys-dwc-arc-processors#8 [ffff88303d107f00] sys_read at ffffffff811a4306
 foss-for-synopsys-dwc-arc-processors#9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7

There seems to be two problems causing this issue.

First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active().  However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
   RELEASE may be completed before the
   RELEASE operation has completed */
                                        add_wait_queue(&tty->read_wait, &wait);
                                        ...
                                        if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
                                        spin_lock_irqsave(&q->lock, flags);
                                        /* from add_wait_queue() */
                                        ...
                                        if (!input_available_p(tty, 0)) {
                                        /* Memory operations issued after the
                                           RELEASE may be completed before the
                                           RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
if (waitqueue_active(&tty->read_wait))
                                        __add_wait_queue(q, wait);
                                        spin_unlock_irqrestore(&q->lock,flags);
                                        /* from add_wait_queue() */
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.

This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation).  Moreover, the resulting code is much simpler.

Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.

Signed-off-by: Kosuke Tatsukawa <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Oct 18, 2015
When running kprobe test on arm64 rt kernel, it reports the below warning:

root@qemu7:~# modprobe kprobe_example
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 0, irqs_disabled(): 128, pid: 484, name: modprobe
CPU: 0 PID: 484 Comm: modprobe Not tainted 4.1.6-rt5 foss-for-synopsys-dwc-arc-processors#2
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffffffc0000891b8>] dump_backtrace+0x0/0x128
[<ffffffc000089300>] show_stack+0x20/0x30
[<ffffffc00061dae8>] dump_stack+0x1c/0x28
[<ffffffc0000bbad0>] ___might_sleep+0x120/0x198
[<ffffffc0006223e8>] rt_spin_lock+0x28/0x40
[<ffffffc000622b30>] __aarch64_insn_write+0x28/0x78
[<ffffffc000622e48>] aarch64_insn_patch_text_nosync+0x18/0x48
[<ffffffc000622ee8>] aarch64_insn_patch_text_cb+0x70/0xa0
[<ffffffc000622f40>] aarch64_insn_patch_text_sync+0x28/0x48
[<ffffffc0006236e0>] arch_arm_kprobe+0x38/0x48
[<ffffffc00010e6f4>] arm_kprobe+0x34/0x50
[<ffffffc000110374>] register_kprobe+0x4cc/0x5b8
[<ffffffbffc002038>] kprobe_init+0x38/0x7c [kprobe_example]
[<ffffffc000084240>] do_one_initcall+0x90/0x1b0
[<ffffffc00061c498>] do_init_module+0x6c/0x1cc
[<ffffffc0000fd0c0>] load_module+0x17f8/0x1db0
[<ffffffc0000fd8cc>] SyS_finit_module+0xb4/0xc8

Convert patch_lock to raw loc kto avoid this issue.

Although the problem is found on rt kernel, the fix should be applicable to
mainline kernel too.

Acked-by: Steven Rostedt <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Oct 18, 2015
When freqdomain_cpus attribute is read from an offlined cpu, it will
cause crash. This change prevents calling cpufreq_show_cpus when
policy driver_data is NULL.

Crash info:

[  170.814949] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[  170.814990] IP: [<ffffffff813b2490>] _find_next_bit.part.0+0x10/0x70
[  170.815021] PGD 227d30067 PUD 229e56067 PMD 0
[  170.815043] Oops: 0000 [foss-for-synopsys-dwc-arc-processors#2] SMP
[  170.816022] CPU: 3 PID: 3121 Comm: cat Tainted: G      D    OE   4.3.0-rc3+ torvalds#33
...
...
[  170.816657] Call Trace:
[  170.816672]  [<ffffffff813b2505>] ? find_next_bit+0x15/0x20
[  170.816696]  [<ffffffff8160e47c>] cpufreq_show_cpus+0x5c/0xd0
[  170.816722]  [<ffffffffa031a409>] show_freqdomain_cpus+0x19/0x20 [acpi_cpufreq]
[  170.816749]  [<ffffffff8160e65b>] show+0x3b/0x60
[  170.816769]  [<ffffffff8129b31c>] sysfs_kf_seq_show+0xbc/0x130
[  170.816793]  [<ffffffff81299be3>] kernfs_seq_show+0x23/0x30
[  170.816816]  [<ffffffff81240f2c>] seq_read+0xec/0x390
[  170.816837]  [<ffffffff8129a64a>] kernfs_fop_read+0x10a/0x160
[  170.816861]  [<ffffffff8121d9b7>] __vfs_read+0x37/0x100
[  170.816883]  [<ffffffff813217c0>] ? security_file_permission+0xa0/0xc0
[  170.816909]  [<ffffffff8121e2e3>] vfs_read+0x83/0x130
[  170.816930]  [<ffffffff8121f035>] SyS_read+0x55/0xc0
...
...
[  170.817185] ---[ end trace bc6eadf82b2b965a ]---

Signed-off-by: Srinivas Pandruvada <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Cc: 4.2+ <[email protected]> # 4.2+
Signed-off-by: Rafael J. Wysocki <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Oct 31, 2015
…cessors#2

Explicit'ify that all memory added so far is low memory
Nothing semantical

Signed-off-by: Vineet Gupta <[email protected]>
anthony-kolesov pushed a commit that referenced this pull request Dec 14, 2015
Saeed Mahameed says:

====================
mlx5 improved flow steering management

First two patches fixes some minor issues in recently
introduced SRIOV code.

The other seven patches modifies the driver's code that
manages flow steering rules with Connectx-4 devices.

Basic introduction:

The flow steering device specification model is composed of the following entities:

Destination (either a TIR/Flow table/vport), where TIR is RSS end-point, vport
is the VF eSwitch port in SRIOV.

Flow table entry (FTE) - the values used by the flow specification
Flow table group (FG) - the masks used by the flow specification
Flow table (FT) - groups several FGs and can serve as destination

The flow steering software entities:

In addition to the device objects, the software have two more objects:

Priorities - group several FTs. Handles order of packet matching.

Namespaces - group several priorities. Namespace are used in order to
isolate different usages of steering (for example, add two separate
namespaces, one for the NIC driver and one for E-Switch FDB).

The base data structure for the flow steering management is a tree and
all the flow steering objects such as (Namespace/Flow table/Flow Group/FTE/etc.)
are represented as a node in the tree, e.g.:
Priority-0 -> FT1 -> FG -> FTE -> TIR (destination)
Priority-1 -> FT2 -> FG->  FTE -> TIR (destination)

Matching begins in FT1 flow rules and if there is a miss on all the FTEs
then matching continues on the FTEs in FT2.

The new implementation solves/improves the following
issues in the current code:

1) The new impl. supports multiple destinations, the search for existing rule with
   the same matching value is performed by the flow steering management.
   In the current impl. the E-switch FDB management code needs to search
   for existing rules before calling to the add rule function.

2) The new impl. manages the flow table level, in the current implementation the
   consumer states the flow table level when new flow table is created without
   any knowledge about the levels of other flow tables.

3) In the current impl. the consumer can't create or destroy flow
   groups dynamically, the flow groups are passed as argument to the create
   flow table API. The new impl. exposes API for create/destroy flow group.

The series is built as follows:

Patch #1 add flow steering API firmware commands.

Patch #2 add tree operation of the flow steering tree: add/remove node,
initialize node and take reference count on a node.

Patch #3 add essential algorithms for managing the flow steering.

Patch #4 Initialize the flow steering tree, flow steering initialization is based
on static tree which illustrates the flow steering tree when the driver is loaded.

Patch #5 is the main patch of the series. It introduce the flow steering API.

Patch torvalds#6 Expose the new flow steering API and remove the old one.
The Ethernet flow steering follows the existing implementation,
but uses the new steering API.

Patch #7 Rename en_flow_table.c to en_fs.c in order to be aligned with
the new flow steering files.
====================

Signed-off-by: David S. Miller <[email protected]>
vineetgarc added a commit that referenced this pull request Dec 14, 2015
This was the second perf intr issue

perf sampling on multicore requires intr to be enabled on all cores.
ARC perf probe code used helper arc_request_percpu_irq() which calls
 - request_percpu_irq() on core0
 - enable_percpu_irq() on all all cores (including core0)

genirq requires that request be made ahead of enable call.
However if perf probe happened on non core0 (observed on a 3.18 kernel),
enable would get called ahead of request, failing obviously and
rendering perf intr disabled on all such cores

[   11.120000] 1 ARC perf       : 8 counters (48 bits), 113 conditions, [overflow IRQ support]
[   11.130000] 1 -----> enable_percpu_irq() IRQ 20 failed
[   11.140000] 3 -----> enable_percpu_irq() IRQ 20 failed
[   11.140000] 2 -----> enable_percpu_irq() IRQ 20 failed
[   11.140000] 0 =====> request_percpu_irq() IRQ 20
[   11.140000] 0 -----> enable_percpu_irq() IRQ 20

Fix this fragility, by calling request_percpu_irq() on whatever core
calls probe (there is no requirement on which core calls this anyways)
and then calling enable on each cores.

Interestingly this started as invesigation of STAR 9000838902:
"sporadically IRQs enabled on perf prob"

which was about occassional boot spew as request_percpu_irq got called
non-locally (from an IPI), and re-enabled interrupts in following path
proc_mkdir ->  spin_unlock_irq()

which the irq work code didn't like.

| ARC perf     : 8 counters (48 bits), 113 conditions, [overflow IRQ support]
|
| BUG: failure at ../kernel/irq_work.c:135/irq_work_run_list()!
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.10-01127-g285efb8e66d1 #2
|
| Stack Trace:
|  arc_unwind_core.constprop.1+0x94/0x104
|  dump_stack+0x62/0x98
|  irq_work_run_list+0xb0/0xb4
|  irq_work_run+0x22/0x3c
|  do_IPI+0x74/0x9c
|  handle_irq_event_percpu+0x34/0x164
|  handle_percpu_irq+0x58/0x78
|  generic_handle_irq+0x1e/0x2c
|  arch_do_IRQ+0x3c/0x60
|  ret_from_exception+0x0/0x8

Cc: Marc Zyngier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Alexey Brodkin <[email protected]>
Cc: <[email protected]> #4.2+
Signed-off-by: Vineet Gupta <[email protected]>
anthony-kolesov pushed a commit that referenced this pull request Dec 14, 2015
When using the Promise TX2+ SATA controller on PA-RISC, the system often
crashes with kernel panic, for example just writing data with the dd
utility will make it crash.

Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources

CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
Backtrace:
 [<000000004021497c>] show_stack+0x14/0x20
 [<0000000040410bf0>] dump_stack+0x88/0x100
 [<000000004023978c>] panic+0x124/0x360
 [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
 [<0000000040453150>] sba_map_sg+0x260/0x5b8
 [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
 [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
 [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
 [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
 [<000000004049da34>] scsi_request_fn+0x6e4/0x970
 [<00000000403e95a8>] __blk_run_queue+0x40/0x60
 [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
 [<000000004049a534>] scsi_run_queue+0x2a4/0x360
 [<000000004049be68>] scsi_end_request+0x1a8/0x238
 [<000000004049de84>] scsi_io_completion+0xfc/0x688
 [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0

The cause of the crash is not exhaustion of the IOMMU space, there is
plenty of free pages. The function sba_alloc_range is called with size
0x11000, thus the pages_needed variable is 0x11. The function
sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
0x10 (because dma_get_seg_boundary(dev) returns 0xffff).

The function sba_search_bitmap attempts to allocate 17 pages that must not
cross 16-page boundary - it can't satisfy this requirement
(iommu_is_span_boundary always returns true) and fails even if there are
many free entries in the IOMMU space.

How did it happen that we try to allocate 17 pages that don't cross
16-page boundary? The cause is in the function iommu_coalesce_chunks. This
function tries to coalesce adjacent entries in the scatterlist. The
function does several checks if it may coalesce one entry with the next,
one of those checks is this:

	if (startsg->length + dma_len > max_seg_size)
		break;

When it finishes coalescing adjacent entries, it allocates the mapping:

sg_dma_len(contig_sg) = dma_len;
dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
sg_dma_address(contig_sg) =
	PIDE_FLAG
	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
	| dma_offset;

It is possible that (startsg->length + dma_len > max_seg_size) is false
(we are just near the 0x10000 max_seg_size boundary), so the funcion
decides to coalesce this entry with the next entry. When the coalescing
succeeds, the function performs
	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
to allocate 17 pages for a device that must not cross 16-page boundary.

To fix the bug, we must make sure that dma_len after addition of
dma_offset and alignment doesn't cross the segment boundary. I.e. change
	if (startsg->length + dma_len > max_seg_size)
		break;
to
	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
		break;

This patch makes this change (it precalculates max_seg_boundary at the
beginning of the function iommu_coalesce_chunks). I also added a check
that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
not needed for Promise TX2+ SATA, but it may be needed for other devices
that have dma_get_seg_boundary lower than dma_get_max_seg_size).

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
Signed-off-by: Helge Deller <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Dec 28, 2015
Drivers like vxlan use the recently introduced
udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
packet, updates the struct stats using the usual
u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
tstats, so drivers like vxlan, immediately after, call
iptunnel_xmit_stats, which does the same thing - calls
u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).

While vxlan is probably fine (I don't know?), calling a similar function
from, say, an unbound workqueue, on a fully preemptable kernel causes
real issues:

[  188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6
[  188.435579] caller is debug_smp_processor_id+0x17/0x20
[  188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 foss-for-synopsys-dwc-arc-processors#2
[  188.435607] Call Trace:
[  188.435611]  [<ffffffff8234e936>] dump_stack+0x4f/0x7b
[  188.435615]  [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0
[  188.435619]  [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20

The solution would be to protect the whole
this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
disabling preemption and then reenabling it.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Dec 28, 2015
We try to convert the old way of of specifying fb tiling (obj->tiling)
into the new fb modifiers. We store the result in the passed in mode_cmd
structure. But that structure comes directly from the addfb2 ioctl, and
gets copied back out to userspace, which means we're clobbering the
modifiers that the user provided (all 0 since the DRM_MODE_FB_MODIFIERS
flag wasn't even set by the user). Hence if the user reuses the struct
for another addfb2, the ioctl will be rejected since it's now asking for
some modifiers w/o the flag set.

Fix the problem by making a copy of the user provided structure. We can
play any games we want with the copy.

IGT-Version: 1.12-git (x86_64) (Linux: 4.4.0-rc1-stereo+ x86_64)
...
Subtest basic-X-tiled: SUCCESS (0.001s)
Test assertion failure function pitch_tests, file kms_addfb_basic.c:167:
Failed assertion: drmIoctl(fd, DRM_IOCTL_MODE_ADDFB2, &f) == 0
Last errno: 22, Invalid argument
Stack trace:
  #0 [__igt_fail_assert+0x101]
  foss-for-synopsys-dwc-arc-processors#1 [pitch_tests+0x619]
  foss-for-synopsys-dwc-arc-processors#2 [__real_main426+0x2f]
  foss-for-synopsys-dwc-arc-processors#3 [main+0x23]
  foss-for-synopsys-dwc-arc-processors#4 [__libc_start_main+0xf0]
  foss-for-synopsys-dwc-arc-processors#5 [_start+0x29]
  torvalds#6 [<unknown>+0x29]
  Subtest framebuffer-vs-set-tiling failed.
  **** DEBUG ****
  Test assertion failure function pitch_tests, file kms_addfb_basic.c:167:
  Failed assertion: drmIoctl(fd, DRM_IOCTL_MODE_ADDFB2, &f) == 0
  Last errno: 22, Invalid argument
  ****  END  ****
  Subtest framebuffer-vs-set-tiling: FAIL (0.003s)
  ...

IGT-Version: 1.12-git (x86_64) (Linux: 4.4.0-rc1-stereo+ x86_64)
Subtest framebuffer-vs-set-tiling: SUCCESS (0.000s)

Cc: [email protected] # v4.1+
Cc: Daniel Vetter <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Fixes: 2a80ead ("drm/i915: Add fb format modifier support")
Testcase: igt/kms_addfb_basic/clobbered-modifier
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
noamc pushed a commit to Mellanox/linux that referenced this pull request Dec 28, 2015
The current code tries to allocate memory with GFP_KERNEL at
interrupt context, it would show below warning during the enumeration
when I test it with chipidea hardware, change GFP flag as GFP_ATOMIC
can fix this issue.

[   40.438237] zero gadget: high-speed config foss-for-synopsys-dwc-arc-processors#2: loopback
[   40.444924] ------------[ cut here ]------------
[   40.449609] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0x108/0x128()
[   40.461715] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[   40.467130] Modules linked in:
[   40.470216]  usb_f_ss_lb g_zero libcomposite evbug
[   40.473822] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-rc5-00168-gb730aaf torvalds#604
[   40.481496] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[   40.487345] Backtrace:
[   40.489857] [<80014e94>] (dump_backtrace) from [<80015088>] (show_stack+0x18/0x1c)
[   40.497445]  r6:80b67a80 r5:00000000 r4:00000000 r3:00000000
[   40.503234] [<80015070>] (show_stack) from [<802e27b4>] (dump_stack+0x8c/0xa4)
[   40.510503] [<802e2728>] (dump_stack) from [<8002cfe8>] (warn_slowpath_common+0x80/0xbc)
[   40.518612]  r6:8007510c r5:00000009 r4:80b49c88 r3:00000001
[   40.524396] [<8002cf68>] (warn_slowpath_common) from [<8002d05c>] (warn_slowpath_fmt+0x38/0x40)
[   40.533109]  r8:bcfdef80 r7:bdb705cc r6:000080d0 r5:be001e80 r4:809cc278
[   40.539965] [<8002d028>] (warn_slowpath_fmt) from [<8007510c>] (lockdep_trace_alloc+0x108/0x128)
[   40.548766]  r3:809d0128 r2:809cc278
[   40.552401]  r4:600b0193
[   40.554990] [<80075004>] (lockdep_trace_alloc) from [<801093d4>] (kmem_cache_alloc+0x28/0x15c)
[   40.563618]  r4:000080d0 r3:80b4aa8c
[   40.567270] [<801093ac>] (kmem_cache_alloc) from [<804d95e4>] (ep_alloc_request+0x58/0x68)
[   40.575550]  r10:7f01f104 r9:00000001 r8:bcfdef80 r7:bdb705cc r6:bc178700 r5:00000000
[   40.583512]  r4:bcfdef80 r3:813c0a38
[   40.587183] [<804d958c>] (ep_alloc_request) from [<7f01f7ec>] (loopback_set_alt+0x114/0x21c [usb_f_ss_lb])
[   40.596929] [<7f01f6d8>] (loopback_set_alt [usb_f_ss_lb]) from [<7f006910>] (composite_setup+0xbd0/0x17e8 [libcomposite])
[   40.607902]  r10:bd3a2c0c r9:00000000 r8:bcfdef80 r7:bc178700 r6:bdb702d0 r5:bcfdefdc
[   40.615866]  r4:7f0199b4 r3:00000002
[   40.619542] [<7f005d40>] (composite_setup [libcomposite]) from [<804dae88>] (udc_irq+0x784/0xd1c)
[   40.628431]  r10:80bb5619 r9:c0876140 r8:00012001 r7:bdb71010 r6:bdb70568 r5:00010001
[   40.636392]  r4:bdb70014
[   40.638985] [<804da704>] (udc_irq) from [<804d64f8>] (ci_irq+0x5c/0x118)
[   40.645702]  r10:80bb5619 r9:be11e000 r8:00000117 r7:00000000 r6:bdb71010 r5:be11e060
[   40.653666]  r4:bdb70010
[   40.656261] [<804d649c>] (ci_irq) from [<8007f638>] (handle_irq_event_percpu+0x7c/0x13c)
[   40.664367]  r6:00000000 r5:be11e060 r4:bdb05cc0 r3:804d649c
[   40.670149] [<8007f5bc>] (handle_irq_event_percpu) from [<8007f740>] (handle_irq_event+0x48/0x6c)
[   40.679036]  r10:00000000 r9:be008000 r8:00000001 r7:00000000 r6:bdb05cc0 r5:be11e060
[   40.686998]  r4:be11e000
[   40.689581] [<8007f6f8>] (handle_irq_event) from [<80082850>] (handle_fasteoi_irq+0xd4/0x1b0)
[   40.698120]  r6:80b56a30 r5:be11e060 r4:be11e000 r3:00000000
[   40.703898] [<8008277c>] (handle_fasteoi_irq) from [<8007ec04>] (generic_handle_irq+0x28/0x3c)
[   40.712524]  r7:00000000 r6:80b4aaf4 r5:00000117 r4:80b445fc
[   40.718304] [<8007ebdc>] (generic_handle_irq) from [<8007ef20>] (__handle_domain_irq+0x6c/0xe8)
[   40.727033] [<8007eeb4>] (__handle_domain_irq) from [<800095d4>] (gic_handle_irq+0x48/0x94)
[   40.735402]  r9:c080f100 r8:80b4ac6c r7:c080e100 r6:80b67d40 r5:80b49f00 r4:c080e10c
[   40.743290] [<8000958c>] (gic_handle_irq) from [<80015d38>] (__irq_svc+0x58/0x78)
[   40.750791] Exception stack(0x80b49f00 to 0x80b49f48)
[   40.755873] 9f00: 00000001 00000001 00000000 80024320 80b48000 80b4a9d0 80b4a984 80b433e4
[   40.764078] 9f20: 00000001 807f4680 00000000 80b49f5c 80b49f20 80b49f50 80071ca4 800113fc
[   40.772272] 9f40: 200b0013 ffffffff
[   40.775776]  r9:807f4680 r8:00000001 r7:80b49f34 r6:ffffffff r5:200b0013 r4:800113fc
[   40.783677] [<800113d4>] (arch_cpu_idle) from [<8006c5bc>] (default_idle_call+0x28/0x38)
[   40.791798] [<8006c594>] (default_idle_call) from [<8006c6dc>] (cpu_startup_entry+0x110/0x1b0)
[   40.800445] [<8006c5cc>] (cpu_startup_entry) from [<807e95dc>] (rest_init+0x12c/0x168)
[   40.808376]  r7:80b4a8c0 r3:807f4b7c
[   40.812030] [<807e94b0>] (rest_init) from [<80ad7cc0>] (start_kernel+0x360/0x3d4)
[   40.819528]  r5:80bcb000 r4:80bcb050
[   40.823171] [<80ad7960>] (start_kernel) from [<8000807c>] (0x8000807c)

It fixes commit 91c42b0 ("usb: gadget: loopback: Fix looping back
logic implementation").

Cc: <[email protected]> # v3.18+
Signed-off-by: Peter Chen <[email protected]>
Reviewed-by: Krzysztof Opasiak <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Dec 28, 2015
When a passive TCP is created, we eventually call tcp_md5_do_add()
with sk pointing to the child. It is not owner by the user yet (we
will add this socket into listener accept queue a bit later anyway)

But we do own the spinlock, so amend the lockdep annotation to avoid
following splat :

[ 8451.090932] net/ipv4/tcp_ipv4.c:923 suspicious rcu_dereference_protected() usage!
[ 8451.090932]
[ 8451.090932] other info that might help us debug this:
[ 8451.090932]
[ 8451.090934]
[ 8451.090934] rcu_scheduler_active = 1, debug_locks = 1
[ 8451.090936] 3 locks held by socket_sockopt_/214795:
[ 8451.090936]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff855c6ac1>] __netif_receive_skb_core+0x151/0xe90
[ 8451.090947]  foss-for-synopsys-dwc-arc-processors#1:  (rcu_read_lock){.+.+..}, at: [<ffffffff85618143>] ip_local_deliver_finish+0x43/0x2b0
[ 8451.090952]  foss-for-synopsys-dwc-arc-processors#2:  (slock-AF_INET){+.-...}, at: [<ffffffff855acda5>] sk_clone_lock+0x1c5/0x500
[ 8451.090958]
[ 8451.090958] stack backtrace:
[ 8451.090960] CPU: 7 PID: 214795 Comm: socket_sockopt_

[ 8451.091215] Call Trace:
[ 8451.091216]  <IRQ>  [<ffffffff856fb29c>] dump_stack+0x55/0x76
[ 8451.091229]  [<ffffffff85123b5b>] lockdep_rcu_suspicious+0xeb/0x110
[ 8451.091235]  [<ffffffff8564544f>] tcp_md5_do_add+0x1bf/0x1e0
[ 8451.091239]  [<ffffffff85645751>] tcp_v4_syn_recv_sock+0x1f1/0x4c0
[ 8451.091242]  [<ffffffff85642b27>] ? tcp_v4_md5_hash_skb+0x167/0x190
[ 8451.091246]  [<ffffffff85647c78>] tcp_check_req+0x3c8/0x500
[ 8451.091249]  [<ffffffff856451ae>] ? tcp_v4_inbound_md5_hash+0x11e/0x190
[ 8451.091253]  [<ffffffff85647170>] tcp_v4_rcv+0x3c0/0x9f0
[ 8451.091256]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091260]  [<ffffffff856181b6>] ip_local_deliver_finish+0xb6/0x2b0
[ 8451.091263]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091267]  [<ffffffff85618d38>] ip_local_deliver+0x48/0x80
[ 8451.091270]  [<ffffffff85618510>] ip_rcv_finish+0x160/0x700
[ 8451.091273]  [<ffffffff8561900e>] ip_rcv+0x29e/0x3d0
[ 8451.091277]  [<ffffffff855c74b7>] __netif_receive_skb_core+0xb47/0xe90

Fixes: a8afca0 ("tcp: md5: protects md5sig_info with RCU")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
noamc pushed a commit to Mellanox/linux that referenced this pull request Dec 28, 2015
Liu reported that running certain parts of xfstests threw the
following error:

BUG: sleeping function called from invalid context at mm/page_alloc.c:3190
in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u16:0
3 locks held by kworker/u16:0/6:
 #0:  ("writeback"){++++.+}, at: [<ffffffff8107f083>] process_one_work+0x173/0x730
 foss-for-synopsys-dwc-arc-processors#1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff8107f083>] process_one_work+0x173/0x730
 foss-for-synopsys-dwc-arc-processors#2:  (&type->s_umount_key#44){+++++.}, at: [<ffffffff811e6805>] trylock_super+0x25/0x60
CPU: 5 PID: 6 Comm: kworker/u16:0 Tainted: G           OE   4.3.0+ foss-for-synopsys-dwc-arc-processors#3
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
Workqueue: writeback wb_workfn (flush-btrfs-108)
 ffffffff81a3abab ffff88042e282ba8 ffffffff8130191b ffffffff81a3abab
 0000000000000c76 ffff88042e282ba8 ffff88042e27c180 ffff88042e282bd8
 ffffffff8108ed95 ffff880400000004 0000000000000000 0000000000000c76
Call Trace:
 [<ffffffff8130191b>] dump_stack+0x4f/0x74
 [<ffffffff8108ed95>] ___might_sleep+0x185/0x240
 [<ffffffff8108eea2>] __might_sleep+0x52/0x90
 [<ffffffff811817e8>] __alloc_pages_nodemask+0x268/0x410
 [<ffffffff8109a43c>] ? sched_clock_local+0x1c/0x90
 [<ffffffff8109a6d1>] ? local_clock+0x21/0x40
 [<ffffffff810b9eb0>] ? __lock_release+0x420/0x510
 [<ffffffff810b534c>] ? __lock_acquired+0x16c/0x3c0
 [<ffffffff811ca265>] alloc_pages_current+0xc5/0x210
 [<ffffffffa0577105>] ? rbio_is_full+0x55/0x70 [btrfs]
 [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0
 [<ffffffff81666d50>] ? _raw_spin_unlock_irqrestore+0x40/0x60
 [<ffffffffa0578c0a>] full_stripe_write+0x5a/0xc0 [btrfs]
 [<ffffffffa0578ca9>] __raid56_parity_write+0x39/0x60 [btrfs]
 [<ffffffffa0578deb>] run_plug+0x11b/0x140 [btrfs]
 [<ffffffffa0578e33>] btrfs_raid_unplug+0x23/0x70 [btrfs]
 [<ffffffff812d36c2>] blk_flush_plug_list+0x82/0x1f0
 [<ffffffff812e0349>] blk_sq_make_request+0x1f9/0x740
 [<ffffffff812ceba2>] ? generic_make_request_checks+0x222/0x7c0
 [<ffffffff812cf264>] ? blk_queue_enter+0x124/0x310
 [<ffffffff812cf1d2>] ? blk_queue_enter+0x92/0x310
 [<ffffffff812d0ae2>] generic_make_request+0x172/0x2c0
 [<ffffffff812d0ad4>] ? generic_make_request+0x164/0x2c0
 [<ffffffff812d0ca0>] submit_bio+0x70/0x140
 [<ffffffffa0577b29>] ? rbio_add_io_page+0x99/0x150 [btrfs]
 [<ffffffffa0578a89>] finish_rmw+0x4d9/0x600 [btrfs]
 [<ffffffffa0578c4c>] full_stripe_write+0x9c/0xc0 [btrfs]
 [<ffffffffa057ab7f>] raid56_parity_write+0xef/0x160 [btrfs]
 [<ffffffffa052bd83>] btrfs_map_bio+0xe3/0x2d0 [btrfs]
 [<ffffffffa04fbd6d>] btrfs_submit_bio_hook+0x8d/0x1d0 [btrfs]
 [<ffffffffa05173c4>] submit_one_bio+0x74/0xb0 [btrfs]
 [<ffffffffa0517f55>] submit_extent_page+0xe5/0x1c0 [btrfs]
 [<ffffffffa0519b18>] __extent_writepage_io+0x408/0x4c0 [btrfs]
 [<ffffffffa05179c0>] ? alloc_dummy_extent_buffer+0x140/0x140 [btrfs]
 [<ffffffffa051dc88>] __extent_writepage+0x218/0x3a0 [btrfs]
 [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0
 [<ffffffffa051e2c9>] extent_write_cache_pages.clone.0+0x2f9/0x400 [btrfs]
 [<ffffffffa051e422>] extent_writepages+0x52/0x70 [btrfs]
 [<ffffffffa05001f0>] ? btrfs_set_inode_index+0x70/0x70 [btrfs]
 [<ffffffffa04fcc17>] btrfs_writepages+0x27/0x30 [btrfs]
 [<ffffffff81184df3>] do_writepages+0x23/0x40
 [<ffffffff81212229>] __writeback_single_inode+0x89/0x4d0
 [<ffffffff81212a60>] ? writeback_sb_inodes+0x260/0x480
 [<ffffffff81212a60>] ? writeback_sb_inodes+0x260/0x480
 [<ffffffff8121295f>] ? writeback_sb_inodes+0x15f/0x480
 [<ffffffff81212ad2>] writeback_sb_inodes+0x2d2/0x480
 [<ffffffff810b1397>] ? down_read_trylock+0x57/0x60
 [<ffffffff811e6805>] ? trylock_super+0x25/0x60
 [<ffffffff810d629f>] ? rcu_read_lock_sched_held+0x4f/0x90
 [<ffffffff81212d0c>] __writeback_inodes_wb+0x8c/0xc0
 [<ffffffff812130b5>] wb_writeback+0x2b5/0x500
 [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0
 [<ffffffff810660a8>] ? __local_bh_enable_ip+0x68/0xc0
 [<ffffffff81213362>] ? wb_do_writeback+0x62/0x310
 [<ffffffff812133c1>] wb_do_writeback+0xc1/0x310
 [<ffffffff8107c3d9>] ? set_worker_desc+0x79/0x90
 [<ffffffff81213842>] wb_workfn+0x92/0x330
 [<ffffffff8107f133>] process_one_work+0x223/0x730
 [<ffffffff8107f083>] ? process_one_work+0x173/0x730
 [<ffffffff8108035f>] ? worker_thread+0x18f/0x430
 [<ffffffff810802ed>] worker_thread+0x11d/0x430
 [<ffffffff810801d0>] ? maybe_create_worker+0xf0/0xf0
 [<ffffffff810801d0>] ? maybe_create_worker+0xf0/0xf0
 [<ffffffff810858df>] kthread+0xef/0x110
 [<ffffffff8108f74e>] ? schedule_tail+0x1e/0xd0
 [<ffffffff810857f0>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff816673bf>] ret_from_fork+0x3f/0x70
 [<ffffffff810857f0>] ? __init_kthread_worker+0x70/0x70

The issue is that we've got the software context pinned while
calling blk_flush_plug_list(), which flushes callbacks that
are allowed to sleep. btrfs and raid has such callbacks.

Flip the checks around a bit, so we can enable preempt a bit
earlier and flush plugs without having preempt disabled.

This only affects blk-mq driven devices, and only those that
register a single queue.

Reported-by: Liu Bo <[email protected]>
Tested-by: Liu Bo <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
shahab-vahedi pushed a commit that referenced this pull request Nov 23, 2022
We've got a mess on our hands.

1. xfs_trans_commit() cannot cancel transactions because the mount is
shut down - that causes dirty, aborted, unlogged log items to sit
unpinned in memory and potentially get written to disk before the
log is shut down. Hence xfs_trans_commit() can only abort
transactions when xlog_is_shutdown() is true.

2. xfs_force_shutdown() is used in places to cause the current
modification to be aborted via xfs_trans_commit() because it may be
impractical or impossible to cancel the transaction directly, and
hence xfs_trans_commit() must cancel transactions when
xfs_is_shutdown() is true in this situation. But we can't do that
because of #1.

3. Log IO errors cause log shutdowns by calling xfs_force_shutdown()
to shut down the mount and then the log from log IO completion.

4. xfs_force_shutdown() can result in a log force being issued,
which has to wait for log IO completion before it will mark the log
as shut down. If #3 races with some other shutdown trigger that runs
a log force, we rely on xfs_force_shutdown() silently ignoring #3
and avoiding shutting down the log until the failed log force
completes.

5. To ensure #2 always works, we have to ensure that
xfs_force_shutdown() does not return until the the log is shut down.
But in the case of #4, this will result in a deadlock because the
log Io completion will block waiting for a log force to complete
which is blocked waiting for log IO to complete....

So the very first thing we have to do here to untangle this mess is
dissociate log shutdown triggers from mount shutdowns. We already
have xlog_forced_shutdown, which will atomically transistion to the
log a shutdown state. Due to internal asserts it cannot be called
multiple times, but was done simply because the only place that
could call it was xfs_do_force_shutdown() (i.e. the mount shutdown!)
and that could only call it once and once only.  So the first thing
we do is remove the asserts.

We then convert all the internal log shutdown triggers to call
xlog_force_shutdown() directly instead of xfs_force_shutdown(). This
allows the log shutdown triggers to shut down the log without
needing to care about mount based shutdown constraints. This means
we shut down the log independently of the mount and the mount may
not notice this until it's next attempt to read or modify metadata.
At that point (e.g. xfs_trans_commit()) it will see that the log is
shutdown, error out and shutdown the mount.

To ensure that all the unmount behaviours and asserts track
correctly as a result of a log shutdown, propagate the shutdown up
to the mount if it is not already set. This keeps the mount and log
state in sync, and saves a huge amount of hassle where code fails
because of a log shutdown but only checks for mount shutdowns and
hence ends up doing the wrong thing. Cleaning up that mess is
an exercise for another day.

This enables us to address the other problems noted above in
followup patches.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
shahab-vahedi pushed a commit that referenced this pull request Nov 23, 2022
When calling smb2_ioctl_query_info() with invalid
smb_query_info::flags, a NULL ptr dereference is triggered when trying
to kfree() uninitialised rqst[n].rq_iov array.

This also fixes leaked paths that are created in SMB2_open_init()
which required SMB2_open_free() to properly free them.

Here is a small C reproducer that triggers it

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/ioctl.h>

	#define die(s) perror(s), exit(1)
	#define QUERY_INFO 0xc018cf07

	int main(int argc, char *argv[])
	{
		int fd;

		if (argc < 2)
			exit(1);
		fd = open(argv[1], O_RDONLY);
		if (fd == -1)
			die("open");
		if (ioctl(fd, QUERY_INFO, (uint32_t[]) { 0, 0, 0, 4, 0, 0}) == -1)
			die("ioctl");
		close(fd);
		return 0;
	}

	mount.cifs //srv/share /mnt -o ...
	gcc repro.c && ./a.out /mnt/f0

	[ 1832.124468] CIFS: VFS: \\w22-dc.zelda.test\test Invalid passthru query flags: 0x4
	[ 1832.125043] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
	[ 1832.125764] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
	[ 1832.126241] CPU: 3 PID: 1133 Comm: a.out Not tainted 5.17.0-rc8 #2
	[ 1832.126630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
	[ 1832.127322] RIP: 0010:smb2_ioctl_query_info+0x7a3/0xe30 [cifs]
	[ 1832.127749] Code: 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 6c 05 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 74 24 28 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 cb 04 00 00 49 8b 3e e8 bb fc fa ff 48 89 da 48
	[ 1832.128911] RSP: 0018:ffffc90000957b08 EFLAGS: 00010256
	[ 1832.129243] RAX: dffffc0000000000 RBX: ffff888117e9b850 RCX: ffffffffa020580d
	[ 1832.129691] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a2c0
	[ 1832.130137] RBP: ffff888117e9b878 R08: 0000000000000001 R09: 0000000000000003
	[ 1832.130585] R10: fffffbfff4087458 R11: 0000000000000001 R12: ffff888117e9b800
	[ 1832.131037] R13: 00000000ffffffea R14: 0000000000000000 R15: ffff888117e9b8a8
	[ 1832.131485] FS:  00007fcee9900740(0000) GS:ffff888151a00000(0000) knlGS:0000000000000000
	[ 1832.131993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[ 1832.132354] CR2: 00007fcee9a1ef5e CR3: 0000000114cd2000 CR4: 0000000000350ee0
	[ 1832.132801] Call Trace:
	[ 1832.132962]  <TASK>
	[ 1832.133104]  ? smb2_query_reparse_tag+0x890/0x890 [cifs]
	[ 1832.133489]  ? cifs_mapchar+0x460/0x460 [cifs]
	[ 1832.133822]  ? rcu_read_lock_sched_held+0x3f/0x70
	[ 1832.134125]  ? cifs_strndup_to_utf16+0x15b/0x250 [cifs]
	[ 1832.134502]  ? lock_downgrade+0x6f0/0x6f0
	[ 1832.134760]  ? cifs_convert_path_to_utf16+0x198/0x220 [cifs]
	[ 1832.135170]  ? smb2_check_message+0x1080/0x1080 [cifs]
	[ 1832.135545]  cifs_ioctl+0x1577/0x3320 [cifs]
	[ 1832.135864]  ? lock_downgrade+0x6f0/0x6f0
	[ 1832.136125]  ? cifs_readdir+0x2e60/0x2e60 [cifs]
	[ 1832.136468]  ? rcu_read_lock_sched_held+0x3f/0x70
	[ 1832.136769]  ? __rseq_handle_notify_resume+0x80b/0xbe0
	[ 1832.137096]  ? __up_read+0x192/0x710
	[ 1832.137327]  ? __ia32_sys_rseq+0xf0/0xf0
	[ 1832.137578]  ? __x64_sys_openat+0x11f/0x1d0
	[ 1832.137850]  __x64_sys_ioctl+0x127/0x190
	[ 1832.138103]  do_syscall_64+0x3b/0x90
	[ 1832.138378]  entry_SYSCALL_64_after_hwframe+0x44/0xae
	[ 1832.138702] RIP: 0033:0x7fcee9a253df
	[ 1832.138937] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
	[ 1832.140107] RSP: 002b:00007ffeba94a8a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[ 1832.140606] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcee9a253df
	[ 1832.141058] RDX: 00007ffeba94a910 RSI: 00000000c018cf07 RDI: 0000000000000003
	[ 1832.141503] RBP: 00007ffeba94a930 R08: 00007fcee9b24db0 R09: 00007fcee9b45c4e
	[ 1832.141948] R10: 00007fcee9918d40 R11: 0000000000000246 R12: 00007ffeba94aa48
	[ 1832.142396] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007fcee9b78000
	[ 1832.142851]  </TASK>
	[ 1832.142994] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload [last unloaded: cifs]

Cc: [email protected]
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
shahab-vahedi pushed a commit that referenced this pull request Nov 23, 2022
As guest_irq is coming from KVM_IRQFD API call, it may trigger
crash in svm_update_pi_irte() due to out-of-bounds:

crash> bt
PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
 #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
 #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
 #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
 #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
 torvalds#6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
    [exception RIP: svm_update_pi_irte+227]
    RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
    RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
    RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
    RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
    R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
    R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
 #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
 #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
    RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
    RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
    RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
    R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
    R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Vmx have been fix this in commit 3a8b067 (KVM: VMX: Do not BUG() on
out-of-bounds guest IRQ), so we can just copy source from that to fix
this.

Co-developed-by: Yi Liu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Yi Wang <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
shahab-vahedi pushed a commit that referenced this pull request Nov 23, 2022
There is possible circular locking dependency detected on event_mutex
(see below logs). This is due to set fail safe mode is done at
dp_panel_read_sink_caps() within event_mutex scope. To break this
possible circular locking, this patch move setting fail safe mode
out of event_mutex scope.

[   23.958078] ======================================================
[   23.964430] WARNING: possible circular locking dependency detected
[   23.970777] 5.17.0-rc2-lockdep-00088-g05241de1f69e #148 Not tainted
[   23.977219] ------------------------------------------------------
[   23.983570] DrmThread/1574 is trying to acquire lock:
[   23.988763] ffffff808423aab0 (&dp->event_mutex){+.+.}-{3:3}, at: msm_dp_displ                                                                             ay_enable+0x58/0x164
[   23.997895]
[   23.997895] but task is already holding lock:
[   24.003895] ffffff808420b280 (&kms->commit_lock[i]/1){+.+.}-{3:3}, at: lock_c                                                                             rtcs+0x80/0x8c
[   24.012495]
[   24.012495] which lock already depends on the new lock.
[   24.012495]
[   24.020886]
[   24.020886] the existing dependency chain (in reverse order) is:
[   24.028570]
[   24.028570] -> #5 (&kms->commit_lock[i]/1){+.+.}-{3:3}:
[   24.035472]        __mutex_lock+0xc8/0x384
[   24.039695]        mutex_lock_nested+0x54/0x74
[   24.044272]        lock_crtcs+0x80/0x8c
[   24.048222]        msm_atomic_commit_tail+0x1e8/0x3d0
[   24.053413]        commit_tail+0x7c/0xfc
[   24.057452]        drm_atomic_helper_commit+0x158/0x15c
[   24.062826]        drm_atomic_commit+0x60/0x74
[   24.067403]        drm_mode_atomic_ioctl+0x6b0/0x908
[   24.072508]        drm_ioctl_kernel+0xe8/0x168
[   24.077086]        drm_ioctl+0x320/0x370
[   24.081123]        drm_compat_ioctl+0x40/0xdc
[   24.085602]        __arm64_compat_sys_ioctl+0xe0/0x150
[   24.090895]        invoke_syscall+0x80/0x114
[   24.095294]        el0_svc_common.constprop.3+0xc4/0xf8
[   24.100668]        do_el0_svc_compat+0x2c/0x54
[   24.105242]        el0_svc_compat+0x4c/0xe4
[   24.109548]        el0t_32_sync_handler+0xc4/0xf4
[   24.114381]        el0t_32_sync+0x178
[   24.118688]
[   24.118688] -> #4 (&kms->commit_lock[i]){+.+.}-{3:3}:
[   24.125408]        __mutex_lock+0xc8/0x384
[   24.129628]        mutex_lock_nested+0x54/0x74
[   24.134204]        lock_crtcs+0x80/0x8c
[   24.138155]        msm_atomic_commit_tail+0x1e8/0x3d0
[   24.143345]        commit_tail+0x7c/0xfc
[   24.147382]        drm_atomic_helper_commit+0x158/0x15c
[   24.152755]        drm_atomic_commit+0x60/0x74
[   24.157323]        drm_atomic_helper_set_config+0x68/0x90
[   24.162869]        drm_mode_setcrtc+0x394/0x648
[   24.167535]        drm_ioctl_kernel+0xe8/0x168
[   24.172102]        drm_ioctl+0x320/0x370
[   24.176135]        drm_compat_ioctl+0x40/0xdc
[   24.180621]        __arm64_compat_sys_ioctl+0xe0/0x150
[   24.185904]        invoke_syscall+0x80/0x114
[   24.190302]        el0_svc_common.constprop.3+0xc4/0xf8
[   24.195673]        do_el0_svc_compat+0x2c/0x54
[   24.200241]        el0_svc_compat+0x4c/0xe4
[   24.204544]        el0t_32_sync_handler+0xc4/0xf4
[   24.209378]        el0t_32_sync+0x174/0x178
[   24.213680] -> #3 (crtc_ww_class_mutex){+.+.}-{3:3}:
[   24.220308]        __ww_mutex_lock.constprop.20+0xe8/0x878
[   24.225951]        ww_mutex_lock+0x60/0xd0
[   24.230166]        modeset_lock+0x190/0x19c
[   24.234467]        drm_modeset_lock+0x34/0x54
[   24.238953]        drmm_mode_config_init+0x550/0x764
[   24.244065]        msm_drm_bind+0x170/0x59c
[   24.248374]        try_to_bring_up_master+0x244/0x294
[   24.253572]        __component_add+0xf4/0x14c
[   24.258057]        component_add+0x2c/0x38
[   24.262273]        dsi_dev_attach+0x2c/0x38
[   24.266575]        dsi_host_attach+0xc4/0x120
[   24.271060]        mipi_dsi_attach+0x34/0x48
[   24.275456]        devm_mipi_dsi_attach+0x28/0x68
[   24.280298]        ti_sn_bridge_probe+0x2b4/0x2dc
[   24.285137]        auxiliary_bus_probe+0x78/0x90
[   24.289893]        really_probe+0x1e4/0x3d8
[   24.294194]        __driver_probe_device+0x14c/0x164
[   24.299298]        driver_probe_device+0x54/0xf8
[   24.304043]        __device_attach_driver+0xb4/0x118
[   24.309145]        bus_for_each_drv+0xb0/0xd4
[   24.313628]        __device_attach+0xcc/0x158
[   24.318112]        device_initial_probe+0x24/0x30
[   24.322954]        bus_probe_device+0x38/0x9c
[   24.327439]        deferred_probe_work_func+0xd4/0xf0
[   24.332628]        process_one_work+0x2f0/0x498
[   24.337289]        process_scheduled_works+0x44/0x48
[   24.342391]        worker_thread+0x1e4/0x26c
[   24.346788]        kthread+0xe4/0xf4
[   24.350470]        ret_from_fork+0x10/0x20
[   24.354683]
[   24.354683]
[   24.354683] -> #2 (crtc_ww_class_acquire){+.+.}-{0:0}:
[   24.361489]        drm_modeset_acquire_init+0xe4/0x138
[   24.366777]        drm_helper_probe_detect_ctx+0x44/0x114
[   24.372327]        check_connector_changed+0xbc/0x198
[   24.377517]        drm_helper_hpd_irq_event+0xcc/0x11c
[   24.382804]        dsi_hpd_worker+0x24/0x30
[   24.387104]        process_one_work+0x2f0/0x498
[   24.391762]        worker_thread+0x1d0/0x26c
[   24.396158]        kthread+0xe4/0xf4
[   24.399840]        ret_from_fork+0x10/0x20
[   24.404053]
[   24.404053] -> #1 (&dev->mode_config.mutex){+.+.}-{3:3}:
[   24.411032]        __mutex_lock+0xc8/0x384
[   24.415247]        mutex_lock_nested+0x54/0x74
[   24.419819]        dp_panel_read_sink_caps+0x23c/0x26c
[   24.425108]        dp_display_process_hpd_high+0x34/0xd4
[   24.430570]        dp_display_usbpd_configure_cb+0x30/0x3c
[   24.436205]        hpd_event_thread+0x2ac/0x550
[   24.440864]        kthread+0xe4/0xf4
[   24.444544]        ret_from_fork+0x10/0x20
[   24.448757]
[   24.448757] -> #0 (&dp->event_mutex){+.+.}-{3:3}:
[   24.455116]        __lock_acquire+0xe2c/0x10d8
[   24.459690]        lock_acquire+0x1ac/0x2d0
[   24.463988]        __mutex_lock+0xc8/0x384
[   24.468201]        mutex_lock_nested+0x54/0x74
[   24.472773]        msm_dp_display_enable+0x58/0x164
[   24.477789]        dp_bridge_enable+0x24/0x30
[   24.482273]        drm_atomic_bridge_chain_enable+0x78/0x9c
[   24.488006]        drm_atomic_helper_commit_modeset_enables+0x1bc/0x244
[   24.494801]        msm_atomic_commit_tail+0x248/0x3d0
[   24.499992]        commit_tail+0x7c/0xfc
[   24.504031]        drm_atomic_helper_commit+0x158/0x15c
[   24.509404]        drm_atomic_commit+0x60/0x74
[   24.513976]        drm_mode_atomic_ioctl+0x6b0/0x908
[   24.519079]        drm_ioctl_kernel+0xe8/0x168
[   24.523650]        drm_ioctl+0x320/0x370
[   24.527689]        drm_compat_ioctl+0x40/0xdc
[   24.532175]        __arm64_compat_sys_ioctl+0xe0/0x150
[   24.537463]        invoke_syscall+0x80/0x114
[   24.541861]        el0_svc_common.constprop.3+0xc4/0xf8
[   24.547235]        do_el0_svc_compat+0x2c/0x54
[   24.551806]        el0_svc_compat+0x4c/0xe4
[   24.556106]        el0t_32_sync_handler+0xc4/0xf4
[   24.560948]        el0t_32_sync+0x174/0x178

Changes in v2:
-- add circular lockiing trace

Fixes: d4aca42 ("drm/msm/dp:  always add fail-safe mode into connector mode list")
Signed-off-by: Kuogee Hsieh <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/481396/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
shahab-vahedi pushed a commit that referenced this pull request Nov 23, 2022
The root cause is the race as follows:
Thread #1                              Thread #2(irq ctx)

z_erofs_runqueue()
  struct z_erofs_decompressqueue io_A[];
  submit bio A
  z_erofs_decompress_kickoff(,,1)
                                       z_erofs_decompressqueue_endio(bio A)
                                       z_erofs_decompress_kickoff(,,-1)
                                       spin_lock_irqsave()
                                       atomic_add_return()
  io_wait_event()	-> pending_bios is already 0
  [end of function]
                                       wake_up_locked(io_A[]) // crash

Referenced backtrace in kernel 5.4:

[   10.129422] Unable to handle kernel paging request at virtual address eb0454a4
[   10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G        WC O      5.4.147-ab09225 #1
[   11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48)
[   11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0)
[   11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c)
[   11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0)
[   11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc)
[   11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c)
[   11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc)
[   11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c)
[   11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138)
[   11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0)
[   11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4)
[   11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0)

Signed-off-by: Hongyu Jin <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Gao Xiang <[email protected]>
shahab-vahedi pushed a commit that referenced this pull request Nov 23, 2022
 into HEAD

KVM/riscv fixes for 5.18, take #2

- Remove 's' & 'u' as valid ISA extension

- Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
shahab-vahedi pushed a commit that referenced this pull request Nov 23, 2022
Make a KVM_REQ_APICV_UPDATE request when creating a vCPU with an
in-kernel local APIC and APICv enabled at the module level.  Consuming
kvm_apicv_activated() and stuffing vcpu->arch.apicv_active directly can
race with __kvm_set_or_clear_apicv_inhibit(), as vCPU creation happens
before the vCPU is fully onlined, i.e. it won't get the request made to
"all" vCPUs.  If APICv is globally inhibited between setting apicv_active
and onlining the vCPU, the vCPU will end up running with APICv enabled
and trigger KVM's sanity check.

Mark APICv as active during vCPU creation if APICv is enabled at the
module level, both to be optimistic about it's final state, e.g. to avoid
additional VMWRITEs on VMX, and because there are likely bugs lurking
since KVM checks apicv_active in multiple vCPU creation paths.  While
keeping the current behavior of consuming kvm_apicv_activated() is
arguably safer from a regression perspective, force apicv_active so that
vCPU creation runs with deterministic state and so that if there are bugs,
they are found sooner than later, i.e. not when some crazy race condition
is hit.

  WARNING: CPU: 0 PID: 484 at arch/x86/kvm/x86.c:9877 vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877
  Modules linked in:
  CPU: 0 PID: 484 Comm: syz-executor361 Not tainted 5.16.13 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1~cloud0 04/01/2014
  RIP: 0010:vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877
  Call Trace:
   <TASK>
   vcpu_run arch/x86/kvm/x86.c:10039 [inline]
   kvm_arch_vcpu_ioctl_run+0x337/0x15e0 arch/x86/kvm/x86.c:10234
   kvm_vcpu_ioctl+0x4d2/0xc80 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3727
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:874 [inline]
   __se_sys_ioctl fs/ioctl.c:860 [inline]
   __x64_sys_ioctl+0x16d/0x1d0 fs/ioctl.c:860
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

The bug was hit by a syzkaller spamming VM creation with 2 vCPUs and a
call to KVM_SET_GUEST_DEBUG.

  r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x0, 0x0)
  r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
  ioctl$KVM_CAP_SPLIT_IRQCHIP(r1, 0x4068aea3, &(0x7f0000000000)) (async)
  r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0) (async)
  r3 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x400000000000002)
  ioctl$KVM_SET_GUEST_DEBUG(r3, 0x4048ae9b, &(0x7f00000000c0)={0x5dda9c14aa95f5c5})
  ioctl$KVM_RUN(r2, 0xae80, 0x0)

Reported-by: Gaoning Pan <[email protected]>
Reported-by: Yongkang Jia <[email protected]>
Fixes: 8df14af ("kvm: x86: Add support for dynamic APICv activation")
Cc: [email protected]
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
xxkent added a commit that referenced this pull request May 3, 2023
With enabled CONFIG_DEBUG_PREEMPT the following stack-trace gets
printed on SMP ARC platforms with performance counters interrupts:

---------------------------------------8-------------------------------------
ARC perf        : 8 counters (48 bits), 128 conditions, [overflow IRQ support]
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is arc_pmu_device_probe+0x336/0x3c8
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.17.0 #2

Stack Trace:
  arc_unwind_core+0xe8/0x118
  dump_stack_lvl+0x48/0x68
  check_preemption_disabled+0xb4/0xb8
  arc_pmu_device_probe+0x336/0x3c8
  platform_probe+0x30/0x80
  really_probe.part.0+0x88/0x240
  driver_probe_device+0x86/0x1ec
  __driver_attach+0x8a/0x144
  bus_for_each_dev+0x38/0x64
  bus_add_driver+0x116/0x17c
  driver_register+0x4c/0xdc
  do_one_initcall+0x30/0x16c
  kernel_init_freeable+0x1be/0x224
workingset: timestamp_bits=14 max_order=15 bucket_order=1
io scheduler mq-deadline registered
--------------------------------------8-------------------------------------
That happens because of use of this_cpu_ptr() in
https://elixir.bootlin.com/linux/v5.17/source/arch/arc/kernel/perf_event.c#L811
without explicitly disabled preemption, see
https://www.kernel.org/doc/html/latest/core-api/this_cpu_ops.html#special-operations.

In this commit we  explicitly disable preemption around that code.
xxkent added a commit that referenced this pull request May 3, 2023
With enabled CONFIG_DEBUG_PREEMPT the following stack-trace gets
printed on SMP ARC platforms with performance counters interrupts:

---------------------------------------8-------------------------------------
ARC perf        : 8 counters (48 bits), 128 conditions, [overflow IRQ support]
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is arc_pmu_device_probe+0x336/0x3c8
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.17.0 #2

Stack Trace:
  arc_unwind_core+0xe8/0x118
  dump_stack_lvl+0x48/0x68
  check_preemption_disabled+0xb4/0xb8
  arc_pmu_device_probe+0x336/0x3c8
  platform_probe+0x30/0x80
  really_probe.part.0+0x88/0x240
  driver_probe_device+0x86/0x1ec
  __driver_attach+0x8a/0x144
  bus_for_each_dev+0x38/0x64
  bus_add_driver+0x116/0x17c
  driver_register+0x4c/0xdc
  do_one_initcall+0x30/0x16c
  kernel_init_freeable+0x1be/0x224
workingset: timestamp_bits=14 max_order=15 bucket_order=1
io scheduler mq-deadline registered
--------------------------------------8-------------------------------------
That happens because of use of this_cpu_ptr() in
https://elixir.bootlin.com/linux/v5.17/source/arch/arc/kernel/perf_event.c#L811
without explicitly disabled preemption, see
https://www.kernel.org/doc/html/latest/core-api/this_cpu_ops.html#special-operations.

In this commit we  explicitly disable preemption around that code.
xxkent pushed a commit that referenced this pull request Aug 4, 2023
THe high level structure of most ARC exception handlers is
 1. save regfile with EXCEPTION_PROLOGUE
 2. setup r0: EFA (not part of pt_regs)
 3. setup r1: pointer to pt_regs (SP)
 4. drop down to pure kernel mode (from exception)
 5. call the Linux "C" handler

Remove the boiler plate code by moving #2, #3, #4 into #1.

The exceptions to most exceptions are syscall Trap and Machine check
which don't do some of above for various reasons, so call a newly
introduced variant EXCEPTION_PROLOGUE_KEEP_AE (same as original
EXCEPTION_PROLOGUE)

Signed-off-by: Vineet Gupta <[email protected]>
xxkent pushed a commit that referenced this pull request Aug 4, 2023
xxkent pushed a commit that referenced this pull request Aug 4, 2023
xxkent pushed a commit that referenced this pull request Aug 4, 2023
xxkent pushed a commit that referenced this pull request Aug 4, 2023
xxkent pushed a commit that referenced this pull request Aug 4, 2023
… 64-bit

[Fix link errors with -mcmodel=large and LINK_BASE=0xFFFF_0000_0000 #2]

Current asm code handles ex_tables entries (PC markers) as 32-bits values,
while they are actually 64-bits wide.

|  LD      .tmp_vmlinux1
| arch/arc/kernel/unwind.o:(__ex_table+0x0): relocation truncated to fit: R_ARC_32 against `no symbol'
| make[1]: *** [/home/vineetg/arc/v2-kernel/Makefile:1078: vmlinux] Error 1

Signed-off-by: Vineet Gupta <[email protected]>
xxkent added a commit that referenced this pull request Aug 4, 2023
With enabled CONFIG_DEBUG_PREEMPT the following stack-trace gets
printed on SMP ARC platforms with performance counters interrupts:

---------------------------------------8-------------------------------------
ARC perf        : 8 counters (48 bits), 128 conditions, [overflow IRQ support]
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is arc_pmu_device_probe+0x336/0x3c8
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.17.0 #2

Stack Trace:
  arc_unwind_core+0xe8/0x118
  dump_stack_lvl+0x48/0x68
  check_preemption_disabled+0xb4/0xb8
  arc_pmu_device_probe+0x336/0x3c8
  platform_probe+0x30/0x80
  really_probe.part.0+0x88/0x240
  driver_probe_device+0x86/0x1ec
  __driver_attach+0x8a/0x144
  bus_for_each_dev+0x38/0x64
  bus_add_driver+0x116/0x17c
  driver_register+0x4c/0xdc
  do_one_initcall+0x30/0x16c
  kernel_init_freeable+0x1be/0x224
workingset: timestamp_bits=14 max_order=15 bucket_order=1
io scheduler mq-deadline registered
--------------------------------------8-------------------------------------
That happens because of use of this_cpu_ptr() in
https://elixir.bootlin.com/linux/v5.17/source/arch/arc/kernel/perf_event.c#L811
without explicitly disabled preemption, see
https://www.kernel.org/doc/html/latest/core-api/this_cpu_ops.html#special-operations.

In this commit we  explicitly disable preemption around that code.
xxkent added a commit that referenced this pull request Oct 17, 2023
Since major opcodes for ARCv2/ARCv2/ARCv3 are slightly different
between ISAs, then we need to recognize instructions sizes differently
depending on the architecture.
xxkent pushed a commit that referenced this pull request Oct 17, 2023
THe high level structure of most ARC exception handlers is
 1. save regfile with EXCEPTION_PROLOGUE
 2. setup r0: EFA (not part of pt_regs)
 3. setup r1: pointer to pt_regs (SP)
 4. drop down to pure kernel mode (from exception)
 5. call the Linux "C" handler

Remove the boiler plate code by moving #2, #3, #4 into #1.

The exceptions to most exceptions are syscall Trap and Machine check
which don't do some of above for various reasons, so call a newly
introduced variant EXCEPTION_PROLOGUE_KEEP_AE (same as original
EXCEPTION_PROLOGUE)

Signed-off-by: Vineet Gupta <[email protected]>
xxkent pushed a commit that referenced this pull request Oct 17, 2023
xxkent pushed a commit that referenced this pull request Oct 17, 2023
xxkent pushed a commit that referenced this pull request Oct 17, 2023
… 64-bit

[Fix link errors with -mcmodel=large and LINK_BASE=0xFFFF_0000_0000 #2]

Current asm code handles ex_tables entries (PC markers) as 32-bits values,
while they are actually 64-bits wide.

|  LD      .tmp_vmlinux1
| arch/arc/kernel/unwind.o:(__ex_table+0x0): relocation truncated to fit: R_ARC_32 against `no symbol'
| make[1]: *** [/home/vineetg/arc/v2-kernel/Makefile:1078: vmlinux] Error 1

Signed-off-by: Vineet Gupta <[email protected]>
xxkent added a commit that referenced this pull request Oct 17, 2023
With enabled CONFIG_DEBUG_PREEMPT the following stack-trace gets
printed on SMP ARC platforms with performance counters interrupts:

---------------------------------------8-------------------------------------
ARC perf        : 8 counters (48 bits), 128 conditions, [overflow IRQ support]
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is arc_pmu_device_probe+0x336/0x3c8
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.17.0 #2

Stack Trace:
  arc_unwind_core+0xe8/0x118
  dump_stack_lvl+0x48/0x68
  check_preemption_disabled+0xb4/0xb8
  arc_pmu_device_probe+0x336/0x3c8
  platform_probe+0x30/0x80
  really_probe.part.0+0x88/0x240
  driver_probe_device+0x86/0x1ec
  __driver_attach+0x8a/0x144
  bus_for_each_dev+0x38/0x64
  bus_add_driver+0x116/0x17c
  driver_register+0x4c/0xdc
  do_one_initcall+0x30/0x16c
  kernel_init_freeable+0x1be/0x224
workingset: timestamp_bits=14 max_order=15 bucket_order=1
io scheduler mq-deadline registered
--------------------------------------8-------------------------------------
That happens because of use of this_cpu_ptr() in
https://elixir.bootlin.com/linux/v5.17/source/arch/arc/kernel/perf_event.c#L811
without explicitly disabled preemption, see
https://www.kernel.org/doc/html/latest/core-api/this_cpu_ops.html#special-operations.

In this commit we  explicitly disable preemption around that code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants