-
Notifications
You must be signed in to change notification settings - Fork 57.9k
Tinkered with the README. #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Please note that pull requests are not the proper procedure to submit patches to the Linux kernel (Linus put the kernel up here because kernel.org's master mirror is down; it seems that he doesn't like the pull request system[1], but github does not allow him to disable it). Please read Documentation/SubmittingPatches - you must write a proper commit message (actually describing what changed, not just 'tinkered with'), add a Signed-Off-By line, and submit to the linux kernel mailing list. |
That's a bit too much work for the usual github stuff. Perhaps I'll just leave it alone and let the usual kernel.org hackers help out. |
The following command sequence triggers an oops. # mount /dev/sdb1 /mnt # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete # umount /mnt general protection fault: 0000 [#1] PREEMPT SMP CPU 2 Modules linked in: Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs RIP: 0010:[<ffffffff810d0879>] [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60 ... Call Trace: [<ffffffff810d2845>] lock_acquire+0x95/0x140 [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50 [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70 [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0 [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0 [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0 [<ffffffff811c40df>] blkdev_put+0x5f/0x190 [<ffffffff8118f18d>] kill_block_super+0x4d/0x80 [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70 [<ffffffff8119003a>] deactivate_super+0x4a/0x70 [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130 [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0 [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b This is because bdev holds on to disk but disk doesn't pin the associated queue. If a SCSI device is removed while the device is still open, the sdev puts the base reference to the queue on release. When the bdev is finally released, the associated queue is already gone along with the bdi and bdev_inode_switch_bdi() ends up dereferencing already freed bdi. Even if it were not for this bug, disk not holding onto the associated queue is very unusual and error-prone. Fix it by making add_disk() take an extra reference to its queue and put it on disk_release() and ensuring that disk and its fops owner are put in that order after all accesses to the disk and queue are complete. Signed-off-by: Tejun Heo <[email protected]> Cc: Jens Axboe <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
This patch validates sdev pointer in scsi_dh_activate before proceeding further. Without this check we might see the panic as below. I have seen this panic multiple times.. Call trace: #0 [ffff88007d647b50] machine_kexec at ffffffff81020902 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0 #2 [ffff88007d647c70] oops_end at ffffffff8139c650 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf [exception RIP: scsi_dh_activate+0x82] RIP: ffffffffa0041922 RSP: ffff88007d647e00 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000093c5 RDX: 00000000000093c5 RSI: ffffffffa02e6640 RDI: ffff88007cc88988 RBP: 000000000000000f R8: ffff88007d646000 R9: 0000000000000000 R10: ffff880082293790 R11: 00000000ffffffff R12: ffff88007cc88988 R13: 0000000000000000 R14: 0000000000000286 R15: ffff880037b845e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268 torvalds#6 [ffff88007d647e78] worker_thread at ffffffff81060386 torvalds#7 [ffff88007d647ee8] kthread at ffffffff81064436 torvalds#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba Signed-off-by: Babu Moger <[email protected]> Cc: [email protected] Signed-off-by: James Bottomley <[email protected]>
The following command sequence triggers an oops. # mount /dev/sdb1 /mnt # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete # umount /mnt general protection fault: 0000 [#1] PREEMPT SMP CPU 2 Modules linked in: Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs RIP: 0010:[<ffffffff810d0879>] [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60 ... Call Trace: [<ffffffff810d2845>] lock_acquire+0x95/0x140 [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50 [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70 [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0 [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0 [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0 [<ffffffff811c40df>] blkdev_put+0x5f/0x190 [<ffffffff8118f18d>] kill_block_super+0x4d/0x80 [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70 [<ffffffff8119003a>] deactivate_super+0x4a/0x70 [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130 [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0 [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b This is because bdev holds on to disk but disk doesn't pin the associated queue. If a SCSI device is removed while the device is still open, the sdev puts the base reference to the queue on release. When the bdev is finally released, the associated queue is already gone along with the bdi and bdev_inode_switch_bdi() ends up dereferencing already freed bdi. Even if it were not for this bug, disk not holding onto the associated queue is very unusual and error-prone. Fix it by making add_disk() take an extra reference to its queue and put it on disk_release() and ensuring that disk and its fops owner are put in that order after all accesses to the disk and queue are complete. Signed-off-by: Tejun Heo <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
commit a18a920 upstream. This patch validates sdev pointer in scsi_dh_activate before proceeding further. Without this check we might see the panic as below. I have seen this panic multiple times.. Call trace: #0 [ffff88007d647b50] machine_kexec at ffffffff81020902 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0 #2 [ffff88007d647c70] oops_end at ffffffff8139c650 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf [exception RIP: scsi_dh_activate+0x82] RIP: ffffffffa0041922 RSP: ffff88007d647e00 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000093c5 RDX: 00000000000093c5 RSI: ffffffffa02e6640 RDI: ffff88007cc88988 RBP: 000000000000000f R8: ffff88007d646000 R9: 0000000000000000 R10: ffff880082293790 R11: 00000000ffffffff R12: ffff88007cc88988 R13: 0000000000000000 R14: 0000000000000286 R15: ffff880037b845e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268 torvalds#6 [ffff88007d647e78] worker_thread at ffffffff81060386 torvalds#7 [ffff88007d647ee8] kthread at ffffffff81064436 torvalds#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba Signed-off-by: Babu Moger <[email protected]> Signed-off-by: James Bottomley <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit f992ae8 upstream. The following command sequence triggers an oops. # mount /dev/sdb1 /mnt # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete # umount /mnt general protection fault: 0000 [#1] PREEMPT SMP CPU 2 Modules linked in: Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs RIP: 0010:[<ffffffff810d0879>] [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60 ... Call Trace: [<ffffffff810d2845>] lock_acquire+0x95/0x140 [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50 [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70 [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0 [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0 [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0 [<ffffffff811c40df>] blkdev_put+0x5f/0x190 [<ffffffff8118f18d>] kill_block_super+0x4d/0x80 [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70 [<ffffffff8119003a>] deactivate_super+0x4a/0x70 [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130 [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0 [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b This is because bdev holds on to disk but disk doesn't pin the associated queue. If a SCSI device is removed while the device is still open, the sdev puts the base reference to the queue on release. When the bdev is finally released, the associated queue is already gone along with the bdi and bdev_inode_switch_bdi() ends up dereferencing already freed bdi. Even if it were not for this bug, disk not holding onto the associated queue is very unusual and error-prone. Fix it by making add_disk() take an extra reference to its queue and put it on disk_release() and ensuring that disk and its fops owner are put in that order after all accesses to the disk and queue are complete. Signed-off-by: Tejun Heo <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appeneded to the inode. gdb> bt #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ xffff88001e26be40) at mm/filemap.c:2600 #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ zed out>, pos=<value optimized out>) at mm/filemap.c:2632 #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ t fs/ext4/file.c:136 #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ ppos=0xffff88001e26bf48) at fs/read_write.c:406 #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 #9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ 4000) at fs/read_write.c:487 #10 <signal handler called> #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () #12 0x0000000000000000 in ?? () gdb> print offset $22 = 0xffffffffffffffff gdb> print idx $23 = 0xffffffff gdb> print inode->i_blkbits $24 = 0xc gdb> up #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 2512 if (ext4_da_should_update_i_disksize(page, end)) { gdb> print start $25 = 0x0 gdb> print end $26 = 0xffffffffffffffff gdb> print pos $27 = 0x108000 gdb> print new_i_size $28 = 0x108000 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize $29 = 0xd9000 gdb> down 2467 for (i = 0; i < idx; i++) gdb> print i $30 = 0xd44acbee This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes. Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: [email protected]
Nothing requires that we lock the filesystem until the root inode is provided. Also iget5_locked() triggers a warning because we are holding the filesystem lock while allocating the inode, which result in a lockdep suspicion that we have a lock inversion against the reclaim path: [ 1986.896979] ================================= [ 1986.896990] [ INFO: inconsistent lock state ] [ 1986.896997] 3.1.1-main #8 [ 1986.897001] --------------------------------- [ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. [ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 1986.897023] (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a [ 1986.897044] {RECLAIM_FS-ON-W} state was registered at: [ 1986.897050] [<c014a5b9>] mark_held_locks+0xae/0xd0 [ 1986.897060] [<c014aab3>] lockdep_trace_alloc+0x7d/0x91 [ 1986.897068] [<c0190ee0>] kmem_cache_alloc+0x1a/0x93 [ 1986.897078] [<c01e7728>] reiserfs_alloc_inode+0x13/0x3d [ 1986.897088] [<c01a5b06>] alloc_inode+0x14/0x5f [ 1986.897097] [<c01a5cb9>] iget5_locked+0x62/0x13a [ 1986.897106] [<c01e99e0>] reiserfs_fill_super+0x410/0x8b9 [ 1986.897114] [<c01953da>] mount_bdev+0x10b/0x159 [ 1986.897123] [<c01e764d>] get_super_block+0x10/0x12 [ 1986.897131] [<c0195b38>] mount_fs+0x59/0x12d [ 1986.897138] [<c01a80d1>] vfs_kern_mount+0x45/0x7a [ 1986.897147] [<c01a83e3>] do_kern_mount+0x2f/0xb0 [ 1986.897155] [<c01a987a>] do_mount+0x5c2/0x612 [ 1986.897163] [<c01a9a72>] sys_mount+0x61/0x8f [ 1986.897170] [<c044060c>] sysenter_do_call+0x12/0x32 [ 1986.897181] irq event stamp: 7509691 [ 1986.897186] hardirqs last enabled at (7509691): [<c0190f34>] kmem_cache_alloc+0x6e/0x93 [ 1986.897197] hardirqs last disabled at (7509690): [<c0190eea>] kmem_cache_alloc+0x24/0x93 [ 1986.897209] softirqs last enabled at (7508896): [<c01294bd>] __do_softirq+0xee/0xfd [ 1986.897222] softirqs last disabled at (7508859): [<c01030ed>] do_softirq+0x50/0x9d [ 1986.897234] [ 1986.897235] other info that might help us debug this: [ 1986.897242] Possible unsafe locking scenario: [ 1986.897244] [ 1986.897250] CPU0 [ 1986.897254] ---- [ 1986.897257] lock(&REISERFS_SB(s)->lock); [ 1986.897265] <Interrupt> [ 1986.897269] lock(&REISERFS_SB(s)->lock); [ 1986.897276] [ 1986.897277] *** DEADLOCK *** [ 1986.897278] [ 1986.897286] no locks held by kswapd0/16. [ 1986.897291] [ 1986.897292] stack backtrace: [ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8 [ 1986.897306] Call Trace: [ 1986.897314] [<c0439e76>] ? printk+0xf/0x11 [ 1986.897324] [<c01482d1>] print_usage_bug+0x20e/0x21a [ 1986.897332] [<c01479b8>] ? print_irq_inversion_bug+0x172/0x172 [ 1986.897341] [<c014855c>] mark_lock+0x27f/0x483 [ 1986.897349] [<c0148d88>] __lock_acquire+0x628/0x1472 [ 1986.897358] [<c0149fae>] lock_acquire+0x47/0x5e [ 1986.897366] [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a [ 1986.897384] [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a [ 1986.897397] [<c043b5ef>] mutex_lock_nested+0x35/0x26f [ 1986.897409] [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a [ 1986.897421] [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a [ 1986.897433] [<c01e2edd>] map_block_for_writepage+0xc9/0x590 [ 1986.897448] [<c01b1706>] ? create_empty_buffers+0x33/0x8f [ 1986.897461] [<c0121124>] ? get_parent_ip+0xb/0x31 [ 1986.897472] [<c043ef7f>] ? sub_preempt_count+0x81/0x8e [ 1986.897485] [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d [ 1986.897496] [<c0121124>] ? get_parent_ip+0xb/0x31 [ 1986.897508] [<c01e355d>] reiserfs_writepage+0x1b9/0x3e7 [ 1986.897521] [<c0173b40>] ? clear_page_dirty_for_io+0xcb/0xde [ 1986.897533] [<c014a6e3>] ? trace_hardirqs_on_caller+0x108/0x138 [ 1986.897546] [<c014a71e>] ? trace_hardirqs_on+0xb/0xd [ 1986.897559] [<c0177b38>] shrink_page_list+0x34f/0x5e2 [ 1986.897572] [<c01780a7>] shrink_inactive_list+0x172/0x22c [ 1986.897585] [<c0178464>] shrink_zone+0x303/0x3b1 [ 1986.897597] [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d [ 1986.897611] [<c01788c9>] kswapd+0x3b7/0x5f2 The deadlock shouldn't happen since we are doing that allocation in the mount path, the filesystem is not available for any reclaim. Still the warning is annoying. To solve this, acquire the lock later only where we need it, right before calling reiserfs_read_locked_inode() that wants to lock to walk the tree. Reported-by: Knut Petersen <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Al Viro <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jeff Mahoney <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
$ wget "http://pkgs.fedoraproject.org/gitweb/?p=kernel.git;a=blob_plain;f=mac80211_offchannel_rework_revert.patch;h=859799714cd85a58450ecde4a1dabc5adffd5100;hb=refs/heads/f16" -O mac80211_offchannel_rework_revert.patch $ patch -p1 --dry-run < mac80211_offchannel_rework_revert.patch patching file net/mac80211/ieee80211_i.h Hunk #1 succeeded at 702 (offset 8 lines). Hunk #2 succeeded at 712 (offset 8 lines). Hunk #3 succeeded at 1143 (offset -57 lines). patching file net/mac80211/main.c patching file net/mac80211/offchannel.c Hunk #1 succeeded at 18 (offset 1 line). Hunk #2 succeeded at 42 (offset 1 line). Hunk #3 succeeded at 78 (offset 1 line). Hunk #4 succeeded at 96 (offset 1 line). Hunk #5 succeeded at 162 (offset 1 line). Hunk torvalds#6 succeeded at 182 (offset 1 line). patching file net/mac80211/rx.c Hunk #1 succeeded at 421 (offset 4 lines). Hunk #2 succeeded at 2864 (offset 87 lines). patching file net/mac80211/scan.c Hunk #1 succeeded at 213 (offset 1 line). Hunk #2 succeeded at 256 (offset 2 lines). Hunk #3 succeeded at 288 (offset 2 lines). Hunk #4 succeeded at 333 (offset 2 lines). Hunk #5 succeeded at 482 (offset 2 lines). Hunk torvalds#6 succeeded at 498 (offset 2 lines). Hunk torvalds#7 succeeded at 516 (offset 2 lines). Hunk torvalds#8 succeeded at 530 (offset 2 lines). Hunk torvalds#9 succeeded at 555 (offset 2 lines). patching file net/mac80211/tx.c Hunk #1 succeeded at 259 (offset 1 line). patching file net/mac80211/work.c Hunk #1 succeeded at 899 (offset -2 lines). Hunk #2 succeeded at 949 (offset -2 lines). Hunk #3 succeeded at 1046 (offset -2 lines). Hunk #4 succeeded at 1054 (offset -2 lines).
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not update the real num tx queues. netdev_queue_update_kobjects() is already called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when upper layer driver, e.g., FCoE protocol stack is monitoring the netdev event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove extra queues allocated for FCoE, the associated txq sysfs kobjects are already removed, and trying to update the real num queues would cause something like below: ... PID: 25138 TASK: ffff88021e64c440 CPU: 3 COMMAND: "kworker/3:3" #0 [ffff88021f007760] machine_kexec at ffffffff810226d9 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d #2 [ffff88021f0078a0] oops_end at ffffffff813bca78 #3 [ffff88021f0078d0] no_context at ffffffff81029e72 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045 [exception RIP: sysfs_find_dirent+17] RIP: ffffffff81178611 RSP: ffff88021f007bc0 RFLAGS: 00010246 RAX: ffff88021e64c440 RBX: ffffffff8156cc63 RCX: 0000000000000004 RDX: ffffffff8156cc63 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88021f007be0 R8: 0000000000000004 R9: 0000000000000008 R10: ffffffff816fed00 R11: 0000000000000004 R12: 0000000000000000 R13: ffffffff8156cc63 R14: 0000000000000000 R15: ffff8802222a0000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27 torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9 torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38 torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe] torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe] torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe] torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q] torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe] torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe] torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513 torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6 torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4 Signed-off-by: Yi Zou <[email protected]> Tested-by: Ross Brattain <[email protected]> Tested-by: Stephen Ko <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not update the real num tx queues. netdev_queue_update_kobjects() is already called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when upper layer driver, e.g., FCoE protocol stack is monitoring the netdev event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove extra queues allocated for FCoE, the associated txq sysfs kobjects are already removed, and trying to update the real num queues would cause something like below: ... PID: 25138 TASK: ffff88021e64c440 CPU: 3 COMMAND: "kworker/3:3" #0 [ffff88021f007760] machine_kexec at ffffffff810226d9 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d #2 [ffff88021f0078a0] oops_end at ffffffff813bca78 #3 [ffff88021f0078d0] no_context at ffffffff81029e72 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045 [exception RIP: sysfs_find_dirent+17] RIP: ffffffff81178611 RSP: ffff88021f007bc0 RFLAGS: 00010246 RAX: ffff88021e64c440 RBX: ffffffff8156cc63 RCX: 0000000000000004 RDX: ffffffff8156cc63 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88021f007be0 R8: 0000000000000004 R9: 0000000000000008 R10: ffffffff816fed00 R11: 0000000000000004 R12: 0000000000000000 R13: ffffffff8156cc63 R14: 0000000000000000 R15: ffff8802222a0000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27 torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9 torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38 torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe] torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe] torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe] torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q] torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe] torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe] torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513 torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6 torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4 Signed-off-by: Yi Zou <[email protected]> Tested-by: Ross Brattain <[email protected]> Tested-by: Stephen Ko <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb torvalds#8 [d72d3d2c] compact_zone at c030b8d torvalds#9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb torvalds#8 [d72d3d2c] compact_zone at c030b8d torvalds#9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
ata_port lifetime in libata follows the host. In libsas it follows the scsi_target. Once scsi_remove_device() has caused all commands to be completed it allows scsi_remove_target() to immediately proceed to freeing the ata_port causing bug reports like: [ 848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107 [ 848.400262] general protection fault: 0000 [#1] SMP [ 848.406244] CPU 4 [ 848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan] [ 848.432060] [ 848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ torvalds#8 Intel Corporation S2600CP/S2600CP [ 848.445310] RIP: 0010:[<ffffffff8126a68c>] [<ffffffff8126a68c>] spin_dump+0x5e/0x8c [ 848.454787] RSP: 0018:ffff8807f868dca0 EFLAGS: 00010002 [ 848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0 [ 848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002 [ 848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b [ 848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b [ 848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e [ 848.503067] FS: 0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000 [ 848.512899] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0 [ 848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000) [ 848.555327] Stack: [ 848.557959] ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0 [ 848.567072] ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703 [ 848.576190] ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb [ 848.585281] Call Trace: [ 848.588409] [<ffffffff8126a6e0>] spin_bug+0x26/0x28 [ 848.594357] [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88 [ 848.601283] [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65 [ 848.609089] [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata] [ 848.618331] [<ffffffff81061813>] ? async_schedule+0x17/0x17 [ 848.625060] [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas] [ 848.632655] [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125 [ 848.639670] [<ffffffff81057439>] process_one_work+0x207/0x38d [ 848.646577] [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d [ 848.653681] [<ffffffff810576f7>] worker_thread+0x138/0x21c [ 848.660305] [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d [ 848.667493] [<ffffffff8105b098>] kthread+0x9d/0xa5 [ 848.673382] [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166 [ 848.681304] [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10 [ 848.688324] [<ffffffff814af534>] ? retint_restore_args+0x13/0x13 [ 848.695530] [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b [ 848.702929] [<ffffffff814b7700>] ? gs_change+0x13/0x13 [ 848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89 [ 848.732467] RIP [<ffffffff8126a68c>] spin_dump+0x5e/0x8c [ 848.738905] RSP <ffff8807f868dca0> [ 848.743743] ---[ end trace 143161646eee8caa ]--- ...so arrange for the ata_port to have the same end of life as the domain device. Reported-by: Marcin Tomczak <[email protected]> Acked-by: Jeff Garzik <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: James Bottomley <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d14] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
BugLink: http://bugs.launchpad.net/bugs/890952 commit a18a920 upstream. This patch validates sdev pointer in scsi_dh_activate before proceeding further. Without this check we might see the panic as below. I have seen this panic multiple times.. Call trace: #0 [ffff88007d647b50] machine_kexec at ffffffff81020902 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0 #2 [ffff88007d647c70] oops_end at ffffffff8139c650 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf [exception RIP: scsi_dh_activate+0x82] RIP: ffffffffa0041922 RSP: ffff88007d647e00 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000093c5 RDX: 00000000000093c5 RSI: ffffffffa02e6640 RDI: ffff88007cc88988 RBP: 000000000000000f R8: ffff88007d646000 R9: 0000000000000000 R10: ffff880082293790 R11: 00000000ffffffff R12: ffff88007cc88988 R13: 0000000000000000 R14: 0000000000000286 R15: ffff880037b845e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268 torvalds#6 [ffff88007d647e78] worker_thread at ffffffff81060386 torvalds#7 [ffff88007d647ee8] kthread at ffffffff81064436 torvalds#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba Signed-off-by: Babu Moger <[email protected]> Signed-off-by: James Bottomley <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Tim Gardner <[email protected]>
BugLink: http://bugs.launchpad.net/bugs/890952 commit f992ae8 upstream. The following command sequence triggers an oops. # mount /dev/sdb1 /mnt # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete # umount /mnt general protection fault: 0000 [#1] PREEMPT SMP CPU 2 Modules linked in: Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs RIP: 0010:[<ffffffff810d0879>] [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60 ... Call Trace: [<ffffffff810d2845>] lock_acquire+0x95/0x140 [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50 [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70 [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0 [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0 [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0 [<ffffffff811c40df>] blkdev_put+0x5f/0x190 [<ffffffff8118f18d>] kill_block_super+0x4d/0x80 [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70 [<ffffffff8119003a>] deactivate_super+0x4a/0x70 [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130 [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0 [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b This is because bdev holds on to disk but disk doesn't pin the associated queue. If a SCSI device is removed while the device is still open, the sdev puts the base reference to the queue on release. When the bdev is finally released, the associated queue is already gone along with the bdi and bdev_inode_switch_bdi() ends up dereferencing already freed bdi. Even if it were not for this bug, disk not holding onto the associated queue is very unusual and error-prone. Fix it by making add_disk() take an extra reference to its queue and put it on disk_release() and ensuring that disk and its fops owner are put in that order after all accesses to the disk and queue are complete. Signed-off-by: Tejun Heo <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Tim Gardner <[email protected]>
BugLink: http://bugs.launchpad.net/bugs/907778 commit ea51d13 upstream. If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appeneded to the inode. gdb> bt #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ xffff88001e26be40) at mm/filemap.c:2600 #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ zed out>, pos=<value optimized out>) at mm/filemap.c:2632 torvalds#6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ t fs/ext4/file.c:136 torvalds#7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ ppos=0xffff88001e26bf48) at fs/read_write.c:406 torvalds#8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 torvalds#9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ 4000) at fs/read_write.c:487 torvalds#10 <signal handler called> torvalds#11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () torvalds#12 0x0000000000000000 in ?? () gdb> print offset $22 = 0xffffffffffffffff gdb> print idx $23 = 0xffffffff gdb> print inode->i_blkbits $24 = 0xc gdb> up #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 2512 if (ext4_da_should_update_i_disksize(page, end)) { gdb> print start $25 = 0x0 gdb> print end $26 = 0xffffffffffffffff gdb> print pos $27 = 0x108000 gdb> print new_i_size $28 = 0x108000 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize $29 = 0xd9000 gdb> down 2467 for (i = 0; i < idx; i++) gdb> print i $30 = 0xd44acbee This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes. Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Tim Gardner <[email protected]> Signed-off-by: Brad Figg <[email protected]>
…S block during isolation for migration BugLink: http://bugs.launchpad.net/bugs/931719 commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb torvalds#8 [d72d3d2c] compact_zone at c030b8d torvalds#9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Tim Gardner <[email protected]>
- treat tailcall count as 32-bit for access and update - change out_offset scope from file to function - minor format/structure changes for consistency Testing: (skipping fentry, fexit, freplace) ======== root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls test_bpf: #0 Tail call leaf jited:1 967 PASS test_bpf: #1 Tail call 2 jited:1 1427 PASS test_bpf: #2 Tail call 3 jited:1 2373 PASS test_bpf: #3 Tail call 4 jited:1 2304 PASS test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS test_bpf: #5 Tail call load/store jited:1 2249 PASS test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31 397/1 tailcalls/tailcall_1:OK 397/2 tailcalls/tailcall_2:OK 397/3 tailcalls/tailcall_3:OK 397/4 tailcalls/tailcall_4:OK 397/5 tailcalls/tailcall_5:OK 397/6 tailcalls/tailcall_6:OK 397/7 tailcalls/tailcall_bpf2bpf_1:OK 397/8 tailcalls/tailcall_bpf2bpf_2:OK 397/9 tailcalls/tailcall_bpf2bpf_3:OK 397/10 tailcalls/tailcall_bpf2bpf_4:OK 397/11 tailcalls/tailcall_bpf2bpf_5:OK 397/12 tailcalls/tailcall_bpf2bpf_6:OK 397/17 tailcalls/tailcall_poke:OK 397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK 397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK 397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK 397/27 tailcalls/tailcall_failure:OK 397/28 tailcalls/reject_tail_call_spin_lock:OK 397/29 tailcalls/reject_tail_call_rcu_lock:OK 397/30 tailcalls/reject_tail_call_preempt_lock:OK 397/31 tailcalls/reject_tail_call_ref:OK 397 tailcalls:OK Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tony Ambardar <[email protected]>
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables. Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 While afterwards we have: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 ; struct seq_file *seq = ctx->meta->seq; 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] ; struct seq_file *seq = ctx->meta->seq; 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 ; struct udp_sock *udp_sk = ctx->udp_sk; 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 which aligns with the original source code. Signed-off-by: Tony Ambardar <[email protected]>
With test_progs compiled with ARM THUMB insns, trigger_func*() used for testing uprobes have odd (LSB set) entry points due to interworking. This fails alignment check first, then due to uprobe unsupported for THUMB in prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn(). First fix by setting trigger_func*() as ARM insns and rely on interworking enabled by default. Doesn't work on Debian bookworm target because insn #1 of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched by dynamic loader? Kernel somehow? __attribute__((target("arm"))) Works to compile all test_progs with gcc -marm, or just attach_probe.c. Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid using THUMB insns, with expected tests passing: torvalds#8/1 attach_probe/manual-default:OK torvalds#8/2 attach_probe/manual-legacy:OK torvalds#8/3 attach_probe/manual-perf:OK torvalds#8/4 attach_probe/manual-link:OK torvalds#8/5 attach_probe/auto:OK torvalds#8/6 attach_probe/kprobe-sleepable:OK torvalds#8/8 attach_probe/uprobe-sleepable:OK torvalds#8/9 attach_probe/uprobe-ref_ctr:OK torvalds#8/10 attach_probe/uprobe-long_name:OK torvalds#8/11 attach_probe/kprobe-long_name:OK torvalds#8 attach_probe:OK Note that attach_probe/uprobe-lib will still fail since it tries to attach a uprobe to the system libc.so, which is compiled for THUMB: test_attach_probe:PASS:skel_open 0 nsec test_attach_probe:PASS:skel_load 0 nsec test_attach_probe:PASS:check_bss 0 nsec test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22 torvalds#8/7 attach_probe/uprobe-lib:FAIL torvalds#8 attach_probe:FAIL Also compile other source files and binary with same issue using "-marm": bpf_cookie.c fill_link_info.c task_pt_regs.test.c uprobe_autoattach.c (also tries libc.so attach) usdt.c uprobe_multi Signed-off-by: Tony Ambardar <[email protected]>
- treat tailcall count as 32-bit for access and update - change out_offset scope from file to function - minor format/structure changes for consistency Testing: (skipping fentry, fexit, freplace) ======== root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls test_bpf: #0 Tail call leaf jited:1 967 PASS test_bpf: #1 Tail call 2 jited:1 1427 PASS test_bpf: #2 Tail call 3 jited:1 2373 PASS test_bpf: #3 Tail call 4 jited:1 2304 PASS test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS test_bpf: #5 Tail call load/store jited:1 2249 PASS test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31 397/1 tailcalls/tailcall_1:OK 397/2 tailcalls/tailcall_2:OK 397/3 tailcalls/tailcall_3:OK 397/4 tailcalls/tailcall_4:OK 397/5 tailcalls/tailcall_5:OK 397/6 tailcalls/tailcall_6:OK 397/7 tailcalls/tailcall_bpf2bpf_1:OK 397/8 tailcalls/tailcall_bpf2bpf_2:OK 397/9 tailcalls/tailcall_bpf2bpf_3:OK 397/10 tailcalls/tailcall_bpf2bpf_4:OK 397/11 tailcalls/tailcall_bpf2bpf_5:OK 397/12 tailcalls/tailcall_bpf2bpf_6:OK 397/17 tailcalls/tailcall_poke:OK 397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK 397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK 397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK 397/27 tailcalls/tailcall_failure:OK 397/28 tailcalls/reject_tail_call_spin_lock:OK 397/29 tailcalls/reject_tail_call_rcu_lock:OK 397/30 tailcalls/reject_tail_call_preempt_lock:OK 397/31 tailcalls/reject_tail_call_ref:OK 397 tailcalls:OK Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tony Ambardar <[email protected]>
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables. Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 While afterwards we have: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 ; struct seq_file *seq = ctx->meta->seq; 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] ; struct seq_file *seq = ctx->meta->seq; 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 ; struct udp_sock *udp_sk = ctx->udp_sk; 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 which aligns with the original source code. Signed-off-by: Tony Ambardar <[email protected]>
With test_progs compiled with ARM THUMB insns, trigger_func*() used for testing uprobes have odd (LSB set) entry points due to interworking. This fails alignment check first, then due to uprobe unsupported for THUMB in prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn(). First fix by setting trigger_func*() as ARM insns and rely on interworking enabled by default. Doesn't work on Debian bookworm target because insn #1 of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched by dynamic loader? Kernel somehow? __attribute__((target("arm"))) Works to compile all test_progs with gcc -marm, or just attach_probe.c. Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid using THUMB insns, with expected tests passing: torvalds#8/1 attach_probe/manual-default:OK torvalds#8/2 attach_probe/manual-legacy:OK torvalds#8/3 attach_probe/manual-perf:OK torvalds#8/4 attach_probe/manual-link:OK torvalds#8/5 attach_probe/auto:OK torvalds#8/6 attach_probe/kprobe-sleepable:OK torvalds#8/8 attach_probe/uprobe-sleepable:OK torvalds#8/9 attach_probe/uprobe-ref_ctr:OK torvalds#8/10 attach_probe/uprobe-long_name:OK torvalds#8/11 attach_probe/kprobe-long_name:OK torvalds#8 attach_probe:OK Note that attach_probe/uprobe-lib will still fail since it tries to attach a uprobe to the system libc.so, which is compiled for THUMB: test_attach_probe:PASS:skel_open 0 nsec test_attach_probe:PASS:skel_load 0 nsec test_attach_probe:PASS:check_bss 0 nsec test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22 torvalds#8/7 attach_probe/uprobe-lib:FAIL torvalds#8 attach_probe:FAIL Also compile other source files and binary with same issue using "-marm": bpf_cookie.c fill_link_info.c task_pt_regs.test.c uprobe_autoattach.c (also tries libc.so attach) usdt.c uprobe_multi Signed-off-by: Tony Ambardar <[email protected]>
- treat tailcall count as 32-bit for access and update - change out_offset scope from file to function - minor format/structure changes for consistency Testing: (skipping fentry, fexit, freplace) ======== root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls test_bpf: #0 Tail call leaf jited:1 967 PASS test_bpf: #1 Tail call 2 jited:1 1427 PASS test_bpf: #2 Tail call 3 jited:1 2373 PASS test_bpf: #3 Tail call 4 jited:1 2304 PASS test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS test_bpf: #5 Tail call load/store jited:1 2249 PASS test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31 397/1 tailcalls/tailcall_1:OK 397/2 tailcalls/tailcall_2:OK 397/3 tailcalls/tailcall_3:OK 397/4 tailcalls/tailcall_4:OK 397/5 tailcalls/tailcall_5:OK 397/6 tailcalls/tailcall_6:OK 397/7 tailcalls/tailcall_bpf2bpf_1:OK 397/8 tailcalls/tailcall_bpf2bpf_2:OK 397/9 tailcalls/tailcall_bpf2bpf_3:OK 397/10 tailcalls/tailcall_bpf2bpf_4:OK 397/11 tailcalls/tailcall_bpf2bpf_5:OK 397/12 tailcalls/tailcall_bpf2bpf_6:OK 397/17 tailcalls/tailcall_poke:OK 397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK 397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK 397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK 397/27 tailcalls/tailcall_failure:OK 397/28 tailcalls/reject_tail_call_spin_lock:OK 397/29 tailcalls/reject_tail_call_rcu_lock:OK 397/30 tailcalls/reject_tail_call_preempt_lock:OK 397/31 tailcalls/reject_tail_call_ref:OK 397 tailcalls:OK Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tony Ambardar <[email protected]>
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables. Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 While afterwards we have: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 ; struct seq_file *seq = ctx->meta->seq; 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] ; struct seq_file *seq = ctx->meta->seq; 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 ; struct udp_sock *udp_sk = ctx->udp_sk; 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 which aligns with the original source code. Signed-off-by: Tony Ambardar <[email protected]>
With test_progs compiled with ARM THUMB insns, trigger_func*() used for testing uprobes have odd (LSB set) entry points due to interworking. This fails alignment check first, then due to uprobe unsupported for THUMB in prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn(). First fix by setting trigger_func*() as ARM insns and rely on interworking enabled by default. Doesn't work on Debian bookworm target because insn #1 of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched by dynamic loader? Kernel somehow? __attribute__((target("arm"))) Works to compile all test_progs with gcc -marm, or just attach_probe.c. Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid using THUMB insns, with expected tests passing: torvalds#8/1 attach_probe/manual-default:OK torvalds#8/2 attach_probe/manual-legacy:OK torvalds#8/3 attach_probe/manual-perf:OK torvalds#8/4 attach_probe/manual-link:OK torvalds#8/5 attach_probe/auto:OK torvalds#8/6 attach_probe/kprobe-sleepable:OK torvalds#8/8 attach_probe/uprobe-sleepable:OK torvalds#8/9 attach_probe/uprobe-ref_ctr:OK torvalds#8/10 attach_probe/uprobe-long_name:OK torvalds#8/11 attach_probe/kprobe-long_name:OK torvalds#8 attach_probe:OK Note that attach_probe/uprobe-lib will still fail since it tries to attach a uprobe to the system libc.so, which is compiled for THUMB: test_attach_probe:PASS:skel_open 0 nsec test_attach_probe:PASS:skel_load 0 nsec test_attach_probe:PASS:check_bss 0 nsec test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22 torvalds#8/7 attach_probe/uprobe-lib:FAIL torvalds#8 attach_probe:FAIL Also compile other source files and binary with same issue using "-marm": bpf_cookie.c fill_link_info.c task_pt_regs.test.c uprobe_autoattach.c (also tries libc.so attach) usdt.c uprobe_multi Signed-off-by: Tony Ambardar <[email protected]>
- treat tailcall count as 32-bit for access and update - change out_offset scope from file to function - minor format/structure changes for consistency Testing: (skipping fentry, fexit, freplace) ======== root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls test_bpf: #0 Tail call leaf jited:1 967 PASS test_bpf: #1 Tail call 2 jited:1 1427 PASS test_bpf: #2 Tail call 3 jited:1 2373 PASS test_bpf: #3 Tail call 4 jited:1 2304 PASS test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS test_bpf: #5 Tail call load/store jited:1 2249 PASS test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31 397/1 tailcalls/tailcall_1:OK 397/2 tailcalls/tailcall_2:OK 397/3 tailcalls/tailcall_3:OK 397/4 tailcalls/tailcall_4:OK 397/5 tailcalls/tailcall_5:OK 397/6 tailcalls/tailcall_6:OK 397/7 tailcalls/tailcall_bpf2bpf_1:OK 397/8 tailcalls/tailcall_bpf2bpf_2:OK 397/9 tailcalls/tailcall_bpf2bpf_3:OK 397/10 tailcalls/tailcall_bpf2bpf_4:OK 397/11 tailcalls/tailcall_bpf2bpf_5:OK 397/12 tailcalls/tailcall_bpf2bpf_6:OK 397/17 tailcalls/tailcall_poke:OK 397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK 397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK 397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK 397/27 tailcalls/tailcall_failure:OK 397/28 tailcalls/reject_tail_call_spin_lock:OK 397/29 tailcalls/reject_tail_call_rcu_lock:OK 397/30 tailcalls/reject_tail_call_preempt_lock:OK 397/31 tailcalls/reject_tail_call_ref:OK 397 tailcalls:OK Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tony Ambardar <[email protected]>
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables. Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 While afterwards we have: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 ; struct seq_file *seq = ctx->meta->seq; 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] ; struct seq_file *seq = ctx->meta->seq; 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 ; struct udp_sock *udp_sk = ctx->udp_sk; 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 which aligns with the original source code. Signed-off-by: Tony Ambardar <[email protected]>
With test_progs compiled with ARM THUMB insns, trigger_func*() used for testing uprobes have odd (LSB set) entry points due to interworking. This fails alignment check first, then due to uprobe unsupported for THUMB in prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn(). First fix by setting trigger_func*() as ARM insns and rely on interworking enabled by default. Doesn't work on Debian bookworm target because insn #1 of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched by dynamic loader? Kernel somehow? __attribute__((target("arm"))) Works to compile all test_progs with gcc -marm, or just attach_probe.c. Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid using THUMB insns, with expected tests passing: torvalds#8/1 attach_probe/manual-default:OK torvalds#8/2 attach_probe/manual-legacy:OK torvalds#8/3 attach_probe/manual-perf:OK torvalds#8/4 attach_probe/manual-link:OK torvalds#8/5 attach_probe/auto:OK torvalds#8/6 attach_probe/kprobe-sleepable:OK torvalds#8/8 attach_probe/uprobe-sleepable:OK torvalds#8/9 attach_probe/uprobe-ref_ctr:OK torvalds#8/10 attach_probe/uprobe-long_name:OK torvalds#8/11 attach_probe/kprobe-long_name:OK torvalds#8 attach_probe:OK Note that attach_probe/uprobe-lib will still fail since it tries to attach a uprobe to the system libc.so, which is compiled for THUMB: test_attach_probe:PASS:skel_open 0 nsec test_attach_probe:PASS:skel_load 0 nsec test_attach_probe:PASS:check_bss 0 nsec test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22 torvalds#8/7 attach_probe/uprobe-lib:FAIL torvalds#8 attach_probe:FAIL Also compile other source files and binary with same issue using "-marm": bpf_cookie.c fill_link_info.c task_pt_regs.test.c uprobe_autoattach.c (also tries libc.so attach) usdt.c uprobe_multi Signed-off-by: Tony Ambardar <[email protected]>
- treat tailcall count as 32-bit for access and update - change out_offset scope from file to function - minor format/structure changes for consistency Testing: (skipping fentry, fexit, freplace) ======== root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls test_bpf: #0 Tail call leaf jited:1 967 PASS test_bpf: #1 Tail call 2 jited:1 1427 PASS test_bpf: #2 Tail call 3 jited:1 2373 PASS test_bpf: #3 Tail call 4 jited:1 2304 PASS test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS test_bpf: #5 Tail call load/store jited:1 2249 PASS test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31 397/1 tailcalls/tailcall_1:OK 397/2 tailcalls/tailcall_2:OK 397/3 tailcalls/tailcall_3:OK 397/4 tailcalls/tailcall_4:OK 397/5 tailcalls/tailcall_5:OK 397/6 tailcalls/tailcall_6:OK 397/7 tailcalls/tailcall_bpf2bpf_1:OK 397/8 tailcalls/tailcall_bpf2bpf_2:OK 397/9 tailcalls/tailcall_bpf2bpf_3:OK 397/10 tailcalls/tailcall_bpf2bpf_4:OK 397/11 tailcalls/tailcall_bpf2bpf_5:OK 397/12 tailcalls/tailcall_bpf2bpf_6:OK 397/17 tailcalls/tailcall_poke:OK 397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK 397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK 397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK 397/27 tailcalls/tailcall_failure:OK 397/28 tailcalls/reject_tail_call_spin_lock:OK 397/29 tailcalls/reject_tail_call_rcu_lock:OK 397/30 tailcalls/reject_tail_call_preempt_lock:OK 397/31 tailcalls/reject_tail_call_ref:OK 397 tailcalls:OK Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tony Ambardar <[email protected]>
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables. Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 While afterwards we have: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 ; struct seq_file *seq = ctx->meta->seq; 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] ; struct seq_file *seq = ctx->meta->seq; 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 ; struct udp_sock *udp_sk = ctx->udp_sk; 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 which aligns with the original source code. Signed-off-by: Tony Ambardar <[email protected]>
With test_progs compiled with ARM THUMB insns, trigger_func*() used for testing uprobes have odd (LSB set) entry points due to interworking. This fails alignment check first, then due to uprobe unsupported for THUMB in prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn(). First fix by setting trigger_func*() as ARM insns and rely on interworking enabled by default. Doesn't work on Debian bookworm target because insn #1 of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched by dynamic loader? Kernel somehow? __attribute__((target("arm"))) Works to compile all test_progs with gcc -marm, or just attach_probe.c. Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid using THUMB insns, with expected tests passing: torvalds#8/1 attach_probe/manual-default:OK torvalds#8/2 attach_probe/manual-legacy:OK torvalds#8/3 attach_probe/manual-perf:OK torvalds#8/4 attach_probe/manual-link:OK torvalds#8/5 attach_probe/auto:OK torvalds#8/6 attach_probe/kprobe-sleepable:OK torvalds#8/8 attach_probe/uprobe-sleepable:OK torvalds#8/9 attach_probe/uprobe-ref_ctr:OK torvalds#8/10 attach_probe/uprobe-long_name:OK torvalds#8/11 attach_probe/kprobe-long_name:OK torvalds#8 attach_probe:OK Note that attach_probe/uprobe-lib will still fail since it tries to attach a uprobe to the system libc.so, which is compiled for THUMB: test_attach_probe:PASS:skel_open 0 nsec test_attach_probe:PASS:skel_load 0 nsec test_attach_probe:PASS:check_bss 0 nsec test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22 torvalds#8/7 attach_probe/uprobe-lib:FAIL torvalds#8 attach_probe:FAIL Also compile other source files and binary with same issue using "-marm": bpf_cookie.c fill_link_info.c task_pt_regs.test.c uprobe_autoattach.c (also tries libc.so attach) usdt.c uprobe_multi Signed-off-by: Tony Ambardar <[email protected]>
Add a driver for the Unicam camera receiver block on BCM283x processors. Compared to the bcm2835-camera driver present in staging, this driver handles the Unicam block only (CSI-2 receiver), and doesn't depend on the VC4 firmware running on the VPU. The commit is made up of a series of changes cherry-picked from the rpi-5.4.y branch of https://github.com/raspberrypi/linux/ with additional enhancements, forward-ported to the mainline kernel. Signed-off-by: Dave Stevenson <[email protected]> Signed-off-by: Naushir Patuck <[email protected]> Signed-off-by: Laurent Pinchart <[email protected]> Reported-by: kbuild test robot <[email protected]> media: bcm2835-unicam: Add support for get_mbus_config to set num lanes Use the get_mbus_config pad subdev call to allow a source to use fewer than the number of CSI2 lanes defined in device tree. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Avoid gcc warning over {0} on endpoint Older gcc versions object to = { 0 } initialisation if the first elemtn in the structure is a substructure. Use = { } to avoid this compiler warning. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Reinstate V4L2_CAP_READWRITE in the caps v4l2-compliance throws a failure if the device doesn't advertise V4L2_CAP_READWRITE but allows read or write operations. We do support read, so reinstate the flag. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Ensure type is VIDEO_CAPTURE in [g|s]_selection [g|s]_selection pass in a buffer type that needs to be validated before passing on to the sensor subdev. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835: unicam: Set VPU min clock freq to 250Mhz. When streaming with Unicam, the VPU must have a clock frequency of at least 250Mhz. Otherwise, the input fifos could overrun, causing image corruption. Signed-off-by: Naushir Patuck <[email protected]> media: bcm2835-unicam: Drop WARN on uing direct cache alias Pi 0&1 pass all ARM accesses through the VPU L2 cache, therefore the dma-ranges property sets the cache alias bits to other than the direct alias, hence this WARN was firing. It was overprotective coding, so assume that everything is OK with the dma-ranges, and remove the WARN. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Always service interrupts From when bringing up the driver, there was a check in the isr to ignore interrupts (claiming them handled) should the driver not be streaming. The VPU now will not register a camera driver if it finds a CSI2 node enabled in device tree, therefore this flawed check is redundant. raspberrypi/linux#3602 Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835: unicam: Fix uninitialized warning Signed-off-by: Jacko Dirks <[email protected]> media: bcm2835-unicam: Fixup review comments from Hans. Updates the driver based on the upstream review comments from Hans Verkuil at https://patchwork.linuxtv.org/patch/63531/ Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Retain packing information on G_FMT The change to retrieve the pixel format always on g_fmt didn't check whether the native or unpacked version of the format had been requested, and always returned the packed one. Correct this so that the packing setting is retained whereever possible. Fixes "9d59e89 media: bcm2835-unicam: Re-fetch mbus code from subdev on a g_fmt call" Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: change minimum number of vb2_queue buffers to 1 Since the unicam driver was modified to write to a dummy buffer when no user-supplied buffer is available, it can now write to and return a buffer even when there's only a single one. Enable this by changing the min_buffers_needed in the vb2_queue; it will be useful for enabling still captures without allocating more memory than absolutely necessary. Signed-off-by: David Plowman <[email protected]> staging: vc04_services: ISP: Add a more complex ISP processing component Driver for the BCM2835 ISP hardware block. This driver uses the MMAL component to program the ISP hardware through the VC firmware. The ISP component can produce two video stream outputs, and Bayer image statistics. This can't be encompassed in a simple V4L2 M2M device, so create a new device that registers 4 video nodes. This patch squashes all the development patches from the earlier rpi-5.4.y branch into one Signed-off-by: Naushir Patuck <[email protected]> staging/bcm2835-isp: Add the unpacked (16bpp) raw formats Now that the firmware supports the unpacked (16bpp) variants of the MIPI raw formats, add the mappings. Signed-off-by: Dave Stevenson <[email protected]> staging/bcm2835-isp: Log the number of excess supported formats When logging that the firmware has provided more supported formats than we had allocated storage for, log the number allocated and returned. Signed-off-by: Dave Stevenson <[email protected]> staging: vc04_services: ISP: Add colour denoise control Add colour denoise control to the bcm2835 driver through a new v4l2 control: V4L2_CID_USER_BCM2835_ISP_CDN. Add the accompanying MMAL configuration structure definitions as well. Signed-off-by: Naushir Patuck <[email protected]> bcm2835-isp: Allow formats with different colour spaces. Each supported format now includes a mask showing the allowed colour spaces, as well as a default colour space for when one was not specified. Additionally we translate the colour space to mmal format and pass it over to the VideoCore. Signed-off-by: David Plowman <[email protected]> media: i2c: add ov9281 driver. Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1 Signed-off-by: Zefa Chen <[email protected]> media: i2c: ov9281: fix mclk issue when probe multiple camera. Takes the ov9281 part only from the Rockchip's patch. Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4 Signed-off-by: Zefa Chen <[email protected]> media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3 Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to a large number of drivers. Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c Signed-off-by: Zefa Chen <[email protected]> media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code The Rockchip driver was based on a 4.4 kernel, and had several custom Rockchip parts. Update to 5.4 kernel APIs, with the relevant controls required by libcamera, and remove custom Rockchip parts. Signed-off-by: Dave Stevenson <[email protected]> media: i2c: ov9281: Read chip ID via 2 reads Vision Components have made an OV9281 module which blocks reading back the majority of registers to comply with NDAs, and in doing so doesn't allow auto-increment register reading as used when reading the chip ID. Use two reads and manually combine the results. Signed-off-by: Dave Stevenson <[email protected]> media: i2c: ov9281: Add support for 8 bit readout The sensor supports 8 bit mode as well as 10bit, so add the relevant code to allow selection of this. Signed-off-by: Dave Stevenson <[email protected]> media: ov9281: Add 1280x720 and 640x480 modes Breaks out common register set and adds the different registers for 1280x720 (cropped) and 640x480 (skipped) modes Signed-off-by: Dave Stevenson <[email protected]> Fixed picture line bug in all ov9281 modes Signed-off-by: Mathias Anhalt <[email protected]> Added hflip and vflip controls to ov9281 Signed-off-by: Mathias Anhalt <[email protected]> media: i2c: ov9281: Remove override of subdev name From the original Rockchip driver, the subdev was renamed from the default to being "mov9281 <dev_name>" whereas the default would have been "ov9281 <dev_name>". Remove the override to drop back to the default rather than a vendor custom string. Signed-off-by: Dave Stevenson <[email protected]> media: v4l2-subdev: add subdev-wide state struct Signed-off-by: Dom Cobley <[email protected]> media: i2c: ov9281: Add fwnode properties controls Add call to v4l2_ctrl_new_fwnode_properties to read and create the fwnode based controls. Signed-off-by: Dave Stevenson <[email protected]> media: i2c: ov9281: Sensor should report RAW color space Tested on Raspberry Pi running libcamera. Signed-off-by: David Plowman <[email protected]> Partial revert "media: i2c: add ov9281 driver." This partially reverts commit 84e98e3a4f3eecb168ceb80231c3e8252929892e. The commit had merged some changes to other drivers with adding the ov9281 driver. Only the ov9281 parts have been reverted. staging/bcm2835-isp: Fix compiler warning The result of dividing a u32 by a size_t is an unsigned int on arm32 and a long unsigned int on arm64. Use "%zu" (the size_t format) to remove the build warning for 64-bit builds. Signed-off-by: Phil Elwell <[email protected]> staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes The bcm2835 ISP requires the base address of all input/output planes to have 32 byte alignment. Using a Y stride of 32 bytes would not guarantee that the V plane would fulfil this, e.g. a height of 650 lines would mean the V plane buffer is not 32 byte aligned for YUV420 formats. Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte alignment, as the luma height will always be an even number of lines. Signed-off-by: Naushir Patuck <[email protected]> vc04_services: isp: Report input node as wanting full range RAW color space RAW color spaces are more usually reported as having full range quantization. Tested using libcamera. Signed-off-by: David Plowman <[email protected]> drivers: bcm2835_isp: Allow multiple users for the ISP driver. Add a second (identical) set of device nodes to allow concurrent use of the ISP hardware by another user. This change effectively creates a second state structure (struct bcm2835_isp_dev) to maintain independent state for the second user. Node and media entity names are appened with the instance index appropriately. Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define. Signed-off-by: Naushir Patuck <[email protected]> drivers: bcm2835_isp: Fix div by 0 bug. Fix a possible division by 0 bug when setting up the mmal port for the stats port. Signed-off-by: Naushir Patuck <[email protected]> staging/bcm2835-isp: Fix cleanup after init fail bcm2835_isp_remove is called on an initialisation failure, but at that point the drvdata hasn't been set. This causes a crash when e.g. using the cutdown firmware (gpu_mem=16). Move platform_set_drvdata before the instance probing loop to avoid the problem. See: raspberrypi/linux#4774 Signed-off-by: Phil Elwell <[email protected]> bcm2835-v4l2-isp: Add missing lock initialization ISP device allocation is dynamic hence the locks too. struct mutex queue_lock is not initialized which result in bug. Fixing same by initializing it. [ 29.847138] INFO: trying to register non-static key. [ 29.847156] The code is fine but needs lockdep annotation, or maybe [ 29.847159] you didn't initialize this object before use? [ 29.847161] turning off the locking correctness validator. [ 29.847167] CPU: 1 PID: 343 Comm: v4l_id Tainted: G C 5.15.11-rt24-v8+ torvalds#8 [ 29.847187] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT) [ 29.847194] Call trace: [ 29.847197] dump_backtrace+0x0/0x1b8 [ 29.847227] show_stack+0x20/0x30 [ 29.847240] dump_stack_lvl+0x8c/0xb8 [ 29.847254] dump_stack+0x18/0x34 [ 29.847263] register_lock_class+0x494/0x4a0 [ 29.847278] __lock_acquire+0x80/0x1680 [ 29.847289] lock_acquire+0x214/0x3a0 [ 29.847300] mutex_lock_nested+0x70/0xc8 [ 29.847312] _vb2_fop_release+0x3c/0xa8 [videobuf2_v4l2] [ 29.847346] vb2_fop_release+0x34/0x60 [videobuf2_v4l2] [ 29.847367] v4l2_release+0xc8/0x108 [videodev] [ 29.847453] __fput+0x8c/0x258 [ 29.847476] ____fput+0x18/0x28 [ 29.847487] task_work_run+0x98/0x180 [ 29.847502] do_notify_resume+0x228/0x3f8 [ 29.847515] el0_svc+0xec/0xf0 [ 29.847523] el0t_64_sync_handler+0x90/0xb8 [ 29.847531] el0t_64_sync+0x180/0x184 Signed-off-by: Padmanabha Srinivasaiah <[email protected]> staging: vc04_services: isp: Permit all sRGB colour spaces on ISP outputs ISP outputs actually support all colour spaces that are fundamentally sRGB underneath, regardless of whether an RGB or YUV output format is actually requested. Signed-off-by: David Plowman <[email protected]> drivers: staging: bcm2835-isp: Do not cleanup mmal vcsm buffer on stop_streaming On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup code. This will subsequently cause and error if userland re-queues the same buffer on the next start_streaming() call. Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the cleanups instead. Signed-off-by: Naushir Patuck <[email protected]> drivers: staging: bcm2835-isp: Clear LS table handle in the firmware When all nodes have stopped streaming, ensure the firmware has released its handle on the LS table dmabuf. This is done by passing a null handle in the LS params. Signed-off-by: Naushir Patuck <[email protected]> drivers: staging: bcm2835-isp: Respect caller's stride value The stride value reported for output image buffers should be at least as large as any value that was passed in by the caller (subject to correct alignment for the pixel format). If the value is zero (meaning no value was passed), or is too small, the minimum acceptable value will be substituted. Signed-off-by: David Plowman <[email protected]> staging: vc04_services: bcm2835-isp: Drop include Makefile directive Drop the include directive. They can break the build, when one only wants to build a subdirectory. Replace with "../" for the includes in the bcm2835-isp instead. The fix is equivalent to the four patches between 29d49a7 ("staging: vc04_services: bcm2835-audio: Drop include Makefile directive")...2529ca2 ("staging: vc04_services: interface: Drop include Makefile directive") Fixes: c8f89c9551c1 ("staging: vc04_services: ISP: Add a more complex ISP processing component") Suggested-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kieran Bingham <[email protected]> staging: vc04_services: bcm2835-v4l2-isp: Register with vchiq_bus_type Register the bcm2835-v4l2-isp driver with the vchiq_bus_type instead of using the platform driver/device. Signed-off-by: Kieran Bingham <[email protected]> staging: vc04_services: bcm2835-v4l2-isp: Explicitly set DMA mask The platform model originally handled the DMA mask. Now that we are on the vchiq_bus we need to explicitly set this. Signed-off-by: Kieran Bingham <[email protected]> drivers: media: bcm2835_isp: Cache LS table dmabuf Clients such as libcamera do not change the LS table dmabuf on every frame. In such cases instead of mapping/remapping the same dmabuf on every frame to send to the firmware, cache the dmabuf once and only update and remap if the dmabuf has been changed by the userland client. Signed-off-by: Naushir Patuck <[email protected]> media: bcm2835-unicam: Correctly handle error propagation for stream on On a failure in start_streaming(), the error code would not propagate to the calling function on all conditions. This would cause the userland caller to not know of the failure. Signed-off-by: Naushir Patuck <[email protected]> media: bcm2835-unicam: Return early from stop_streaming() if stopped clk_disable_unprepare() is called unconditionally in stop_streaming(). This is incorrect in the cases where start_streaming() fails, and unprepares all clocks as part of the failure cleanup. To avoid this, ensure that clk_disable_unprepare() is only called in stop_streaming() if the clocks are in a prepared state. Signed-off-by: Naushir Patuck <[email protected]> media: bcm2835-unicam: Clear clock state when stopping streaming Commit 65e08c465020d4c5b51afb452efc2246d80fd66f failed to clear the clock state when the device stopped streaming. Fix this, as it might again cause the same problems when doing an unprepare. Signed-off-by: Naushir Patuck <[email protected]> media: bcm2835-unicam: Fix bug in buffer swapping logic If multiple sets of interrupts occur simultaneously, it may be unsafe to swap buffers, as the hardware may already be re-using the current buffers. In such cases, avoid swapping buffers, and wait for the next opportunity at the Frame End interrupt to signal completion. Additionally, check the packet compare status when watching for frame end for buffers swaps, as this could also signify a frame end event. Signed-off-by: Naushir Patuck <[email protected]> media: bcm2835-unicam: Forward input status from subdevice The vidioc_enum_input() v4l2 ioctl is capable of returning sensor/input status as well. This is used in current GStreamer HEAD for signal detection [1]. bcm2835-unicam does handle this syscall, but it didn't ask the subdevice driver about the input status. The input then appeared as always present. This commit adds the necessary query. There is a precedent for this - the R-Car VIN V4L2 driver does a similar call [2]. [1]: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/blob/ce0be27caf69aa9d96b73bc2b50737451b6f6936/sys/v4l2/gstv4l2src.c#L553 [2]: https://github.com/raspberrypi/linux/blob/7fb9d006d3ff3baf2e205e0c85c4e4fd0a64fcd0/drivers/media/platform/rcar-vin/rcar-v4l2.c#L548 Signed-off-by: Jakub Vaněk <[email protected]> media/bcm2835-unicam: Parse pad numbers correctly The driver was making big assumptions about the source device using pad 0 and 1, which doesn't follow for more complex devices where Unicam's source device may be a sink device for something else. Read the pad numbers through media controller, and reference them appropriately. Signed-off-by: Dave Stevenson <[email protected]> media/bcm2835-unicam: Add support for configuration via MC API Adds Media Controller API support for more complex pipelines. libcamera is about to switch to using this mechanism for configuring sensors. This can be enabled by either a module parameter, or device tree. Various functions have been moved to group video-centric and mc-centric functions together. Based on a similar conversion done to ti-vpe. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Fixup for 5.18 and new get_mbus_config struct The number of active CSI2 data lanes has moved within the struct v4l2_mbus_config used by the get_mbus_config API call. Update the driver to match the changes in mainline. Signed-off-by: Dave Stevenson <[email protected]> drivers: bcm2835_unicam: Add logging message when a frame is dropped. If a dummy buffer is still active on a frame start, it indicates that this frame will be dropped. The explicit logging helps users identify performance issues. Signed-off-by: Naushir Patuck <[email protected]> drivers: bcm2835_unicam: Disable trigger mode operation On a Pi3 B/B+ platform the imx219 sensor frequently generates a single corrupt frame when the sensor first starts. This can either be a missing line, or invalid samples within the line. This only occurrs using the Unicam kernel driver. Disabling trigger mode elimiates this corruption. Since trigger mode is a legacy feature copied from the firmware driver and not expected to be needed, remove it. Tested on the Raspberry Pi cameras and shows no ill effects. Signed-off-by: Naushir Patuck <[email protected]> media: bcm2835-unicam: Set ret on error path in unicam_async_complete() Clang warns: drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!source_pads) { ^~~~~~~~~~~~ drivers/media/platform/bcm2835/bcm2835-unicam.c:3152:9: note: uninitialized use occurs here return ret; ^~~ drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:2: note: remove the 'if' if its condition is always false if (!source_pads) { ^~~~~~~~~~~~~~~~~~~ drivers/media/platform/bcm2835/bcm2835-unicam.c:3091:9: note: initialize the variable 'ret' to silence this warning int ret; ^ = 0 1 warning generated. When the if condition is true, ret will be used uninitialized, which could result in undesirable behavior. Set ret to -ENODEV on the error path, which is a standard error code for the ->complete() callback. Fixes: d056e86eb35f ("media/bcm2835-unicam: Parse pad numbers correctly") Signed-off-by: Nathan Chancellor <[email protected]> media: bcm2835-unicam: Handle a repeated frame start with no end In the case of 2 frame starts being received with no frame end between, the queued buffer held in next_frm was lost as the pointer was overwritten with the dummy buffer. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Correctly handle FS + FE ISR condtion If we get a simultaneous FS + FE interrupt for the same frame, it cannot be marked as completed and returned to userland as the framebuffer will be refilled by Unicam on the next sensor frame. Additionally, the timestamp will be set to 0 as the FS interrupt handling code will not have run yet. To avoid these problems, the frame is considered dropped in the FE handler, and will be returned to userland on the subsequent sensor frame. Signed-off-by: Naushir Patuck <[email protected]> media: bcm2835-unicam: Fix for possible dummy buffer overrun The Unicam hardware has been observed to cause a buffer overrun when using the dummy buffer as a circular buffer. The conditions that cause the overrun are not fully known, but it seems to occur when the memory bus is heavily loaded. To avoid the overrun, program the hardware with a buffer size of 0 when using the dummy buffer. This will cause overrun into the allocated dummy buffer, but avoid out of bounds writes. Signed-off-by: Naushir Patuck <[email protected]> media: bcm2835-unicam: Fix up start/stop api change Signed-off-by: Dom Cobley <[email protected]> media: bcm2835-unicam: Use mipi-csi2.h header for data type values The MIPI CSI2 data type ID values are now defined in the mipi-csi2.h header, so use those defines instead of hard coding them in the driver. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Add support for RAW16 formats With the RAW16 formats now having a defined CSI2 data type ID, they can be added to the driver. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Start and stop media_pipeline with same node media_pipeline_start and media_pipeline_stop now validate that the pipeline is being started and stopped with the same pipe and pad handles. When running with embedded metadata (eg imx477 and imx708), the start typically happens from the metadata pad, whilst stop is always from the image pad. Always pass the image pad to media_pipeline_start to ensure that the calls are balanced. Signed-off-by: Dave Stevenson <[email protected]> drivers: media: bcm2835_unicam: Improve frame sequence count handling Ensure that the frame sequence counter is incremented only if a previous frame start interrupt has occurred, or a frame start + frame end has occurred simultaneously. This corresponds the sequence number with the actual number of frames produced by the sensor, not the number of frame buffers dequeued back to userland. Signed-off-by: Naushir Patuck <[email protected]> bcm2835-unicam: hacks to allow it to build media: bcm2835-unicam: Fix up async notifier usage Fixes "8a090fc3e549 bcm2835-unicam: hacks to allow it to build" media: bcm2835-unicam: Add option for a GPIO to reflect FS/FE timing The legacy stack had an option to have a GPIO track frame start and end events to give basic synchronisation to the incoming image stream. https://forums.raspberrypi.com/viewtopic.php?t=190314 Replicate this in the kernel Unicam driver. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Add support for 12bit mono packed format Now that V4L2_PIX_FMT_Y12P is defined, allow passing raw 12bit mono packed data through the peripheral. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Add support for 14bit mono sources Now that V4L2_PIX_FMT_Y14 and V4L2_PIX_FMT_Y14P are defined, allow passing 14bit mono data through the peripheral. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Add support for unpacked 14bit Bayer formats Now that the 14bit non-packed Bayer formats are defined, add them into the supported formats lookup table. Signed-off-by: Dave Stevenson <[email protected]> media: bcm2835-unicam: Reinstate old downstream driver as legacy Whilst the Unicam driver has now been upstreamed it only supports configuration via Media Controller (not driven from the /dev/videoN node), which makes life significantly harder for simple devices such as mono sensors, and HDMI or analogue video to CSI2 bridge chips (eg TC358743 and ADV7282M). Fix up the downstream driver so that it builds, reinstate the links from Kconfig and Makefile to it, and give it a new Kconfig name (VIDEO_BCM2835_UNICAM_LEGACY). Signed-off-by: Dave Stevenson <[email protected]>
Driver for the BCM2835 ISP hardware block. This driver uses the MMAL component to program the ISP hardware through the VC firmware. The ISP component can produce two video stream outputs, and Bayer image statistics. This can't be encompassed in a simple V4L2 M2M device, so create a new device that registers 4 video nodes. This patch squashes all the development patches from the earlier rpi-5.4.y branch into one Signed-off-by: Naushir Patuck <[email protected]> staging/bcm2835-isp: Add the unpacked (16bpp) raw formats Now that the firmware supports the unpacked (16bpp) variants of the MIPI raw formats, add the mappings. Signed-off-by: Dave Stevenson <[email protected]> staging/bcm2835-isp: Log the number of excess supported formats When logging that the firmware has provided more supported formats than we had allocated storage for, log the number allocated and returned. Signed-off-by: Dave Stevenson <[email protected]> staging: vc04_services: ISP: Add colour denoise control Add colour denoise control to the bcm2835 driver through a new v4l2 control: V4L2_CID_USER_BCM2835_ISP_CDN. Add the accompanying MMAL configuration structure definitions as well. Signed-off-by: Naushir Patuck <[email protected]> bcm2835-isp: Allow formats with different colour spaces. Each supported format now includes a mask showing the allowed colour spaces, as well as a default colour space for when one was not specified. Additionally we translate the colour space to mmal format and pass it over to the VideoCore. Signed-off-by: David Plowman <[email protected]> media: i2c: add ov9281 driver. Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1 Signed-off-by: Zefa Chen <[email protected]> media: i2c: ov9281: fix mclk issue when probe multiple camera. Takes the ov9281 part only from the Rockchip's patch. Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4 Signed-off-by: Zefa Chen <[email protected]> media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3 Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to a large number of drivers. Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c Signed-off-by: Zefa Chen <[email protected]> media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code The Rockchip driver was based on a 4.4 kernel, and had several custom Rockchip parts. Update to 5.4 kernel APIs, with the relevant controls required by libcamera, and remove custom Rockchip parts. Signed-off-by: Dave Stevenson <[email protected]> media: i2c: ov9281: Read chip ID via 2 reads Vision Components have made an OV9281 module which blocks reading back the majority of registers to comply with NDAs, and in doing so doesn't allow auto-increment register reading as used when reading the chip ID. Use two reads and manually combine the results. Signed-off-by: Dave Stevenson <[email protected]> media: i2c: ov9281: Add support for 8 bit readout The sensor supports 8 bit mode as well as 10bit, so add the relevant code to allow selection of this. Signed-off-by: Dave Stevenson <[email protected]> media: ov9281: Add 1280x720 and 640x480 modes Breaks out common register set and adds the different registers for 1280x720 (cropped) and 640x480 (skipped) modes Signed-off-by: Dave Stevenson <[email protected]> Fixed picture line bug in all ov9281 modes Signed-off-by: Mathias Anhalt <[email protected]> Added hflip and vflip controls to ov9281 Signed-off-by: Mathias Anhalt <[email protected]> media: i2c: ov9281: Remove override of subdev name From the original Rockchip driver, the subdev was renamed from the default to being "mov9281 <dev_name>" whereas the default would have been "ov9281 <dev_name>". Remove the override to drop back to the default rather than a vendor custom string. Signed-off-by: Dave Stevenson <[email protected]> media: v4l2-subdev: add subdev-wide state struct Signed-off-by: Dom Cobley <[email protected]> media: i2c: ov9281: Add fwnode properties controls Add call to v4l2_ctrl_new_fwnode_properties to read and create the fwnode based controls. Signed-off-by: Dave Stevenson <[email protected]> media: i2c: ov9281: Sensor should report RAW color space Tested on Raspberry Pi running libcamera. Signed-off-by: David Plowman <[email protected]> Partial revert "media: i2c: add ov9281 driver." This partially reverts commit 84e98e3a4f3eecb168ceb80231c3e8252929892e. The commit had merged some changes to other drivers with adding the ov9281 driver. Only the ov9281 parts have been reverted. staging/bcm2835-isp: Fix compiler warning The result of dividing a u32 by a size_t is an unsigned int on arm32 and a long unsigned int on arm64. Use "%zu" (the size_t format) to remove the build warning for 64-bit builds. Signed-off-by: Phil Elwell <[email protected]> staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes The bcm2835 ISP requires the base address of all input/output planes to have 32 byte alignment. Using a Y stride of 32 bytes would not guarantee that the V plane would fulfil this, e.g. a height of 650 lines would mean the V plane buffer is not 32 byte aligned for YUV420 formats. Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte alignment, as the luma height will always be an even number of lines. Signed-off-by: Naushir Patuck <[email protected]> vc04_services: isp: Report input node as wanting full range RAW color space RAW color spaces are more usually reported as having full range quantization. Tested using libcamera. Signed-off-by: David Plowman <[email protected]> drivers: bcm2835_isp: Allow multiple users for the ISP driver. Add a second (identical) set of device nodes to allow concurrent use of the ISP hardware by another user. This change effectively creates a second state structure (struct bcm2835_isp_dev) to maintain independent state for the second user. Node and media entity names are appened with the instance index appropriately. Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define. Signed-off-by: Naushir Patuck <[email protected]> drivers: bcm2835_isp: Fix div by 0 bug. Fix a possible division by 0 bug when setting up the mmal port for the stats port. Signed-off-by: Naushir Patuck <[email protected]> staging/bcm2835-isp: Fix cleanup after init fail bcm2835_isp_remove is called on an initialisation failure, but at that point the drvdata hasn't been set. This causes a crash when e.g. using the cutdown firmware (gpu_mem=16). Move platform_set_drvdata before the instance probing loop to avoid the problem. See: raspberrypi/linux#4774 Signed-off-by: Phil Elwell <[email protected]> bcm2835-v4l2-isp: Add missing lock initialization ISP device allocation is dynamic hence the locks too. struct mutex queue_lock is not initialized which result in bug. Fixing same by initializing it. [ 29.847138] INFO: trying to register non-static key. [ 29.847156] The code is fine but needs lockdep annotation, or maybe [ 29.847159] you didn't initialize this object before use? [ 29.847161] turning off the locking correctness validator. [ 29.847167] CPU: 1 PID: 343 Comm: v4l_id Tainted: G C 5.15.11-rt24-v8+ torvalds#8 [ 29.847187] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT) [ 29.847194] Call trace: [ 29.847197] dump_backtrace+0x0/0x1b8 [ 29.847227] show_stack+0x20/0x30 [ 29.847240] dump_stack_lvl+0x8c/0xb8 [ 29.847254] dump_stack+0x18/0x34 [ 29.847263] register_lock_class+0x494/0x4a0 [ 29.847278] __lock_acquire+0x80/0x1680 [ 29.847289] lock_acquire+0x214/0x3a0 [ 29.847300] mutex_lock_nested+0x70/0xc8 [ 29.847312] _vb2_fop_release+0x3c/0xa8 [videobuf2_v4l2] [ 29.847346] vb2_fop_release+0x34/0x60 [videobuf2_v4l2] [ 29.847367] v4l2_release+0xc8/0x108 [videodev] [ 29.847453] __fput+0x8c/0x258 [ 29.847476] ____fput+0x18/0x28 [ 29.847487] task_work_run+0x98/0x180 [ 29.847502] do_notify_resume+0x228/0x3f8 [ 29.847515] el0_svc+0xec/0xf0 [ 29.847523] el0t_64_sync_handler+0x90/0xb8 [ 29.847531] el0t_64_sync+0x180/0x184 Signed-off-by: Padmanabha Srinivasaiah <[email protected]> staging: vc04_services: isp: Permit all sRGB colour spaces on ISP outputs ISP outputs actually support all colour spaces that are fundamentally sRGB underneath, regardless of whether an RGB or YUV output format is actually requested. Signed-off-by: David Plowman <[email protected]> drivers: staging: bcm2835-isp: Do not cleanup mmal vcsm buffer on stop_streaming On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup code. This will subsequently cause and error if userland re-queues the same buffer on the next start_streaming() call. Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the cleanups instead. Signed-off-by: Naushir Patuck <[email protected]> drivers: staging: bcm2835-isp: Clear LS table handle in the firmware When all nodes have stopped streaming, ensure the firmware has released its handle on the LS table dmabuf. This is done by passing a null handle in the LS params. Signed-off-by: Naushir Patuck <[email protected]> drivers: staging: bcm2835-isp: Respect caller's stride value The stride value reported for output image buffers should be at least as large as any value that was passed in by the caller (subject to correct alignment for the pixel format). If the value is zero (meaning no value was passed), or is too small, the minimum acceptable value will be substituted. Signed-off-by: David Plowman <[email protected]> staging: vc04_services: bcm2835-isp: Drop include Makefile directive Drop the include directive. They can break the build, when one only wants to build a subdirectory. Replace with "../" for the includes in the bcm2835-isp instead. The fix is equivalent to the four patches between 29d49a7 ("staging: vc04_services: bcm2835-audio: Drop include Makefile directive")...2529ca2 ("staging: vc04_services: interface: Drop include Makefile directive") Fixes: c8f89c9551c1 ("staging: vc04_services: ISP: Add a more complex ISP processing component") Suggested-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kieran Bingham <[email protected]> staging: vc04_services: bcm2835-v4l2-isp: Register with vchiq_bus_type Register the bcm2835-v4l2-isp driver with the vchiq_bus_type instead of using the platform driver/device. Signed-off-by: Kieran Bingham <[email protected]> staging: vc04_services: bcm2835-v4l2-isp: Explicitly set DMA mask The platform model originally handled the DMA mask. Now that we are on the vchiq_bus we need to explicitly set this. Signed-off-by: Kieran Bingham <[email protected]> drivers: media: bcm2835_isp: Cache LS table dmabuf Clients such as libcamera do not change the LS table dmabuf on every frame. In such cases instead of mapping/remapping the same dmabuf on every frame to send to the firmware, cache the dmabuf once and only update and remap if the dmabuf has been changed by the userland client. Signed-off-by: Naushir Patuck <[email protected]>
- treat tailcall count as 32-bit for access and update - change out_offset scope from file to function - minor format/structure changes for consistency Testing: (skipping fentry, fexit, freplace) ======== root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls test_bpf: #0 Tail call leaf jited:1 967 PASS test_bpf: #1 Tail call 2 jited:1 1427 PASS test_bpf: #2 Tail call 3 jited:1 2373 PASS test_bpf: #3 Tail call 4 jited:1 2304 PASS test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS test_bpf: #5 Tail call load/store jited:1 2249 PASS test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31 397/1 tailcalls/tailcall_1:OK 397/2 tailcalls/tailcall_2:OK 397/3 tailcalls/tailcall_3:OK 397/4 tailcalls/tailcall_4:OK 397/5 tailcalls/tailcall_5:OK 397/6 tailcalls/tailcall_6:OK 397/7 tailcalls/tailcall_bpf2bpf_1:OK 397/8 tailcalls/tailcall_bpf2bpf_2:OK 397/9 tailcalls/tailcall_bpf2bpf_3:OK 397/10 tailcalls/tailcall_bpf2bpf_4:OK 397/11 tailcalls/tailcall_bpf2bpf_5:OK 397/12 tailcalls/tailcall_bpf2bpf_6:OK 397/17 tailcalls/tailcall_poke:OK 397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK 397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK 397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK 397/27 tailcalls/tailcall_failure:OK 397/28 tailcalls/reject_tail_call_spin_lock:OK 397/29 tailcalls/reject_tail_call_rcu_lock:OK 397/30 tailcalls/reject_tail_call_preempt_lock:OK 397/31 tailcalls/reject_tail_call_ref:OK 397 tailcalls:OK Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tony Ambardar <[email protected]>
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables. Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 While afterwards we have: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 ; struct seq_file *seq = ctx->meta->seq; 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] ; struct seq_file *seq = ctx->meta->seq; 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 ; struct udp_sock *udp_sk = ctx->udp_sk; 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 which aligns with the original source code. Signed-off-by: Tony Ambardar <[email protected]>
With test_progs compiled with ARM THUMB insns, trigger_func*() used for testing uprobes have odd (LSB set) entry points due to interworking. This fails alignment check first, then due to uprobe unsupported for THUMB in prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn(). First fix by setting trigger_func*() as ARM insns and rely on interworking enabled by default. Doesn't work on Debian bookworm target because insn #1 of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched by dynamic loader? Kernel somehow? __attribute__((target("arm"))) Works to compile all test_progs with gcc -marm, or just attach_probe.c. Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid using THUMB insns, with expected tests passing: torvalds#8/1 attach_probe/manual-default:OK torvalds#8/2 attach_probe/manual-legacy:OK torvalds#8/3 attach_probe/manual-perf:OK torvalds#8/4 attach_probe/manual-link:OK torvalds#8/5 attach_probe/auto:OK torvalds#8/6 attach_probe/kprobe-sleepable:OK torvalds#8/8 attach_probe/uprobe-sleepable:OK torvalds#8/9 attach_probe/uprobe-ref_ctr:OK torvalds#8/10 attach_probe/uprobe-long_name:OK torvalds#8/11 attach_probe/kprobe-long_name:OK torvalds#8 attach_probe:OK Note that attach_probe/uprobe-lib will still fail since it tries to attach a uprobe to the system libc.so, which is compiled for THUMB: test_attach_probe:PASS:skel_open 0 nsec test_attach_probe:PASS:skel_load 0 nsec test_attach_probe:PASS:check_bss 0 nsec test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22 torvalds#8/7 attach_probe/uprobe-lib:FAIL torvalds#8 attach_probe:FAIL Also compile other source files and binary with same issue using "-marm": bpf_cookie.c fill_link_info.c task_pt_regs.test.c uprobe_autoattach.c (also tries libc.so attach) usdt.c uprobe_multi Signed-off-by: Tony Ambardar <[email protected]>
- treat tailcall count as 32-bit for access and update - change out_offset scope from file to function - minor format/structure changes for consistency Testing: (skipping fentry, fexit, freplace) ======== root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls test_bpf: #0 Tail call leaf jited:1 967 PASS test_bpf: #1 Tail call 2 jited:1 1427 PASS test_bpf: #2 Tail call 3 jited:1 2373 PASS test_bpf: #3 Tail call 4 jited:1 2304 PASS test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS test_bpf: #5 Tail call load/store jited:1 2249 PASS test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31 397/1 tailcalls/tailcall_1:OK 397/2 tailcalls/tailcall_2:OK 397/3 tailcalls/tailcall_3:OK 397/4 tailcalls/tailcall_4:OK 397/5 tailcalls/tailcall_5:OK 397/6 tailcalls/tailcall_6:OK 397/7 tailcalls/tailcall_bpf2bpf_1:OK 397/8 tailcalls/tailcall_bpf2bpf_2:OK 397/9 tailcalls/tailcall_bpf2bpf_3:OK 397/10 tailcalls/tailcall_bpf2bpf_4:OK 397/11 tailcalls/tailcall_bpf2bpf_5:OK 397/12 tailcalls/tailcall_bpf2bpf_6:OK 397/17 tailcalls/tailcall_poke:OK 397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK 397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK 397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK 397/27 tailcalls/tailcall_failure:OK 397/28 tailcalls/reject_tail_call_spin_lock:OK 397/29 tailcalls/reject_tail_call_rcu_lock:OK 397/30 tailcalls/reject_tail_call_preempt_lock:OK 397/31 tailcalls/reject_tail_call_ref:OK 397 tailcalls:OK Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tony Ambardar <[email protected]>
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables. Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 While afterwards we have: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 ; struct seq_file *seq = ctx->meta->seq; 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] ; struct seq_file *seq = ctx->meta->seq; 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 ; struct udp_sock *udp_sk = ctx->udp_sk; 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 which aligns with the original source code. Signed-off-by: Tony Ambardar <[email protected]>
With test_progs compiled with ARM THUMB insns, trigger_func*() used for testing uprobes have odd (LSB set) entry points due to interworking. This fails alignment check first, then due to uprobe unsupported for THUMB in prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn(). First fix by setting trigger_func*() as ARM insns and rely on interworking enabled by default. Doesn't work on Debian bookworm target because insn #1 of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched by dynamic loader? Kernel somehow? __attribute__((target("arm"))) Works to compile all test_progs with gcc -marm, or just attach_probe.c. Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid using THUMB insns, with expected tests passing: torvalds#8/1 attach_probe/manual-default:OK torvalds#8/2 attach_probe/manual-legacy:OK torvalds#8/3 attach_probe/manual-perf:OK torvalds#8/4 attach_probe/manual-link:OK torvalds#8/5 attach_probe/auto:OK torvalds#8/6 attach_probe/kprobe-sleepable:OK torvalds#8/8 attach_probe/uprobe-sleepable:OK torvalds#8/9 attach_probe/uprobe-ref_ctr:OK torvalds#8/10 attach_probe/uprobe-long_name:OK torvalds#8/11 attach_probe/kprobe-long_name:OK torvalds#8 attach_probe:OK Note that attach_probe/uprobe-lib will still fail since it tries to attach a uprobe to the system libc.so, which is compiled for THUMB: test_attach_probe:PASS:skel_open 0 nsec test_attach_probe:PASS:skel_load 0 nsec test_attach_probe:PASS:check_bss 0 nsec test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22 torvalds#8/7 attach_probe/uprobe-lib:FAIL torvalds#8 attach_probe:FAIL Also compile other source files and binary with same issue using "-marm": bpf_cookie.c fill_link_info.c task_pt_regs.test.c uprobe_autoattach.c (also tries libc.so attach) usdt.c uprobe_multi Signed-off-by: Tony Ambardar <[email protected]>
Petr Machata says: ==================== selftests: Mark auto-deferring functions clearly selftests/net/lib.sh contains a suite of iproute2 wrappers that automatically schedule the corresponding cleanup through defer. The fact they do so is however not immediately obvious, one needs to know which functions are handling the deferral behind the scenes, and which expect the caller to handle cleanups themselves. A convention for these auto-deferring functions would help both writing and patch review. This patchset does so by marking these functions with an adf_ prefix. We already have a few such functions: forwarding/lib.sh has adf_mcd_start() and a few selftests add private helpers that conform to this convention. Patches #1 to torvalds#8 gradually convert individual functions, one per patch. Patch torvalds#9 renames an auto-deferring private helpers named dfr_* to adf_*. The plan is not to retro-rename all private helpers, but I happened to know about this one. Patches torvalds#10 to torvalds#12 introduce several autodefer helpers for commonly used forwarding/lib.sh functions, and opportunistically convert straightforward instances of 'action; defer counteraction' to the new helpers. Patch torvalds#13 adds some README verbiage to pitch defer and the adf_* convention. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
- treat tailcall count as 32-bit for access and update - change out_offset scope from file to function - minor format/structure changes for consistency Testing: (skipping fentry, fexit, freplace) ======== root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls test_bpf: #0 Tail call leaf jited:1 967 PASS test_bpf: #1 Tail call 2 jited:1 1427 PASS test_bpf: #2 Tail call 3 jited:1 2373 PASS test_bpf: #3 Tail call 4 jited:1 2304 PASS test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS test_bpf: #5 Tail call load/store jited:1 2249 PASS test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31 397/1 tailcalls/tailcall_1:OK 397/2 tailcalls/tailcall_2:OK 397/3 tailcalls/tailcall_3:OK 397/4 tailcalls/tailcall_4:OK 397/5 tailcalls/tailcall_5:OK 397/6 tailcalls/tailcall_6:OK 397/7 tailcalls/tailcall_bpf2bpf_1:OK 397/8 tailcalls/tailcall_bpf2bpf_2:OK 397/9 tailcalls/tailcall_bpf2bpf_3:OK 397/10 tailcalls/tailcall_bpf2bpf_4:OK 397/11 tailcalls/tailcall_bpf2bpf_5:OK 397/12 tailcalls/tailcall_bpf2bpf_6:OK 397/17 tailcalls/tailcall_poke:OK 397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK 397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK 397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK 397/27 tailcalls/tailcall_failure:OK 397/28 tailcalls/reject_tail_call_spin_lock:OK 397/29 tailcalls/reject_tail_call_rcu_lock:OK 397/30 tailcalls/reject_tail_call_preempt_lock:OK 397/31 tailcalls/reject_tail_call_ref:OK 397 tailcalls:OK Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tony Ambardar <[email protected]>
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables. Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 While afterwards we have: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 ; struct seq_file *seq = ctx->meta->seq; 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] ; struct seq_file *seq = ctx->meta->seq; 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 ; struct udp_sock *udp_sk = ctx->udp_sk; 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 which aligns with the original source code. Signed-off-by: Tony Ambardar <[email protected]>
With test_progs compiled with ARM THUMB insns, trigger_func*() used for testing uprobes have odd (LSB set) entry points due to interworking. This fails alignment check first, then due to uprobe unsupported for THUMB in prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn(). First fix by setting trigger_func*() as ARM insns and rely on interworking enabled by default. Doesn't work on Debian bookworm target because insn #1 of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched by dynamic loader? Kernel somehow? __attribute__((target("arm"))) Works to compile all test_progs with gcc -marm, or just attach_probe.c. Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid using THUMB insns, with expected tests passing: torvalds#8/1 attach_probe/manual-default:OK torvalds#8/2 attach_probe/manual-legacy:OK torvalds#8/3 attach_probe/manual-perf:OK torvalds#8/4 attach_probe/manual-link:OK torvalds#8/5 attach_probe/auto:OK torvalds#8/6 attach_probe/kprobe-sleepable:OK torvalds#8/8 attach_probe/uprobe-sleepable:OK torvalds#8/9 attach_probe/uprobe-ref_ctr:OK torvalds#8/10 attach_probe/uprobe-long_name:OK torvalds#8/11 attach_probe/kprobe-long_name:OK torvalds#8 attach_probe:OK Note that attach_probe/uprobe-lib will still fail since it tries to attach a uprobe to the system libc.so, which is compiled for THUMB: test_attach_probe:PASS:skel_open 0 nsec test_attach_probe:PASS:skel_load 0 nsec test_attach_probe:PASS:check_bss 0 nsec test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22 torvalds#8/7 attach_probe/uprobe-lib:FAIL torvalds#8 attach_probe:FAIL Also compile other source files and binary with same issue using "-marm": bpf_cookie.c fill_link_info.c task_pt_regs.test.c uprobe_autoattach.c (also tries libc.so attach) usdt.c uprobe_multi Signed-off-by: Tony Ambardar <[email protected]>
- treat tailcall count as 32-bit for access and update - change out_offset scope from file to function - minor format/structure changes for consistency Testing: (skipping fentry, fexit, freplace) ======== root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls test_bpf: #0 Tail call leaf jited:1 967 PASS test_bpf: #1 Tail call 2 jited:1 1427 PASS test_bpf: #2 Tail call 3 jited:1 2373 PASS test_bpf: #3 Tail call 4 jited:1 2304 PASS test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS test_bpf: #5 Tail call load/store jited:1 2249 PASS test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31 397/1 tailcalls/tailcall_1:OK 397/2 tailcalls/tailcall_2:OK 397/3 tailcalls/tailcall_3:OK 397/4 tailcalls/tailcall_4:OK 397/5 tailcalls/tailcall_5:OK 397/6 tailcalls/tailcall_6:OK 397/7 tailcalls/tailcall_bpf2bpf_1:OK 397/8 tailcalls/tailcall_bpf2bpf_2:OK 397/9 tailcalls/tailcall_bpf2bpf_3:OK 397/10 tailcalls/tailcall_bpf2bpf_4:OK 397/11 tailcalls/tailcall_bpf2bpf_5:OK 397/12 tailcalls/tailcall_bpf2bpf_6:OK 397/17 tailcalls/tailcall_poke:OK 397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK 397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK 397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK 397/27 tailcalls/tailcall_failure:OK 397/28 tailcalls/reject_tail_call_spin_lock:OK 397/29 tailcalls/reject_tail_call_rcu_lock:OK 397/30 tailcalls/reject_tail_call_preempt_lock:OK 397/31 tailcalls/reject_tail_call_ref:OK 397 tailcalls:OK Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tony Ambardar <[email protected]>
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables. Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 While afterwards we have: 48: ldr lr, [fp, #-44] @ 0xffffffd4 4c: strd r2, [lr, #-40] @ 0xffffffd8 ; struct seq_file *seq = ctx->meta->seq; 50: ldr r8, [fp, #-44] @ 0xffffffd4 54: ldr r8, [r8, #-40] @ 0xffffffd8 58: ldr r2, [r8] ; struct seq_file *seq = ctx->meta->seq; 5c: mov r7, r2 60: ldr r2, [r7] 64: mov r3, #0 ; struct udp_sock *udp_sk = ctx->udp_sk; 68: ldr r8, [fp, #-44] @ 0xffffffd4 6c: ldr r8, [r8, #-40] @ 0xffffffd8 70: ldr r6, [r8, torvalds#8] 74: ldr r9, [fp, #-44] @ 0xffffffd4 78: strd r6, [r9, #-48] @ 0xffffffd0 which aligns with the original source code. Signed-off-by: Tony Ambardar <[email protected]>
With test_progs compiled with ARM THUMB insns, trigger_func*() used for testing uprobes have odd (LSB set) entry points due to interworking. This fails alignment check first, then due to uprobe unsupported for THUMB in prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn(). First fix by setting trigger_func*() as ARM insns and rely on interworking enabled by default. Doesn't work on Debian bookworm target because insn #1 of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched by dynamic loader? Kernel somehow? __attribute__((target("arm"))) Works to compile all test_progs with gcc -marm, or just attach_probe.c. Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid using THUMB insns, with expected tests passing: torvalds#8/1 attach_probe/manual-default:OK torvalds#8/2 attach_probe/manual-legacy:OK torvalds#8/3 attach_probe/manual-perf:OK torvalds#8/4 attach_probe/manual-link:OK torvalds#8/5 attach_probe/auto:OK torvalds#8/6 attach_probe/kprobe-sleepable:OK torvalds#8/8 attach_probe/uprobe-sleepable:OK torvalds#8/9 attach_probe/uprobe-ref_ctr:OK torvalds#8/10 attach_probe/uprobe-long_name:OK torvalds#8/11 attach_probe/kprobe-long_name:OK torvalds#8 attach_probe:OK Note that attach_probe/uprobe-lib will still fail since it tries to attach a uprobe to the system libc.so, which is compiled for THUMB: test_attach_probe:PASS:skel_open 0 nsec test_attach_probe:PASS:skel_load 0 nsec test_attach_probe:PASS:check_bss 0 nsec test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22 torvalds#8/7 attach_probe/uprobe-lib:FAIL torvalds#8 attach_probe:FAIL Also compile other source files and binary with same issue using "-marm": bpf_cookie.c fill_link_info.c task_pt_regs.test.c uprobe_autoattach.c (also tries libc.so attach) usdt.c uprobe_multi Signed-off-by: Tony Ambardar <[email protected]>
I was going through the readme, checking it out, and a few things kept distracting me. So I fixed 'em.