Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

Spaceghost
Copy link

I was going through the readme, checking it out, and a few things kept distracting me. So I fixed 'em.

@bdonlan
Copy link
Contributor

bdonlan commented Sep 6, 2011

Please note that pull requests are not the proper procedure to submit patches to the Linux kernel (Linus put the kernel up here because kernel.org's master mirror is down; it seems that he doesn't like the pull request system[1], but github does not allow him to disable it). Please read Documentation/SubmittingPatches - you must write a proper commit message (actually describing what changed, not just 'tinkered with'), add a Signed-Off-By line, and submit to the linux kernel mailing list.

[1] - http://blueparen.com/node/12

@Spaceghost
Copy link
Author

That's a bit too much work for the usual github stuff. Perhaps I'll just leave it alone and let the usual kernel.org hackers help out.

@Spaceghost Spaceghost closed this Sep 6, 2011
jankeromnes pushed a commit to jankeromnes/linux that referenced this pull request Nov 5, 2011
The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
iksaif pushed a commit to iksaif/platform-drivers-x86 that referenced this pull request Nov 6, 2011
This patch validates sdev pointer in scsi_dh_activate before proceeding further.

Without this check we might see the panic as below. I have seen this
panic multiple times..

Call trace:

 #0 [ffff88007d647b50] machine_kexec at ffffffff81020902
 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
 #2 [ffff88007d647c70] oops_end at ffffffff8139c650
 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
    [exception RIP: scsi_dh_activate+0x82]
    RIP: ffffffffa0041922  RSP: ffff88007d647e00  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000093c5
    RDX: 00000000000093c5  RSI: ffffffffa02e6640  RDI: ffff88007cc88988
    RBP: 000000000000000f   R8: ffff88007d646000   R9: 0000000000000000
    R10: ffff880082293790  R11: 00000000ffffffff  R12: ffff88007cc88988
    R13: 0000000000000000  R14: 0000000000000286  R15: ffff880037b845e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268
 torvalds#6 [ffff88007d647e78] worker_thread at ffffffff81060386
 torvalds#7 [ffff88007d647ee8] kthread at ffffffff81064436
 torvalds#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba

Signed-off-by: Babu Moger <[email protected]>
Cc: [email protected]
Signed-off-by: James Bottomley <[email protected]>
fengguang pushed a commit to fengguang/linux that referenced this pull request Nov 7, 2011
The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
baerwolf pushed a commit to baerwolf/linux-stephan that referenced this pull request Nov 12, 2011
commit a18a920 upstream.

This patch validates sdev pointer in scsi_dh_activate before proceeding further.

Without this check we might see the panic as below. I have seen this
panic multiple times..

Call trace:

 #0 [ffff88007d647b50] machine_kexec at ffffffff81020902
 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
 #2 [ffff88007d647c70] oops_end at ffffffff8139c650
 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
    [exception RIP: scsi_dh_activate+0x82]
    RIP: ffffffffa0041922  RSP: ffff88007d647e00  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000093c5
    RDX: 00000000000093c5  RSI: ffffffffa02e6640  RDI: ffff88007cc88988
    RBP: 000000000000000f   R8: ffff88007d646000   R9: 0000000000000000
    R10: ffff880082293790  R11: 00000000ffffffff  R12: ffff88007cc88988
    R13: 0000000000000000  R14: 0000000000000286  R15: ffff880037b845e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268
 torvalds#6 [ffff88007d647e78] worker_thread at ffffffff81060386
 torvalds#7 [ffff88007d647ee8] kthread at ffffffff81064436
 torvalds#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba

Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
baerwolf pushed a commit to baerwolf/linux-stephan that referenced this pull request Nov 12, 2011
commit f992ae8 upstream.

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
torvalds pushed a commit that referenced this pull request Dec 15, 2011
If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: [email protected]
torvalds pushed a commit that referenced this pull request Jan 11, 2012
Nothing requires that we lock the filesystem until the root inode is
provided.

Also iget5_locked() triggers a warning because we are holding the
filesystem lock while allocating the inode, which result in a lockdep
suspicion that we have a lock inversion against the reclaim path:

[ 1986.896979] =================================
[ 1986.896990] [ INFO: inconsistent lock state ]
[ 1986.896997] 3.1.1-main #8
[ 1986.897001] ---------------------------------
[ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1986.897023]  (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897044] {RECLAIM_FS-ON-W} state was registered at:
[ 1986.897050]   [<c014a5b9>] mark_held_locks+0xae/0xd0
[ 1986.897060]   [<c014aab3>] lockdep_trace_alloc+0x7d/0x91
[ 1986.897068]   [<c0190ee0>] kmem_cache_alloc+0x1a/0x93
[ 1986.897078]   [<c01e7728>] reiserfs_alloc_inode+0x13/0x3d
[ 1986.897088]   [<c01a5b06>] alloc_inode+0x14/0x5f
[ 1986.897097]   [<c01a5cb9>] iget5_locked+0x62/0x13a
[ 1986.897106]   [<c01e99e0>] reiserfs_fill_super+0x410/0x8b9
[ 1986.897114]   [<c01953da>] mount_bdev+0x10b/0x159
[ 1986.897123]   [<c01e764d>] get_super_block+0x10/0x12
[ 1986.897131]   [<c0195b38>] mount_fs+0x59/0x12d
[ 1986.897138]   [<c01a80d1>] vfs_kern_mount+0x45/0x7a
[ 1986.897147]   [<c01a83e3>] do_kern_mount+0x2f/0xb0
[ 1986.897155]   [<c01a987a>] do_mount+0x5c2/0x612
[ 1986.897163]   [<c01a9a72>] sys_mount+0x61/0x8f
[ 1986.897170]   [<c044060c>] sysenter_do_call+0x12/0x32
[ 1986.897181] irq event stamp: 7509691
[ 1986.897186] hardirqs last  enabled at (7509691): [<c0190f34>] kmem_cache_alloc+0x6e/0x93
[ 1986.897197] hardirqs last disabled at (7509690): [<c0190eea>] kmem_cache_alloc+0x24/0x93
[ 1986.897209] softirqs last  enabled at (7508896): [<c01294bd>] __do_softirq+0xee/0xfd
[ 1986.897222] softirqs last disabled at (7508859): [<c01030ed>] do_softirq+0x50/0x9d
[ 1986.897234]
[ 1986.897235] other info that might help us debug this:
[ 1986.897242]  Possible unsafe locking scenario:
[ 1986.897244]
[ 1986.897250]        CPU0
[ 1986.897254]        ----
[ 1986.897257]   lock(&REISERFS_SB(s)->lock);
[ 1986.897265] <Interrupt>
[ 1986.897269]     lock(&REISERFS_SB(s)->lock);
[ 1986.897276]
[ 1986.897277]  *** DEADLOCK ***
[ 1986.897278]
[ 1986.897286] no locks held by kswapd0/16.
[ 1986.897291]
[ 1986.897292] stack backtrace:
[ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8
[ 1986.897306] Call Trace:
[ 1986.897314]  [<c0439e76>] ? printk+0xf/0x11
[ 1986.897324]  [<c01482d1>] print_usage_bug+0x20e/0x21a
[ 1986.897332]  [<c01479b8>] ? print_irq_inversion_bug+0x172/0x172
[ 1986.897341]  [<c014855c>] mark_lock+0x27f/0x483
[ 1986.897349]  [<c0148d88>] __lock_acquire+0x628/0x1472
[ 1986.897358]  [<c0149fae>] lock_acquire+0x47/0x5e
[ 1986.897366]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897384]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897397]  [<c043b5ef>] mutex_lock_nested+0x35/0x26f
[ 1986.897409]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897421]  [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897433]  [<c01e2edd>] map_block_for_writepage+0xc9/0x590
[ 1986.897448]  [<c01b1706>] ? create_empty_buffers+0x33/0x8f
[ 1986.897461]  [<c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897472]  [<c043ef7f>] ? sub_preempt_count+0x81/0x8e
[ 1986.897485]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897496]  [<c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897508]  [<c01e355d>] reiserfs_writepage+0x1b9/0x3e7
[ 1986.897521]  [<c0173b40>] ? clear_page_dirty_for_io+0xcb/0xde
[ 1986.897533]  [<c014a6e3>] ? trace_hardirqs_on_caller+0x108/0x138
[ 1986.897546]  [<c014a71e>] ? trace_hardirqs_on+0xb/0xd
[ 1986.897559]  [<c0177b38>] shrink_page_list+0x34f/0x5e2
[ 1986.897572]  [<c01780a7>] shrink_inactive_list+0x172/0x22c
[ 1986.897585]  [<c0178464>] shrink_zone+0x303/0x3b1
[ 1986.897597]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897611]  [<c01788c9>] kswapd+0x3b7/0x5f2

The deadlock shouldn't happen since we are doing that allocation in the
mount path, the filesystem is not available for any reclaim.  Still the
warning is annoying.

To solve this, acquire the lock later only where we need it, right before
calling reiserfs_read_locked_inode() that wants to lock to walk the tree.

Reported-by: Knut Petersen <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jeff Mahoney <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Pfiver pushed a commit to Pfiver/linux that referenced this pull request Jan 16, 2012
$ wget "http://pkgs.fedoraproject.org/gitweb/?p=kernel.git;a=blob_plain;f=mac80211_offchannel_rework_revert.patch;h=859799714cd85a58450ecde4a1dabc5adffd5100;hb=refs/heads/f16" -O mac80211_offchannel_rework_revert.patch
$ patch -p1 --dry-run < mac80211_offchannel_rework_revert.patch
patching file net/mac80211/ieee80211_i.h
Hunk #1 succeeded at 702 (offset 8 lines).
Hunk #2 succeeded at 712 (offset 8 lines).
Hunk #3 succeeded at 1143 (offset -57 lines).
patching file net/mac80211/main.c
patching file net/mac80211/offchannel.c
Hunk #1 succeeded at 18 (offset 1 line).
Hunk #2 succeeded at 42 (offset 1 line).
Hunk #3 succeeded at 78 (offset 1 line).
Hunk #4 succeeded at 96 (offset 1 line).
Hunk #5 succeeded at 162 (offset 1 line).
Hunk torvalds#6 succeeded at 182 (offset 1 line).
patching file net/mac80211/rx.c
Hunk #1 succeeded at 421 (offset 4 lines).
Hunk #2 succeeded at 2864 (offset 87 lines).
patching file net/mac80211/scan.c
Hunk #1 succeeded at 213 (offset 1 line).
Hunk #2 succeeded at 256 (offset 2 lines).
Hunk #3 succeeded at 288 (offset 2 lines).
Hunk #4 succeeded at 333 (offset 2 lines).
Hunk #5 succeeded at 482 (offset 2 lines).
Hunk torvalds#6 succeeded at 498 (offset 2 lines).
Hunk torvalds#7 succeeded at 516 (offset 2 lines).
Hunk torvalds#8 succeeded at 530 (offset 2 lines).
Hunk torvalds#9 succeeded at 555 (offset 2 lines).
patching file net/mac80211/tx.c
Hunk #1 succeeded at 259 (offset 1 line).
patching file net/mac80211/work.c
Hunk #1 succeeded at 899 (offset -2 lines).
Hunk #2 succeeded at 949 (offset -2 lines).
Hunk #3 succeeded at 1046 (offset -2 lines).
Hunk #4 succeeded at 1054 (offset -2 lines).
jkstrick pushed a commit to jkstrick/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513
torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6
torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <[email protected]>
Tested-by: Ross Brattain <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
zachariasmaladroit pushed a commit to galaxys-cm7miui-kernel/linux that referenced this pull request Feb 11, 2012
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not
update the real num tx queues. netdev_queue_update_kobjects() is already
called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when
upper layer driver, e.g., FCoE protocol stack is monitoring the netdev
event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove
extra queues allocated for FCoE, the associated txq sysfs kobjects are already
removed, and trying to update the real num queues would cause something like
below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3   COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513
torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6
torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <[email protected]>
Tested-by: Ross Brattain <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
tworaz pushed a commit to tworaz/linux that referenced this pull request Feb 13, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
xXorAa pushed a commit to xXorAa/linux that referenced this pull request Feb 17, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Feb 23, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koct9i pushed a commit to koct9i/linux that referenced this pull request Mar 1, 2012
ata_port lifetime in libata follows the host.  In libsas it follows the
scsi_target.  Once scsi_remove_device() has caused all commands to be
completed it allows scsi_remove_target() to immediately proceed to
freeing the ata_port causing bug reports like:

[  848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107
[  848.400262] general protection fault: 0000 [#1] SMP
[  848.406244] CPU 4
[  848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
[  848.432060]
[  848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ torvalds#8 Intel Corporation S2600CP/S2600CP
[  848.445310] RIP: 0010:[<ffffffff8126a68c>]  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.454787] RSP: 0018:ffff8807f868dca0  EFLAGS: 00010002
[  848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0
[  848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002
[  848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b
[  848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b
[  848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e
[  848.503067] FS:  0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
[  848.512899] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0
[  848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000)
[  848.555327] Stack:
[  848.557959]  ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0
[  848.567072]  ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703
[  848.576190]  ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb
[  848.585281] Call Trace:
[  848.588409]  [<ffffffff8126a6e0>] spin_bug+0x26/0x28
[  848.594357]  [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88
[  848.601283]  [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65
[  848.609089]  [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata]
[  848.618331]  [<ffffffff81061813>] ? async_schedule+0x17/0x17
[  848.625060]  [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas]
[  848.632655]  [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125
[  848.639670]  [<ffffffff81057439>] process_one_work+0x207/0x38d
[  848.646577]  [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d
[  848.653681]  [<ffffffff810576f7>] worker_thread+0x138/0x21c
[  848.660305]  [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d
[  848.667493]  [<ffffffff8105b098>] kthread+0x9d/0xa5
[  848.673382]  [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166
[  848.681304]  [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10
[  848.688324]  [<ffffffff814af534>] ? retint_restore_args+0x13/0x13
[  848.695530]  [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b
[  848.702929]  [<ffffffff814b7700>] ? gs_change+0x13/0x13
[  848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89
[  848.732467] RIP  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.738905]  RSP <ffff8807f868dca0>
[  848.743743] ---[ end trace 143161646eee8caa ]---

...so arrange for the ata_port to have the same end of life as the domain
device.

Reported-by: Marcin Tomczak <[email protected]>
Acked-by: Jeff Garzik <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Mar 1, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Mar 19, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Mar 22, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Apr 2, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Apr 9, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Apr 11, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koenkooi referenced this pull request in koenkooi/linux Apr 12, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
BugLink: http://bugs.launchpad.net/bugs/890952

commit a18a920 upstream.

This patch validates sdev pointer in scsi_dh_activate before proceeding further.

Without this check we might see the panic as below. I have seen this
panic multiple times..

Call trace:

 #0 [ffff88007d647b50] machine_kexec at ffffffff81020902
 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
 #2 [ffff88007d647c70] oops_end at ffffffff8139c650
 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
    [exception RIP: scsi_dh_activate+0x82]
    RIP: ffffffffa0041922  RSP: ffff88007d647e00  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000093c5
    RDX: 00000000000093c5  RSI: ffffffffa02e6640  RDI: ffff88007cc88988
    RBP: 000000000000000f   R8: ffff88007d646000   R9: 0000000000000000
    R10: ffff880082293790  R11: 00000000ffffffff  R12: ffff88007cc88988
    R13: 0000000000000000  R14: 0000000000000286  R15: ffff880037b845e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268
 torvalds#6 [ffff88007d647e78] worker_thread at ffffffff81060386
 torvalds#7 [ffff88007d647ee8] kthread at ffffffff81064436
 torvalds#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba

Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
BugLink: http://bugs.launchpad.net/bugs/890952

commit f992ae8 upstream.

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ torvalds#8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
BugLink: http://bugs.launchpad.net/bugs/907778

commit ea51d13 upstream.

If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 torvalds#6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 torvalds#7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 torvalds#8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 torvalds#9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 torvalds#10 <signal handler called>
 torvalds#11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 torvalds#12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
psanford pushed a commit to retailnext/linux that referenced this pull request Apr 16, 2012
…S block during isolation for migration

BugLink: http://bugs.launchpad.net/bugs/931719

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a
 torvalds#7 [d72d3d14] zone_watermark_ok at c02d26cb
 torvalds#8 [d72d3d2c] compact_zone at c030b8d
 torvalds#9 [d72d3d68] compact_zone_order at c030bba1
torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84
torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845
torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6
torvalds#17 [d72d3f30] do_page_fault at c05c70ed
torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 21, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 21, 2025
Update current JIT offsets to enable and set up BPF line info for better
introspection and debugging. Offsets map each xlated insn to the start of
its JITed code, as well as the epilogue. This also allows simplifying some
code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly,
before this change we see:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

While afterwards we have:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 21, 2025
With test_progs compiled with ARM THUMB insns, trigger_func*() used for
testing uprobes have odd (LSB set) entry points due to interworking. This
fails alignment check first, then due to uprobe unsupported for THUMB in
prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn().

First fix by setting trigger_func*() as ARM insns and rely on interworking
enabled by default. Doesn't work on Debian bookworm target because insn #1
of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched
by dynamic loader? Kernel somehow?

__attribute__((target("arm")))

Works to compile all test_progs with gcc -marm, or just attach_probe.c.

Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid
using THUMB insns, with expected tests passing:

torvalds#8/1     attach_probe/manual-default:OK
torvalds#8/2     attach_probe/manual-legacy:OK
torvalds#8/3     attach_probe/manual-perf:OK
torvalds#8/4     attach_probe/manual-link:OK
torvalds#8/5     attach_probe/auto:OK
torvalds#8/6     attach_probe/kprobe-sleepable:OK
torvalds#8/8     attach_probe/uprobe-sleepable:OK
torvalds#8/9     attach_probe/uprobe-ref_ctr:OK
torvalds#8/10    attach_probe/uprobe-long_name:OK
torvalds#8/11    attach_probe/kprobe-long_name:OK
torvalds#8       attach_probe:OK

Note that attach_probe/uprobe-lib will still fail since it tries to attach
a uprobe to the system libc.so, which is compiled for THUMB:

test_attach_probe:PASS:skel_open 0 nsec
test_attach_probe:PASS:skel_load 0 nsec
test_attach_probe:PASS:check_bss 0 nsec
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL
test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22
torvalds#8/7     attach_probe/uprobe-lib:FAIL
torvalds#8       attach_probe:FAIL

Also compile other source files and binary with same issue using "-marm":

bpf_cookie.c
fill_link_info.c
task_pt_regs.test.c
uprobe_autoattach.c (also tries libc.so attach)
usdt.c
uprobe_multi

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
Update current JIT offsets to enable and set up BPF line info for better
introspection and debugging. Offsets map each xlated insn to the start of
its JITed code, as well as the epilogue. This also allows simplifying some
code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly,
before this change we see:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

While afterwards we have:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
With test_progs compiled with ARM THUMB insns, trigger_func*() used for
testing uprobes have odd (LSB set) entry points due to interworking. This
fails alignment check first, then due to uprobe unsupported for THUMB in
prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn().

First fix by setting trigger_func*() as ARM insns and rely on interworking
enabled by default. Doesn't work on Debian bookworm target because insn #1
of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched
by dynamic loader? Kernel somehow?

__attribute__((target("arm")))

Works to compile all test_progs with gcc -marm, or just attach_probe.c.

Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid
using THUMB insns, with expected tests passing:

torvalds#8/1     attach_probe/manual-default:OK
torvalds#8/2     attach_probe/manual-legacy:OK
torvalds#8/3     attach_probe/manual-perf:OK
torvalds#8/4     attach_probe/manual-link:OK
torvalds#8/5     attach_probe/auto:OK
torvalds#8/6     attach_probe/kprobe-sleepable:OK
torvalds#8/8     attach_probe/uprobe-sleepable:OK
torvalds#8/9     attach_probe/uprobe-ref_ctr:OK
torvalds#8/10    attach_probe/uprobe-long_name:OK
torvalds#8/11    attach_probe/kprobe-long_name:OK
torvalds#8       attach_probe:OK

Note that attach_probe/uprobe-lib will still fail since it tries to attach
a uprobe to the system libc.so, which is compiled for THUMB:

test_attach_probe:PASS:skel_open 0 nsec
test_attach_probe:PASS:skel_load 0 nsec
test_attach_probe:PASS:check_bss 0 nsec
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL
test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22
torvalds#8/7     attach_probe/uprobe-lib:FAIL
torvalds#8       attach_probe:FAIL

Also compile other source files and binary with same issue using "-marm":

bpf_cookie.c
fill_link_info.c
task_pt_regs.test.c
uprobe_autoattach.c (also tries libc.so attach)
usdt.c
uprobe_multi

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
Update current JIT offsets to enable and set up BPF line info for better
introspection and debugging. Offsets map each xlated insn to the start of
its JITed code, as well as the epilogue. This also allows simplifying some
code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly,
before this change we see:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

While afterwards we have:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
With test_progs compiled with ARM THUMB insns, trigger_func*() used for
testing uprobes have odd (LSB set) entry points due to interworking. This
fails alignment check first, then due to uprobe unsupported for THUMB in
prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn().

First fix by setting trigger_func*() as ARM insns and rely on interworking
enabled by default. Doesn't work on Debian bookworm target because insn #1
of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched
by dynamic loader? Kernel somehow?

__attribute__((target("arm")))

Works to compile all test_progs with gcc -marm, or just attach_probe.c.

Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid
using THUMB insns, with expected tests passing:

torvalds#8/1     attach_probe/manual-default:OK
torvalds#8/2     attach_probe/manual-legacy:OK
torvalds#8/3     attach_probe/manual-perf:OK
torvalds#8/4     attach_probe/manual-link:OK
torvalds#8/5     attach_probe/auto:OK
torvalds#8/6     attach_probe/kprobe-sleepable:OK
torvalds#8/8     attach_probe/uprobe-sleepable:OK
torvalds#8/9     attach_probe/uprobe-ref_ctr:OK
torvalds#8/10    attach_probe/uprobe-long_name:OK
torvalds#8/11    attach_probe/kprobe-long_name:OK
torvalds#8       attach_probe:OK

Note that attach_probe/uprobe-lib will still fail since it tries to attach
a uprobe to the system libc.so, which is compiled for THUMB:

test_attach_probe:PASS:skel_open 0 nsec
test_attach_probe:PASS:skel_load 0 nsec
test_attach_probe:PASS:check_bss 0 nsec
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL
test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22
torvalds#8/7     attach_probe/uprobe-lib:FAIL
torvalds#8       attach_probe:FAIL

Also compile other source files and binary with same issue using "-marm":

bpf_cookie.c
fill_link_info.c
task_pt_regs.test.c
uprobe_autoattach.c (also tries libc.so attach)
usdt.c
uprobe_multi

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
Update current JIT offsets to enable and set up BPF line info for better
introspection and debugging. Offsets map each xlated insn to the start of
its JITed code, as well as the epilogue. This also allows simplifying some
code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly,
before this change we see:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

While afterwards we have:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
With test_progs compiled with ARM THUMB insns, trigger_func*() used for
testing uprobes have odd (LSB set) entry points due to interworking. This
fails alignment check first, then due to uprobe unsupported for THUMB in
prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn().

First fix by setting trigger_func*() as ARM insns and rely on interworking
enabled by default. Doesn't work on Debian bookworm target because insn #1
of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched
by dynamic loader? Kernel somehow?

__attribute__((target("arm")))

Works to compile all test_progs with gcc -marm, or just attach_probe.c.

Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid
using THUMB insns, with expected tests passing:

torvalds#8/1     attach_probe/manual-default:OK
torvalds#8/2     attach_probe/manual-legacy:OK
torvalds#8/3     attach_probe/manual-perf:OK
torvalds#8/4     attach_probe/manual-link:OK
torvalds#8/5     attach_probe/auto:OK
torvalds#8/6     attach_probe/kprobe-sleepable:OK
torvalds#8/8     attach_probe/uprobe-sleepable:OK
torvalds#8/9     attach_probe/uprobe-ref_ctr:OK
torvalds#8/10    attach_probe/uprobe-long_name:OK
torvalds#8/11    attach_probe/kprobe-long_name:OK
torvalds#8       attach_probe:OK

Note that attach_probe/uprobe-lib will still fail since it tries to attach
a uprobe to the system libc.so, which is compiled for THUMB:

test_attach_probe:PASS:skel_open 0 nsec
test_attach_probe:PASS:skel_load 0 nsec
test_attach_probe:PASS:check_bss 0 nsec
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL
test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22
torvalds#8/7     attach_probe/uprobe-lib:FAIL
torvalds#8       attach_probe:FAIL

Also compile other source files and binary with same issue using "-marm":

bpf_cookie.c
fill_link_info.c
task_pt_regs.test.c
uprobe_autoattach.c (also tries libc.so attach)
usdt.c
uprobe_multi

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
Update current JIT offsets to enable and set up BPF line info for better
introspection and debugging. Offsets map each xlated insn to the start of
its JITed code, as well as the epilogue. This also allows simplifying some
code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly,
before this change we see:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

While afterwards we have:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 23, 2025
With test_progs compiled with ARM THUMB insns, trigger_func*() used for
testing uprobes have odd (LSB set) entry points due to interworking. This
fails alignment check first, then due to uprobe unsupported for THUMB in
prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn().

First fix by setting trigger_func*() as ARM insns and rely on interworking
enabled by default. Doesn't work on Debian bookworm target because insn #1
of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched
by dynamic loader? Kernel somehow?

__attribute__((target("arm")))

Works to compile all test_progs with gcc -marm, or just attach_probe.c.

Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid
using THUMB insns, with expected tests passing:

torvalds#8/1     attach_probe/manual-default:OK
torvalds#8/2     attach_probe/manual-legacy:OK
torvalds#8/3     attach_probe/manual-perf:OK
torvalds#8/4     attach_probe/manual-link:OK
torvalds#8/5     attach_probe/auto:OK
torvalds#8/6     attach_probe/kprobe-sleepable:OK
torvalds#8/8     attach_probe/uprobe-sleepable:OK
torvalds#8/9     attach_probe/uprobe-ref_ctr:OK
torvalds#8/10    attach_probe/uprobe-long_name:OK
torvalds#8/11    attach_probe/kprobe-long_name:OK
torvalds#8       attach_probe:OK

Note that attach_probe/uprobe-lib will still fail since it tries to attach
a uprobe to the system libc.so, which is compiled for THUMB:

test_attach_probe:PASS:skel_open 0 nsec
test_attach_probe:PASS:skel_load 0 nsec
test_attach_probe:PASS:check_bss 0 nsec
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL
test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22
torvalds#8/7     attach_probe/uprobe-lib:FAIL
torvalds#8       attach_probe:FAIL

Also compile other source files and binary with same issue using "-marm":

bpf_cookie.c
fill_link_info.c
task_pt_regs.test.c
uprobe_autoattach.c (also tries libc.so attach)
usdt.c
uprobe_multi

Signed-off-by: Tony Ambardar <[email protected]>
evlaV-bot pushed a commit to evlaV/linux-integration that referenced this pull request Sep 24, 2025
Add a driver for the Unicam camera receiver block on BCM283x processors.
Compared to the bcm2835-camera driver present in staging, this driver
handles the Unicam block only (CSI-2 receiver), and doesn't depend on
the VC4 firmware running on the VPU.

The commit is made up of a series of changes cherry-picked from the
rpi-5.4.y branch of https://github.com/raspberrypi/linux/ with
additional enhancements, forward-ported to the mainline kernel.

Signed-off-by: Dave Stevenson <[email protected]>
Signed-off-by: Naushir Patuck <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Reported-by: kbuild test robot <[email protected]>

media: bcm2835-unicam: Add support for get_mbus_config to set num lanes

Use the get_mbus_config pad subdev call to allow a source to use
fewer than the number of CSI2 lanes defined in device tree.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Avoid gcc warning over {0} on endpoint

Older gcc versions object to = { 0 } initialisation if the first
elemtn in the structure is a substructure.

Use = { } to avoid this compiler warning.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Reinstate V4L2_CAP_READWRITE in the caps

v4l2-compliance throws a failure if the device doesn't advertise
V4L2_CAP_READWRITE but allows read or write operations.
We do support read, so reinstate the flag.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Ensure type is VIDEO_CAPTURE in [g|s]_selection

[g|s]_selection pass in a buffer type that needs to be validated
before passing on to the sensor subdev.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835: unicam: Set VPU min clock freq to 250Mhz.

When streaming with Unicam, the VPU must have a clock frequency of at
least 250Mhz.  Otherwise, the input fifos could overrun, causing
image corruption.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Drop WARN on uing direct cache alias

Pi 0&1 pass all ARM accesses through the VPU L2 cache, therefore
the dma-ranges property sets the cache alias bits to other
than the direct alias, hence this WARN was firing.

It was overprotective coding, so assume that everything is OK
with the dma-ranges, and remove the WARN.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Always service interrupts

From when bringing up the driver, there was a check in the isr
to ignore interrupts (claiming them handled) should the driver
not be streaming.

The VPU now will not register a camera driver if it finds a
CSI2 node enabled in device tree, therefore this flawed check is
redundant.

raspberrypi/linux#3602

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835: unicam: Fix uninitialized warning

Signed-off-by: Jacko Dirks <[email protected]>

media: bcm2835-unicam: Fixup review comments from Hans.

Updates the driver based on the upstream review comments from
Hans Verkuil at https://patchwork.linuxtv.org/patch/63531/

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Retain packing information on G_FMT

The change to retrieve the pixel format always on g_fmt didn't
check whether the native or unpacked version of the format
had been requested, and always returned the packed one.
Correct this so that the packing setting is retained whereever
possible.

Fixes "9d59e89 media: bcm2835-unicam: Re-fetch mbus code from subdev
on a g_fmt call"

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: change minimum number of vb2_queue buffers to 1

Since the unicam driver was modified to write to a dummy buffer when no
user-supplied buffer is available, it can now write to and return a
buffer even when there's only a single one. Enable this by changing the
min_buffers_needed in the vb2_queue; it will be useful for enabling
still captures without allocating more memory than absolutely necessary.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: ISP: Add a more complex ISP processing component

Driver for the BCM2835 ISP hardware block.  This driver uses the MMAL
component to program the ISP hardware through the VC firmware.

The ISP component can produce two video stream outputs, and Bayer
image statistics. This can't be encompassed in a simple V4L2
M2M device, so create a new device that registers 4 video nodes.

This patch squashes all the development patches from the earlier
rpi-5.4.y branch into one

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Add the unpacked (16bpp) raw formats

Now that the firmware supports the unpacked (16bpp) variants
of the MIPI raw formats, add the mappings.

Signed-off-by: Dave Stevenson <[email protected]>

staging/bcm2835-isp: Log the number of excess supported formats

When logging that the firmware has provided more supported formats
than we had allocated storage for, log the number allocated and
returned.

Signed-off-by: Dave Stevenson <[email protected]>

staging: vc04_services: ISP: Add colour denoise control

Add colour denoise control to the bcm2835 driver through a new v4l2
control: V4L2_CID_USER_BCM2835_ISP_CDN.

Add the accompanying MMAL configuration structure definitions as well.

Signed-off-by: Naushir Patuck <[email protected]>

bcm2835-isp: Allow formats with different colour spaces.

Each supported format now includes a mask showing the allowed colour
spaces, as well as a default colour space for when one was not
specified.

Additionally we translate the colour space to mmal format and pass it
over to the VideoCore.

Signed-off-by: David Plowman <[email protected]>

media: i2c: add ov9281 driver.

Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: fix mclk issue when probe multiple camera.

Takes the ov9281 part only from the Rockchip's patch.

Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3

Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to
a large number of drivers.

Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code

The Rockchip driver was based on a 4.4 kernel, and had several custom
Rockchip parts.

Update to 5.4 kernel APIs, with the relevant controls required by
libcamera, and remove custom Rockchip parts.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Read chip ID via 2 reads

Vision Components have made an OV9281 module which blocks reading
back the majority of registers to comply with NDAs, and in doing
so doesn't allow auto-increment register reading as used when
reading the chip ID.

Use two reads and manually combine the results.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Add support for 8 bit readout

The sensor supports 8 bit mode as well as 10bit, so add the
relevant code to allow selection of this.

Signed-off-by: Dave Stevenson <[email protected]>

media: ov9281: Add 1280x720 and 640x480 modes

Breaks out common register set and adds the different registers
for 1280x720 (cropped) and 640x480 (skipped) modes

Signed-off-by: Dave Stevenson <[email protected]>

Fixed picture line bug in all ov9281 modes

Signed-off-by: Mathias Anhalt <[email protected]>

Added hflip and vflip controls to ov9281

Signed-off-by: Mathias Anhalt <[email protected]>

media: i2c: ov9281: Remove override of subdev name

From the original Rockchip driver, the subdev was renamed
from the default to being "mov9281 <dev_name>" whereas the
default would have been "ov9281 <dev_name>".

Remove the override to drop back to the default rather than
a vendor custom string.

Signed-off-by: Dave Stevenson <[email protected]>

media: v4l2-subdev: add subdev-wide state struct

Signed-off-by: Dom Cobley <[email protected]>

media: i2c: ov9281: Add fwnode properties controls

Add call to v4l2_ctrl_new_fwnode_properties to read and
create the fwnode based controls.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Sensor should report RAW color space

Tested on Raspberry Pi running libcamera.

Signed-off-by: David Plowman <[email protected]>

Partial revert "media: i2c: add ov9281 driver."

This partially reverts commit 84e98e3a4f3eecb168ceb80231c3e8252929892e.

The commit had merged some changes to other drivers with adding the ov9281
driver. Only the ov9281 parts have been reverted.

staging/bcm2835-isp: Fix compiler warning

The result of dividing a u32 by a size_t is an unsigned int on arm32
and a long unsigned int on arm64. Use "%zu" (the size_t format) to
remove the build warning for 64-bit builds.

Signed-off-by: Phil Elwell <[email protected]>

staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes

The bcm2835 ISP requires the base address of all input/output planes to have 32
byte alignment. Using a Y stride of 32 bytes would not guarantee that the V
plane would fulfil this, e.g. a height of 650 lines would mean the V plane
buffer is not 32 byte aligned for YUV420 formats.

Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte
alignment, as the luma height will always be an even number of lines.

Signed-off-by: Naushir Patuck <[email protected]>

vc04_services: isp: Report input node as wanting full range RAW color space

RAW color spaces are more usually reported as having full range
quantization.

Tested using libcamera.

Signed-off-by: David Plowman <[email protected]>

drivers: bcm2835_isp: Allow multiple users for the ISP driver.

Add a second (identical) set of device nodes to allow concurrent use of the ISP
hardware by another user. This change effectively creates a second state
structure (struct bcm2835_isp_dev) to maintain independent state for the second
user. Node and media entity names are appened with the instance index
appropriately.

Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_isp: Fix div by 0 bug.

Fix a possible division by 0 bug when setting up the mmal port for the stats
port.

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Fix cleanup after init fail

bcm2835_isp_remove is called on an initialisation failure, but at that
point the drvdata hasn't been set. This causes a crash when e.g. using
the cutdown firmware (gpu_mem=16).

Move platform_set_drvdata before the instance probing loop to avoid the
problem.

See: raspberrypi/linux#4774

Signed-off-by: Phil Elwell <[email protected]>

bcm2835-v4l2-isp: Add missing lock initialization

ISP device allocation is dynamic hence the locks too.
struct mutex queue_lock is not initialized which result in bug.

Fixing same by initializing it.

[   29.847138] INFO: trying to register non-static key.
[   29.847156] The code is fine but needs lockdep annotation, or maybe
[   29.847159] you didn't initialize this object before use?
[   29.847161] turning off the locking correctness validator.
[   29.847167] CPU: 1 PID: 343 Comm: v4l_id Tainted: G         C        5.15.11-rt24-v8+ torvalds#8
[   29.847187] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[   29.847194] Call trace:
[   29.847197]  dump_backtrace+0x0/0x1b8
[   29.847227]  show_stack+0x20/0x30
[   29.847240]  dump_stack_lvl+0x8c/0xb8
[   29.847254]  dump_stack+0x18/0x34
[   29.847263]  register_lock_class+0x494/0x4a0
[   29.847278]  __lock_acquire+0x80/0x1680
[   29.847289]  lock_acquire+0x214/0x3a0
[   29.847300]  mutex_lock_nested+0x70/0xc8
[   29.847312]  _vb2_fop_release+0x3c/0xa8 [videobuf2_v4l2]
[   29.847346]  vb2_fop_release+0x34/0x60 [videobuf2_v4l2]
[   29.847367]  v4l2_release+0xc8/0x108 [videodev]
[   29.847453]  __fput+0x8c/0x258
[   29.847476]  ____fput+0x18/0x28
[   29.847487]  task_work_run+0x98/0x180
[   29.847502]  do_notify_resume+0x228/0x3f8
[   29.847515]  el0_svc+0xec/0xf0
[   29.847523]  el0t_64_sync_handler+0x90/0xb8
[   29.847531]  el0t_64_sync+0x180/0x184

Signed-off-by: Padmanabha Srinivasaiah <[email protected]>

staging: vc04_services: isp: Permit all sRGB colour spaces on ISP outputs

ISP outputs actually support all colour spaces that are fundamentally
sRGB underneath, regardless of whether an RGB or YUV output format is
actually requested.

Signed-off-by: David Plowman <[email protected]>

drivers: staging: bcm2835-isp: Do not cleanup mmal vcsm buffer on stop_streaming

On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup
code.  This will subsequently cause and error if userland re-queues the same
buffer on the next start_streaming() call.

Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the
cleanups instead.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Clear LS table handle in the firmware

When all nodes have stopped streaming, ensure the firmware has released its
handle on the LS table dmabuf. This is done by passing a null handle in the
LS params.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Respect caller's stride value

The stride value reported for output image buffers should be at least
as large as any value that was passed in by the caller (subject to
correct alignment for the pixel format). If the value is zero (meaning
no value was passed), or is too small, the minimum acceptable value
will be substituted.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: bcm2835-isp: Drop include Makefile directive

Drop the include directive. They can break the build, when one only
wants to build a subdirectory. Replace with "../" for the includes in
the bcm2835-isp instead.

The fix is equivalent to the four patches between 29d49a7
("staging: vc04_services: bcm2835-audio: Drop include Makefile
directive")...2529ca2 ("staging: vc04_services: interface: Drop
include Makefile directive")

Fixes: c8f89c9551c1 ("staging: vc04_services: ISP: Add a more complex ISP processing component")
Suggested-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Register with vchiq_bus_type

Register the bcm2835-v4l2-isp driver with the vchiq_bus_type instead of
using the platform driver/device.

Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Explicitly set DMA mask

The platform model originally handled the DMA mask. Now that
we are on the vchiq_bus we need to explicitly set this.

Signed-off-by: Kieran Bingham <[email protected]>

drivers: media: bcm2835_isp: Cache LS table dmabuf

Clients such as libcamera do not change the LS table dmabuf on every
frame. In such cases instead of mapping/remapping the same dmabuf on
every frame to send to the firmware, cache the dmabuf once and only
update and remap if the dmabuf has been changed by the userland client.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Correctly handle error propagation for stream on

On a failure in start_streaming(), the error code would not propagate to
the calling function on all conditions. This would cause the userland
caller to not know of the failure.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Return early from stop_streaming() if stopped

clk_disable_unprepare() is called unconditionally in stop_streaming().
This is incorrect in the cases where start_streaming() fails, and
unprepares all clocks as part of the failure cleanup. To avoid this,
ensure that clk_disable_unprepare() is only called in stop_streaming()
if the clocks are in a prepared state.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Clear clock state when stopping streaming

Commit 65e08c465020d4c5b51afb452efc2246d80fd66f failed to clear the
clock state when the device stopped streaming. Fix this, as it might
again cause the same problems when doing an unprepare.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Fix bug in buffer swapping logic

If multiple sets of interrupts occur simultaneously, it may be unsafe
to swap buffers, as the hardware may already be re-using the current
buffers. In such cases, avoid swapping buffers, and wait for the next
opportunity at the Frame End interrupt to signal completion.

Additionally, check the packet compare status when watching for frame
end for buffers swaps, as this could also signify a frame end event.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Forward input status from subdevice

The vidioc_enum_input() v4l2 ioctl is capable of returning
sensor/input status as well. This is used in current
GStreamer HEAD for signal detection [1].

bcm2835-unicam does handle this syscall, but it didn't ask
the subdevice driver about the input status. The input then
appeared as always present.

This commit adds the necessary query. There is a precedent for
this - the R-Car VIN V4L2 driver does a similar call [2].

[1]: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/blob/ce0be27caf69aa9d96b73bc2b50737451b6f6936/sys/v4l2/gstv4l2src.c#L553
[2]: https://github.com/raspberrypi/linux/blob/7fb9d006d3ff3baf2e205e0c85c4e4fd0a64fcd0/drivers/media/platform/rcar-vin/rcar-v4l2.c#L548

Signed-off-by: Jakub Vaněk <[email protected]>

media/bcm2835-unicam: Parse pad numbers correctly

The driver was making big assumptions about the source device
using pad 0 and 1, which doesn't follow for more complex
devices where Unicam's source device may be a sink device for
something else.

Read the pad numbers through media controller, and reference
them appropriately.

Signed-off-by: Dave Stevenson <[email protected]>

media/bcm2835-unicam: Add support for configuration via MC API

Adds Media Controller API support for more complex pipelines.
libcamera is about to switch to using this mechanism for configuring
sensors.

This can be enabled by either a module parameter, or device tree.

Various functions have been moved to group video-centric and
mc-centric functions together.

Based on a similar conversion done to ti-vpe.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Fixup for 5.18 and new get_mbus_config struct

The number of active CSI2 data lanes has moved within the struct
v4l2_mbus_config used by the get_mbus_config API call.
Update the driver to match the changes in mainline.

Signed-off-by: Dave Stevenson <[email protected]>

drivers: bcm2835_unicam: Add logging message when a frame is dropped.

If a dummy buffer is still active on a frame start, it indicates that this frame
will be dropped. The explicit logging helps users identify performance issues.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_unicam: Disable trigger mode operation

On a Pi3 B/B+ platform the imx219 sensor frequently generates a single corrupt
frame when the sensor first starts. This can either be a missing line, or
invalid samples within the line. This only occurrs using the Unicam kernel
driver.

Disabling trigger mode elimiates this corruption. Since trigger mode is a
legacy feature copied from the firmware driver and not expected to be needed,
remove it. Tested on the Raspberry Pi cameras and shows no ill effects.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Set ret on error path in unicam_async_complete()

Clang warns:

  drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
          if (!source_pads) {
              ^~~~~~~~~~~~
  drivers/media/platform/bcm2835/bcm2835-unicam.c:3152:9: note: uninitialized use occurs here
          return ret;
                 ^~~
  drivers/media/platform/bcm2835/bcm2835-unicam.c:3109:2: note: remove the 'if' if its condition is always false
          if (!source_pads) {
          ^~~~~~~~~~~~~~~~~~~
  drivers/media/platform/bcm2835/bcm2835-unicam.c:3091:9: note: initialize the variable 'ret' to silence this warning
          int ret;
                 ^
                  = 0
  1 warning generated.

When the if condition is true, ret will be used uninitialized, which
could result in undesirable behavior. Set ret to -ENODEV on the error
path, which is a standard error code for the ->complete() callback.

Fixes: d056e86eb35f ("media/bcm2835-unicam: Parse pad numbers correctly")
Signed-off-by: Nathan Chancellor <[email protected]>

media: bcm2835-unicam: Handle a repeated frame start with no end

In the case of 2 frame starts being received with no frame end
between, the queued buffer held in next_frm was lost as the
pointer was overwritten with the dummy buffer.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Correctly handle FS + FE ISR condtion

If we get a simultaneous FS + FE interrupt for the same frame, it cannot be
marked as completed and returned to userland as the framebuffer will be refilled
by Unicam on the next sensor frame. Additionally, the timestamp will be set to 0
as the FS interrupt handling code will not have run yet.

To avoid these problems, the frame is considered dropped in the FE handler,
and will be returned to userland on the subsequent sensor frame.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Fix for possible dummy buffer overrun

The Unicam hardware has been observed to cause a buffer overrun when using the
dummy buffer as a circular buffer. The conditions that cause the overrun are not
fully known, but it seems to occur when the memory bus is heavily loaded.

To avoid the overrun, program the hardware with a buffer size of 0 when using
the dummy buffer. This will cause overrun into the allocated dummy buffer, but
avoid out of bounds writes.

Signed-off-by: Naushir Patuck <[email protected]>

media: bcm2835-unicam: Fix up start/stop api change

Signed-off-by: Dom Cobley <[email protected]>

media: bcm2835-unicam: Use mipi-csi2.h header for data type values

The MIPI CSI2 data type ID values are now defined in the
mipi-csi2.h header, so use those defines instead of hard
coding them in the driver.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for RAW16 formats

With the RAW16 formats now having a defined CSI2 data type ID,
they can be added to the driver.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Start and stop media_pipeline with same node

media_pipeline_start and media_pipeline_stop now validate that
the pipeline is being started and stopped with the same pipe
and pad handles.
When running with embedded metadata (eg imx477 and imx708), the
start typically happens from the metadata pad, whilst stop is
always from the image pad.

Always pass the image pad to media_pipeline_start to ensure
that the calls are balanced.

Signed-off-by: Dave Stevenson <[email protected]>

drivers: media: bcm2835_unicam: Improve frame sequence count handling

Ensure that the frame sequence counter is incremented only if a previous
frame start interrupt has occurred, or a frame start + frame end has
occurred simultaneously.

This corresponds the sequence number with the actual number of frames
produced by the sensor, not the number of frame buffers dequeued back
to userland.

Signed-off-by: Naushir Patuck <[email protected]>

bcm2835-unicam: hacks to allow it to build

media: bcm2835-unicam: Fix up async notifier usage

Fixes "8a090fc3e549 bcm2835-unicam: hacks to allow it to build"

media: bcm2835-unicam: Add option for a GPIO to reflect FS/FE timing

The legacy stack had an option to have a GPIO track frame start and
end events to give basic synchronisation to the incoming image stream.
https://forums.raspberrypi.com/viewtopic.php?t=190314

Replicate this in the kernel Unicam driver.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for 12bit mono packed format

Now that V4L2_PIX_FMT_Y12P is defined, allow passing raw 12bit
mono packed data through the peripheral.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for 14bit mono sources

Now that V4L2_PIX_FMT_Y14 and V4L2_PIX_FMT_Y14P are defined,
allow passing 14bit mono data through the peripheral.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Add support for unpacked 14bit Bayer formats

Now that the 14bit non-packed Bayer formats are defined, add them
into the supported formats lookup table.

Signed-off-by: Dave Stevenson <[email protected]>

media: bcm2835-unicam: Reinstate old downstream driver as legacy

Whilst the Unicam driver has now been upstreamed it only supports
configuration via Media Controller (not driven from the /dev/videoN
node), which makes life significantly harder for simple devices such
as mono sensors, and HDMI or analogue video to CSI2 bridge chips
(eg TC358743 and ADV7282M).

Fix up the downstream driver so that it builds, reinstate the links
from Kconfig and Makefile to it, and give it a new Kconfig name
(VIDEO_BCM2835_UNICAM_LEGACY).

Signed-off-by: Dave Stevenson <[email protected]>
evlaV-bot pushed a commit to evlaV/linux-integration that referenced this pull request Sep 24, 2025
Driver for the BCM2835 ISP hardware block.  This driver uses the MMAL
component to program the ISP hardware through the VC firmware.

The ISP component can produce two video stream outputs, and Bayer
image statistics. This can't be encompassed in a simple V4L2
M2M device, so create a new device that registers 4 video nodes.

This patch squashes all the development patches from the earlier
rpi-5.4.y branch into one

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Add the unpacked (16bpp) raw formats

Now that the firmware supports the unpacked (16bpp) variants
of the MIPI raw formats, add the mappings.

Signed-off-by: Dave Stevenson <[email protected]>

staging/bcm2835-isp: Log the number of excess supported formats

When logging that the firmware has provided more supported formats
than we had allocated storage for, log the number allocated and
returned.

Signed-off-by: Dave Stevenson <[email protected]>

staging: vc04_services: ISP: Add colour denoise control

Add colour denoise control to the bcm2835 driver through a new v4l2
control: V4L2_CID_USER_BCM2835_ISP_CDN.

Add the accompanying MMAL configuration structure definitions as well.

Signed-off-by: Naushir Patuck <[email protected]>

bcm2835-isp: Allow formats with different colour spaces.

Each supported format now includes a mask showing the allowed colour
spaces, as well as a default colour space for when one was not
specified.

Additionally we translate the colour space to mmal format and pass it
over to the VideoCore.

Signed-off-by: David Plowman <[email protected]>

media: i2c: add ov9281 driver.

Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: fix mclk issue when probe multiple camera.

Takes the ov9281 part only from the Rockchip's patch.

Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3

Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to
a large number of drivers.

Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c
Signed-off-by: Zefa Chen <[email protected]>

media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code

The Rockchip driver was based on a 4.4 kernel, and had several custom
Rockchip parts.

Update to 5.4 kernel APIs, with the relevant controls required by
libcamera, and remove custom Rockchip parts.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Read chip ID via 2 reads

Vision Components have made an OV9281 module which blocks reading
back the majority of registers to comply with NDAs, and in doing
so doesn't allow auto-increment register reading as used when
reading the chip ID.

Use two reads and manually combine the results.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Add support for 8 bit readout

The sensor supports 8 bit mode as well as 10bit, so add the
relevant code to allow selection of this.

Signed-off-by: Dave Stevenson <[email protected]>

media: ov9281: Add 1280x720 and 640x480 modes

Breaks out common register set and adds the different registers
for 1280x720 (cropped) and 640x480 (skipped) modes

Signed-off-by: Dave Stevenson <[email protected]>

Fixed picture line bug in all ov9281 modes

Signed-off-by: Mathias Anhalt <[email protected]>

Added hflip and vflip controls to ov9281

Signed-off-by: Mathias Anhalt <[email protected]>

media: i2c: ov9281: Remove override of subdev name

From the original Rockchip driver, the subdev was renamed
from the default to being "mov9281 <dev_name>" whereas the
default would have been "ov9281 <dev_name>".

Remove the override to drop back to the default rather than
a vendor custom string.

Signed-off-by: Dave Stevenson <[email protected]>

media: v4l2-subdev: add subdev-wide state struct

Signed-off-by: Dom Cobley <[email protected]>

media: i2c: ov9281: Add fwnode properties controls

Add call to v4l2_ctrl_new_fwnode_properties to read and
create the fwnode based controls.

Signed-off-by: Dave Stevenson <[email protected]>

media: i2c: ov9281: Sensor should report RAW color space

Tested on Raspberry Pi running libcamera.

Signed-off-by: David Plowman <[email protected]>

Partial revert "media: i2c: add ov9281 driver."

This partially reverts commit 84e98e3a4f3eecb168ceb80231c3e8252929892e.

The commit had merged some changes to other drivers with adding the ov9281
driver. Only the ov9281 parts have been reverted.

staging/bcm2835-isp: Fix compiler warning

The result of dividing a u32 by a size_t is an unsigned int on arm32
and a long unsigned int on arm64. Use "%zu" (the size_t format) to
remove the build warning for 64-bit builds.

Signed-off-by: Phil Elwell <[email protected]>

staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes

The bcm2835 ISP requires the base address of all input/output planes to have 32
byte alignment. Using a Y stride of 32 bytes would not guarantee that the V
plane would fulfil this, e.g. a height of 650 lines would mean the V plane
buffer is not 32 byte aligned for YUV420 formats.

Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte
alignment, as the luma height will always be an even number of lines.

Signed-off-by: Naushir Patuck <[email protected]>

vc04_services: isp: Report input node as wanting full range RAW color space

RAW color spaces are more usually reported as having full range
quantization.

Tested using libcamera.

Signed-off-by: David Plowman <[email protected]>

drivers: bcm2835_isp: Allow multiple users for the ISP driver.

Add a second (identical) set of device nodes to allow concurrent use of the ISP
hardware by another user. This change effectively creates a second state
structure (struct bcm2835_isp_dev) to maintain independent state for the second
user. Node and media entity names are appened with the instance index
appropriately.

Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_isp: Fix div by 0 bug.

Fix a possible division by 0 bug when setting up the mmal port for the stats
port.

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Fix cleanup after init fail

bcm2835_isp_remove is called on an initialisation failure, but at that
point the drvdata hasn't been set. This causes a crash when e.g. using
the cutdown firmware (gpu_mem=16).

Move platform_set_drvdata before the instance probing loop to avoid the
problem.

See: raspberrypi/linux#4774

Signed-off-by: Phil Elwell <[email protected]>

bcm2835-v4l2-isp: Add missing lock initialization

ISP device allocation is dynamic hence the locks too.
struct mutex queue_lock is not initialized which result in bug.

Fixing same by initializing it.

[   29.847138] INFO: trying to register non-static key.
[   29.847156] The code is fine but needs lockdep annotation, or maybe
[   29.847159] you didn't initialize this object before use?
[   29.847161] turning off the locking correctness validator.
[   29.847167] CPU: 1 PID: 343 Comm: v4l_id Tainted: G         C        5.15.11-rt24-v8+ torvalds#8
[   29.847187] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[   29.847194] Call trace:
[   29.847197]  dump_backtrace+0x0/0x1b8
[   29.847227]  show_stack+0x20/0x30
[   29.847240]  dump_stack_lvl+0x8c/0xb8
[   29.847254]  dump_stack+0x18/0x34
[   29.847263]  register_lock_class+0x494/0x4a0
[   29.847278]  __lock_acquire+0x80/0x1680
[   29.847289]  lock_acquire+0x214/0x3a0
[   29.847300]  mutex_lock_nested+0x70/0xc8
[   29.847312]  _vb2_fop_release+0x3c/0xa8 [videobuf2_v4l2]
[   29.847346]  vb2_fop_release+0x34/0x60 [videobuf2_v4l2]
[   29.847367]  v4l2_release+0xc8/0x108 [videodev]
[   29.847453]  __fput+0x8c/0x258
[   29.847476]  ____fput+0x18/0x28
[   29.847487]  task_work_run+0x98/0x180
[   29.847502]  do_notify_resume+0x228/0x3f8
[   29.847515]  el0_svc+0xec/0xf0
[   29.847523]  el0t_64_sync_handler+0x90/0xb8
[   29.847531]  el0t_64_sync+0x180/0x184

Signed-off-by: Padmanabha Srinivasaiah <[email protected]>

staging: vc04_services: isp: Permit all sRGB colour spaces on ISP outputs

ISP outputs actually support all colour spaces that are fundamentally
sRGB underneath, regardless of whether an RGB or YUV output format is
actually requested.

Signed-off-by: David Plowman <[email protected]>

drivers: staging: bcm2835-isp: Do not cleanup mmal vcsm buffer on stop_streaming

On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup
code.  This will subsequently cause and error if userland re-queues the same
buffer on the next start_streaming() call.

Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the
cleanups instead.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Clear LS table handle in the firmware

When all nodes have stopped streaming, ensure the firmware has released its
handle on the LS table dmabuf. This is done by passing a null handle in the
LS params.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Respect caller's stride value

The stride value reported for output image buffers should be at least
as large as any value that was passed in by the caller (subject to
correct alignment for the pixel format). If the value is zero (meaning
no value was passed), or is too small, the minimum acceptable value
will be substituted.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: bcm2835-isp: Drop include Makefile directive

Drop the include directive. They can break the build, when one only
wants to build a subdirectory. Replace with "../" for the includes in
the bcm2835-isp instead.

The fix is equivalent to the four patches between 29d49a7
("staging: vc04_services: bcm2835-audio: Drop include Makefile
directive")...2529ca2 ("staging: vc04_services: interface: Drop
include Makefile directive")

Fixes: c8f89c9551c1 ("staging: vc04_services: ISP: Add a more complex ISP processing component")
Suggested-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Register with vchiq_bus_type

Register the bcm2835-v4l2-isp driver with the vchiq_bus_type instead of
using the platform driver/device.

Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Explicitly set DMA mask

The platform model originally handled the DMA mask. Now that
we are on the vchiq_bus we need to explicitly set this.

Signed-off-by: Kieran Bingham <[email protected]>

drivers: media: bcm2835_isp: Cache LS table dmabuf

Clients such as libcamera do not change the LS table dmabuf on every
frame. In such cases instead of mapping/remapping the same dmabuf on
every frame to send to the firmware, cache the dmabuf once and only
update and remap if the dmabuf has been changed by the userland client.

Signed-off-by: Naushir Patuck <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 24, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 24, 2025
Update current JIT offsets to enable and set up BPF line info for better
introspection and debugging. Offsets map each xlated insn to the start of
its JITed code, as well as the epilogue. This also allows simplifying some
code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly,
before this change we see:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

While afterwards we have:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 24, 2025
With test_progs compiled with ARM THUMB insns, trigger_func*() used for
testing uprobes have odd (LSB set) entry points due to interworking. This
fails alignment check first, then due to uprobe unsupported for THUMB in
prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn().

First fix by setting trigger_func*() as ARM insns and rely on interworking
enabled by default. Doesn't work on Debian bookworm target because insn #1
of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched
by dynamic loader? Kernel somehow?

__attribute__((target("arm")))

Works to compile all test_progs with gcc -marm, or just attach_probe.c.

Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid
using THUMB insns, with expected tests passing:

torvalds#8/1     attach_probe/manual-default:OK
torvalds#8/2     attach_probe/manual-legacy:OK
torvalds#8/3     attach_probe/manual-perf:OK
torvalds#8/4     attach_probe/manual-link:OK
torvalds#8/5     attach_probe/auto:OK
torvalds#8/6     attach_probe/kprobe-sleepable:OK
torvalds#8/8     attach_probe/uprobe-sleepable:OK
torvalds#8/9     attach_probe/uprobe-ref_ctr:OK
torvalds#8/10    attach_probe/uprobe-long_name:OK
torvalds#8/11    attach_probe/kprobe-long_name:OK
torvalds#8       attach_probe:OK

Note that attach_probe/uprobe-lib will still fail since it tries to attach
a uprobe to the system libc.so, which is compiled for THUMB:

test_attach_probe:PASS:skel_open 0 nsec
test_attach_probe:PASS:skel_load 0 nsec
test_attach_probe:PASS:check_bss 0 nsec
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL
test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22
torvalds#8/7     attach_probe/uprobe-lib:FAIL
torvalds#8       attach_probe:FAIL

Also compile other source files and binary with same issue using "-marm":

bpf_cookie.c
fill_link_info.c
task_pt_regs.test.c
uprobe_autoattach.c (also tries libc.so attach)
usdt.c
uprobe_multi

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 25, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 25, 2025
Update current JIT offsets to enable and set up BPF line info for better
introspection and debugging. Offsets map each xlated insn to the start of
its JITed code, as well as the epilogue. This also allows simplifying some
code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly,
before this change we see:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

While afterwards we have:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 25, 2025
With test_progs compiled with ARM THUMB insns, trigger_func*() used for
testing uprobes have odd (LSB set) entry points due to interworking. This
fails alignment check first, then due to uprobe unsupported for THUMB in
prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn().

First fix by setting trigger_func*() as ARM insns and rely on interworking
enabled by default. Doesn't work on Debian bookworm target because insn #1
of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched
by dynamic loader? Kernel somehow?

__attribute__((target("arm")))

Works to compile all test_progs with gcc -marm, or just attach_probe.c.

Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid
using THUMB insns, with expected tests passing:

torvalds#8/1     attach_probe/manual-default:OK
torvalds#8/2     attach_probe/manual-legacy:OK
torvalds#8/3     attach_probe/manual-perf:OK
torvalds#8/4     attach_probe/manual-link:OK
torvalds#8/5     attach_probe/auto:OK
torvalds#8/6     attach_probe/kprobe-sleepable:OK
torvalds#8/8     attach_probe/uprobe-sleepable:OK
torvalds#8/9     attach_probe/uprobe-ref_ctr:OK
torvalds#8/10    attach_probe/uprobe-long_name:OK
torvalds#8/11    attach_probe/kprobe-long_name:OK
torvalds#8       attach_probe:OK

Note that attach_probe/uprobe-lib will still fail since it tries to attach
a uprobe to the system libc.so, which is compiled for THUMB:

test_attach_probe:PASS:skel_open 0 nsec
test_attach_probe:PASS:skel_load 0 nsec
test_attach_probe:PASS:check_bss 0 nsec
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL
test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22
torvalds#8/7     attach_probe/uprobe-lib:FAIL
torvalds#8       attach_probe:FAIL

Also compile other source files and binary with same issue using "-marm":

bpf_cookie.c
fill_link_info.c
task_pt_regs.test.c
uprobe_autoattach.c (also tries libc.so attach)
usdt.c
uprobe_multi

Signed-off-by: Tony Ambardar <[email protected]>
kuba-moo added a commit to linux-netdev/testing that referenced this pull request Sep 27, 2025
Petr Machata says:

====================
selftests: Mark auto-deferring functions clearly

selftests/net/lib.sh contains a suite of iproute2 wrappers that
automatically schedule the corresponding cleanup through defer. The fact
they do so is however not immediately obvious, one needs to know which
functions are handling the deferral behind the scenes, and which expect the
caller to handle cleanups themselves.

A convention for these auto-deferring functions would help both writing and
patch review. This patchset does so by marking these functions with an adf_
prefix. We already have a few such functions: forwarding/lib.sh has
adf_mcd_start() and a few selftests add private helpers that conform to
this convention.

Patches #1 to torvalds#8 gradually convert individual functions, one per patch.

Patch torvalds#9 renames an auto-deferring private helpers named dfr_* to adf_*.
The plan is not to retro-rename all private helpers, but I happened to know
about this one.

Patches torvalds#10 to torvalds#12 introduce several autodefer helpers for commonly used
forwarding/lib.sh functions, and opportunistically convert straightforward
instances of 'action; defer counteraction' to the new helpers.

Patch torvalds#13 adds some README verbiage to pitch defer and the adf_*
convention.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 27, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 27, 2025
Update current JIT offsets to enable and set up BPF line info for better
introspection and debugging. Offsets map each xlated insn to the start of
its JITed code, as well as the epilogue. This also allows simplifying some
code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly,
before this change we see:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

While afterwards we have:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 27, 2025
With test_progs compiled with ARM THUMB insns, trigger_func*() used for
testing uprobes have odd (LSB set) entry points due to interworking. This
fails alignment check first, then due to uprobe unsupported for THUMB in
prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn().

First fix by setting trigger_func*() as ARM insns and rely on interworking
enabled by default. Doesn't work on Debian bookworm target because insn #1
of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched
by dynamic loader? Kernel somehow?

__attribute__((target("arm")))

Works to compile all test_progs with gcc -marm, or just attach_probe.c.

Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid
using THUMB insns, with expected tests passing:

torvalds#8/1     attach_probe/manual-default:OK
torvalds#8/2     attach_probe/manual-legacy:OK
torvalds#8/3     attach_probe/manual-perf:OK
torvalds#8/4     attach_probe/manual-link:OK
torvalds#8/5     attach_probe/auto:OK
torvalds#8/6     attach_probe/kprobe-sleepable:OK
torvalds#8/8     attach_probe/uprobe-sleepable:OK
torvalds#8/9     attach_probe/uprobe-ref_ctr:OK
torvalds#8/10    attach_probe/uprobe-long_name:OK
torvalds#8/11    attach_probe/kprobe-long_name:OK
torvalds#8       attach_probe:OK

Note that attach_probe/uprobe-lib will still fail since it tries to attach
a uprobe to the system libc.so, which is compiled for THUMB:

test_attach_probe:PASS:skel_open 0 nsec
test_attach_probe:PASS:skel_load 0 nsec
test_attach_probe:PASS:check_bss 0 nsec
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL
test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22
torvalds#8/7     attach_probe/uprobe-lib:FAIL
torvalds#8       attach_probe:FAIL

Also compile other source files and binary with same issue using "-marm":

bpf_cookie.c
fill_link_info.c
task_pt_regs.test.c
uprobe_autoattach.c (also tries libc.so attach)
usdt.c
uprobe_multi

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 27, 2025
 - treat tailcall count as 32-bit for access and update
 - change out_offset scope from file to function
 - minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: torvalds#6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: torvalds#8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: torvalds#9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1   tailcalls/tailcall_1:OK
397/2   tailcalls/tailcall_2:OK
397/3   tailcalls/tailcall_3:OK
397/4   tailcalls/tailcall_4:OK
397/5   tailcalls/tailcall_5:OK
397/6   tailcalls/tailcall_6:OK
397/7   tailcalls/tailcall_bpf2bpf_1:OK
397/8   tailcalls/tailcall_bpf2bpf_2:OK
397/9   tailcalls/tailcall_bpf2bpf_3:OK
397/10  tailcalls/tailcall_bpf2bpf_4:OK
397/11  tailcalls/tailcall_bpf2bpf_5:OK
397/12  tailcalls/tailcall_bpf2bpf_6:OK
397/17  tailcalls/tailcall_poke:OK
397/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27  tailcalls/tailcall_failure:OK
397/28  tailcalls/reject_tail_call_spin_lock:OK
397/29  tailcalls/reject_tail_call_rcu_lock:OK
397/30  tailcalls/reject_tail_call_preempt_lock:OK
397/31  tailcalls/reject_tail_call_ref:OK
397     tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 27, 2025
Update current JIT offsets to enable and set up BPF line info for better
introspection and debugging. Offsets map each xlated insn to the start of
its JITed code, as well as the epilogue. This also allows simplifying some
code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly,
before this change we see:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

While afterwards we have:

  48:   ldr     lr, [fp, #-44]  @ 0xffffffd4
  4c:   strd    r2, [lr, #-40]  @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  54:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  58:   ldr     r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c:   mov     r7, r2
  60:   ldr     r2, [r7]
  64:   mov     r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68:   ldr     r8, [fp, #-44]  @ 0xffffffd4
  6c:   ldr     r8, [r8, #-40]  @ 0xffffffd8
  70:   ldr     r6, [r8, torvalds#8]
  74:   ldr     r9, [fp, #-44]  @ 0xffffffd4
  78:   strd    r6, [r9, #-48]  @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/linux that referenced this pull request Sep 27, 2025
With test_progs compiled with ARM THUMB insns, trigger_func*() used for
testing uprobes have odd (LSB set) entry points due to interworking. This
fails alignment check first, then due to uprobe unsupported for THUMB in
prepare_uprobe() -> arch_uprobe_analyze_insn() -> arm_probes_decode_insn().

First fix by setting trigger_func*() as ARM insns and rely on interworking
enabled by default. Doesn't work on Debian bookworm target because insn #1
of funcs patched to 'UDF torvalds#25' which causes "Illegal instruction". Patched
by dynamic loader? Kernel somehow?

__attribute__((target("arm")))

Works to compile all test_progs with gcc -marm, or just attach_probe.c.

Change Makefile to compile prog_tests/attach_probe.c with '-marm' to avoid
using THUMB insns, with expected tests passing:

torvalds#8/1     attach_probe/manual-default:OK
torvalds#8/2     attach_probe/manual-legacy:OK
torvalds#8/3     attach_probe/manual-perf:OK
torvalds#8/4     attach_probe/manual-link:OK
torvalds#8/5     attach_probe/auto:OK
torvalds#8/6     attach_probe/kprobe-sleepable:OK
torvalds#8/8     attach_probe/uprobe-sleepable:OK
torvalds#8/9     attach_probe/uprobe-ref_ctr:OK
torvalds#8/10    attach_probe/uprobe-long_name:OK
torvalds#8/11    attach_probe/kprobe-long_name:OK
torvalds#8       attach_probe:OK

Note that attach_probe/uprobe-lib will still fail since it tries to attach
a uprobe to the system libc.so, which is compiled for THUMB:

test_attach_probe:PASS:skel_open 0 nsec
test_attach_probe:PASS:skel_load 0 nsec
test_attach_probe:PASS:check_bss 0 nsec
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
libbpf: prog 'handle_uprobe_byname2': failed to create uprobe '/lib/arm-linux-gnueabihf/libc.so.6:0x528e1' perf event: -EINVAL
test_uprobe_lib:FAIL:attach_uprobe_byname2 unexpected error: -22
torvalds#8/7     attach_probe/uprobe-lib:FAIL
torvalds#8       attach_probe:FAIL

Also compile other source files and binary with same issue using "-marm":

bpf_cookie.c
fill_link_info.c
task_pt_regs.test.c
uprobe_autoattach.c (also tries libc.so attach)
usdt.c
uprobe_multi

Signed-off-by: Tony Ambardar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants