Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

crawfxrd
Copy link
Member

The patch train at least applies cleanly, so let's see if it fixes it on 5.11 as well.

@crawfxrd crawfxrd changed the title Fix amdgpu with S0ix 5.11: Fix amdgpu with S0ix Aug 16, 2021
@crawfxrd crawfxrd marked this pull request as ready for review August 17, 2021 12:59
@crawfxrd crawfxrd requested review from a team August 17, 2021 12:59
@crawfxrd
Copy link
Member Author

crawfxrd commented Aug 17, 2021

pang11 suspends fine, but resets when it should resume.

The one time it "worked", I got this:

kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 8 PID: 2684 at kernel/irq/chip.c:210 irq_startup+0xef/0x100
kernel: Modules linked in: ccm rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd>
kernel:  system76_acpi(OE) i915 hid_generic amdgpu iommu_v2 gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect s>
kernel: CPU: 8 PID: 2684 Comm: systemd-sleep Tainted: G           OE     5.13.0-7614-generic #14
kernel: Hardware name: System76 Pangolin/Pangolin, BIOS 1.07.07 10/15/2020
kernel: RIP: 0010:irq_startup+0xef/0x100
kernel: Code: f6 4c 89 ef e8 e2 3e 00 00 85 c0 75 21 4c 89 ef 31 d2 4c 89 f6 e8 b1 c6 ff ff 4c 89 e7 e8 a9 fe ff ff 41 89 c5 e9 4c ff ff ff <0>
kernel: RSP: 0018:ffffb3e6427cbba0 EFLAGS: 00010002
kernel: RAX: 0000000000000010 RBX: 0000000000000001 RCX: 0000000000000040
kernel: RDX: ffffffffffffffff RSI: ffffffff83d60620 RDI: ffff9ab701080318
kernel: RBP: ffffb3e6427cbbc0 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: ffffffff83b553c8 R12: ffff9ab702b6fe00
kernel: R13: 0000000000000001 R14: ffff9ab701080318 R15: ffff9ab702b6fea4
kernel: FS:  00007f7bb663d900(0000) GS:ffff9ab807600000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000556c0b9d9bb4 CR3: 00000001137b4000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  __enable_irq+0x52/0x60
kernel:  resume_irqs+0xd3/0x120
kernel:  resume_device_irqs+0x10/0x20
kernel:  dpm_resume_noirq+0x1db/0x2a0
kernel:  suspend_devices_and_enter+0x214/0x580
kernel:  pm_suspend.cold+0x332/0x37d
kernel:  state_store+0x82/0xe0
kernel:  kobj_attr_store+0x12/0x20
kernel:  sysfs_kf_write+0x3e/0x50
kernel:  kernfs_fop_write_iter+0x138/0x1c0
kernel:  new_sync_write+0x117/0x1b0
kernel:  vfs_write+0x185/0x250
kernel:  ksys_write+0x67/0xe0
kernel:  __x64_sys_write+0x1a/0x20
kernel:  do_syscall_64+0x61/0xb0
kernel:  ? syscall_exit_to_user_mode+0x27/0x50
kernel:  ? __do_sys_gettid+0x1b/0x20
kernel:  ? do_syscall_64+0x6e/0xb0
kernel:  ? exit_to_user_mode_prepare+0x3d/0x1c0
kernel:  ? do_user_addr_fault+0x1d0/0x640
kernel:  ? irqentry_exit_to_user_mode+0x9/0x20
kernel:  ? irqentry_exit+0x19/0x30
kernel:  ? exc_page_fault+0x8f/0x170
kernel:  ? asm_exc_page_fault+0x8/0x30
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f7bb6ed7c47
kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <4>
kernel: RSP: 002b:00007ffdfa4abba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f7bb6ed7c47
kernel: RDX: 0000000000000004 RSI: 00007ffdfa4abc60 RDI: 0000000000000004
kernel: RBP: 00007ffdfa4abc60 R08: 0000000000000004 R09: 0000000000000000
kernel: R10: 00007f7bb6f75040 R11: 0000000000000246 R12: 0000000000000004
kernel: R13: 000055bf4826f2d0 R14: 0000000000000004 R15: 00007f7bb6fb18a0
kernel: ---[ end trace 41bcfee1ceee2330 ]---
kernel: nvme nvme0: I/O 192 QID 4 timeout, aborting
kernel: nvme nvme0: I/O 193 QID 4 timeout, aborting
kernel: nvme nvme0: I/O 194 QID 4 timeout, aborting
kernel: nvme nvme0: I/O 195 QID 4 timeout, aborting
kernel: nvme nvme0: I/O 196 QID 4 timeout, aborting
kernel: nvme nvme0: I/O 197 QID 4 timeout, aborting
kernel: nvme nvme0: I/O 198 QID 4 timeout, aborting
kernel: nvme nvme0: I/O 199 QID 4 timeout, aborting
kernel: nvme nvme0: I/O 16 QID 0 timeout, reset controller
kernel: nvme nvme0: Abort status: 0x371
kernel: nvme nvme0: Abort status: 0x371
kernel: nvme nvme0: Abort status: 0x371
kernel: nvme nvme0: Abort status: 0x371
kernel: nvme nvme0: Abort status: 0x371
kernel: nvme nvme0: Abort status: 0x371
kernel: nvme nvme0: Abort status: 0x371
kernel: nvme nvme0: Abort status: 0x371
kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -16
kernel: nvme 0000:04:00.0: PM: failed to resume async: error -16

@jackpot51 jackpot51 force-pushed the master branch 2 times, most recently from a887ad7 to e862cf5 Compare August 26, 2021 15:31
superm1 and others added 18 commits September 9, 2021 09:39
Although first implemented for NVME, this check may be usable by
other drivers as well. Microsoft's specification explicitly mentions
that is may be usable by SATA and AHCI devices.  Google also indicates
that they have used this with SDHCI in a downstream kernel tree that
a user can plug a storage device into.

Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
Suggested-by: Keith Busch <[email protected]>
CC: Shyam-sundar S-k <[email protected]>
CC: Alexander Deucher <[email protected]>
CC: Rafael J. Wysocki <[email protected]>
CC: Prike Liang <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
AMD systems from Renoir and Lucienne require that the NVME controller
is put into D3 over a Modern Standby / suspend-to-idle
cycle.  This is "typically" accomplished using the `StorageD3Enable`
property in the _DSD, but this property was introduced after many
of these systems launched and most OEM systems don't have it in
their BIOS.

On AMD Renoir without these drives going into D3 over suspend-to-idle
the resume will fail with the NVME controller being reset and a trace
like this in the kernel logs:
```
[   83.556118] nvme nvme0: I/O 161 QID 2 timeout, aborting
[   83.556178] nvme nvme0: I/O 162 QID 2 timeout, aborting
[   83.556187] nvme nvme0: I/O 163 QID 2 timeout, aborting
[   83.556196] nvme nvme0: I/O 164 QID 2 timeout, aborting
[   95.332114] nvme nvme0: I/O 25 QID 0 timeout, reset controller
[   95.332843] nvme nvme0: Abort status: 0x371
[   95.332852] nvme nvme0: Abort status: 0x371
[   95.332856] nvme nvme0: Abort status: 0x371
[   95.332859] nvme nvme0: Abort status: 0x371
[   95.332909] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -16
[   95.332936] nvme 0000:03:00.0: PM: failed to resume async: error -16
```

The Microsoft documentation for StorageD3Enable mentioned that Windows has
a hardcoded allowlist for D3 support, which was used for these platforms.
Introduce quirks to hardcode them for Linux as well.

As this property is now "standardized", OEM systems using AMD Cezanne and
newer APU's have adopted this property, and quirks like this should not be
necessary.

CC: Shyam-sundar S-k <[email protected]>
CC: Alexander Deucher <[email protected]>
CC: Prike Liang <[email protected]>
Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
Signed-off-by: Mario Limonciello <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Julian Sikorski <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
AMD spec mentions only revision 0. With this change,
device constraint list is populated properly.

Signed-off-by: Pratik Vishwakarma <[email protected]>
Tested-by: Julian Sikorski <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Refactor common code to prepare for upcoming changes.
 * Remove unused struct.
 * Print error before returning.
 * Frees ACPI obj if _DSM type is not as expected.
 * Treat lps0_dsm_func_mask as an integer rather than character
 * Remove extra out_obj
 * Move rev_id

Co-developed-by: Mario Limonciello <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Pratik Vishwakarma <[email protected]>
Tested-by: Julian Sikorski <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Required for follow-up patch adding new UUID needing new function
mask.

Signed-off-by: Pratik Vishwakarma <[email protected]>
Tested-by: Julian Sikorski <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
This adds supports for _DSM notifications to the Microsoft UUID
described by Microsoft documentation for s2idle.

Link: https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-firmware-notifications
Co-developed-by: Mario Limonciello <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Pratik Vishwakarma <[email protected]>
Tested-by: Julian Sikorski <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Some AMD Systems with uPEP _HID AMD004/AMDI005 have an off by one bug
in their function mask return.  This means that they will call entrance
but not exit for matching functions.

Other AMD systems with this HID should use the Microsoft generic UUID.

AMD systems with uPEP HID AMDI006 should be using the Microsoft method.

Signed-off-by: Mario Limonciello <[email protected]>
Tested-by: Julian Sikorski <[email protected]>
Tested-by: Kai-Heng Feng <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
When using s2idle on a variety of AMD notebook systems, they are
experiencing spurious events that the EC or SMU are in the wrong
state leading to a hard time waking up or higher than expected
power consumption.

These events only occur when the EC GPE is inadvertently set as a wakeup
source. Originally the EC GPE was only set as a wakeup source when using
the intel-vbtn or intel-hid drivers in commit 10a08fd ("ACPI: PM:
Set up EC GPE for system wakeup from drivers that need it") but during
testing a reporter discovered that this was not enough for their ASUS
Zenbook UX430UNR/i7-8550U to wakeup by lid event or keypress.
Marking the EC GPE for wakeup universally resolved this for that
reporter in commit b90ff35 ("ACPI: PM: s2idle: Always set up EC GPE
for system wakeup").

However this behavior has lead to a number of problems:

 * On both Lenovo T14 and P14s the keyboard wakeup doesn't work, and
   sometimes the power button event doesn't work.
 * On HP 635 G7 detaching or attaching AC during suspend will cause
   the system not to wakeup
 * On Asus vivobook to prevent detaching AC causing resume problems
 * On Lenovo 14ARE05 to prevent detaching AC causing resume problems
 * On HP ENVY x360  to prevent detaching AC causing resume problems

As there may be other Intel systems besides ASUS Zenbook UX430UNR/i7-8550U
that don't use intel-vbtn or intel-hid, avoid these problems by only
universally marking the EC GPE wakesource on non-AMD systems.

Link: https://patchwork.kernel.org/project/linux-pm/cover/5997740.FPbUVk04hV@kreacher/#22825489
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1230
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1629
Signed-off-by: Mario Limonciello <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
The protocol to submit a job request to SMU is to wait for
AMD_PMC_REGISTER_RESPONSE to return 1,meaning SMU is ready to take
requests. PMC driver has to make sure that the response code is always
AMD_PMC_RESULT_OK before making any command submissions.

When we submit a message to SMU, we have to wait until it processes
the request. Adding a read_poll_timeout() check as this was missing in
the existing code.

Also, add a mutex to protect amd_pmc_send_cmd() calls to SMU.

Fixes: 156ec47 ("platform/x86: amd-pmc: Add AMD platform support for S2Idle")
Signed-off-by: Shyam Sundar S K <[email protected]>
Acked-by: Raul E Rangel <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
It was lately understood that the current mechanism available in the
driver to get SMU firmware info works only on internal SMU builds and
there is a separate way to get all the SMU logging counters (addressed
in the next patch). Hence remove all the smu info shown via debugfs as it
is no more useful.

Fixes: 156ec47 ("platform/x86: amd-pmc: Add AMD platform support for S2Idle")
Signed-off-by: Shyam Sundar S K <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
Currently amd_pmc_dump_registers() routine is being called at
multiple places. The best to call it is after command submission
to SMU.

Signed-off-by: Shyam Sundar S K <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
SMU provides a way to dump the s0ix debug statistics in the form of a
metrics table via a of set special mailbox commands.

Add support to the driver which can send these commands to SMU and expose
the information received via debugfs. The information contains the s0ix
entry/exit, active time of each IP block etc.

As a side note, SMU subsystem logging is not supported on Picasso based
SoC's.

Signed-off-by: Shyam Sundar S K <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
Even the FCH SSC registers provides certain level of information
about the s0ix entry and exit times which comes handy when the SMU
fails to report the statistics via the mailbox communication.

This information is captured via a new debugfs file "s0ix_stats".
A non-zero entry in this counters would mean that the system entered
the s0ix state.

If s0ix entry time and exit time don't change during suspend to idle,
the silicon has not entered the deepest state.

Signed-off-by: Shyam Sundar S K <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
Some newer BIOSes have added another ACPI ID for the uPEP device.
SMU statistics behave identically on this device.

Signed-off-by: Shyam Sundar S K <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
The upcoming PMC controller would have a newer acpi id, add that to
the supported acpid device list.

Signed-off-by: Shyam Sundar S K <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
Right now the driver will still return success even if the OS_HINT
command failed to send to the SMU. In the rare event of a failure,
the suspend should really be aborted here so that relevant logs
can may be captured.

Signed-off-by: Mario Limonciello <[email protected]>
Acked-by: Shyam Sundar S K <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
It was reported by a user with a Dell m15 R5 (5800H) that
the keyboard backlight was turning on when entering suspend
and turning off when exiting (the opposite of how it should be).

The user bisected it back to commit 5dbf509 ("ACPI: PM:
s2idle: Add support for new Microsoft UUID").  Previous to that
commit the LEDs didn't turn off at all.  Confirming in the spec,
these were reversed when introduced.

Fix them to match the spec.

BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1230#note_1021836
Fixes: 5dbf509 ("ACPI: PM: s2idle: Add support for new Microsoft UUID")
Signed-off-by: Mario Limonciello <[email protected]>
…orted

It was reported that on "HP ENVY x360" that power LED does not come
back, certain keys like brightness controls do not work, and the fan
never spins up, even under load on 5.14 final.

In analysis of the SSDT it's clear that the Microsoft UUID doesn't
provide functional support, but rather the AMD UUID should be
supporting this system.

Because this is a gap in the expected logic, we checked back with
internal team.  The conclusion was that on Windows AMD uPEP *does*
run even when Microsoft UUID present, but most OEM systems have
adopted value of "0x3" for supported functions and hence nothing
runs.

Henceforth add support for running both Microsoft and AMD methods.
This approach will also allow the same logic on Intel systems if
desired at a future time as well by pulling the evaluation of
`lps0_dsm_func_mask_microsoft` out of the `if` block for
`acpi_s2idle_vendor_amd`.

Link: https://gitlab.freedesktop.org/drm/amd/uploads/9fbcd7ec3a385cc6949c9bacf45dc41b/acpi-f.20.bin
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1691
Reported-by: Maxwell Beck <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
[ rjw: Edits of the new comments ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
@jackpot51 jackpot51 closed this Sep 14, 2021
@jackpot51 jackpot51 deleted the amdgpu-s0ix branch September 14, 2021 18:13
jackpot51 pushed a commit that referenced this pull request Feb 7, 2022
[ Upstream commit 04d8066 ]

Currently, with an unknown recv_type, mwifiex_usb_recv
just return -1 without restoring the skb. Next time
mwifiex_usb_rx_complete is invoked with the same skb,
calling skb_put causes skb_over_panic.

The bug is triggerable with a compromised/malfunctioning
usb device. After applying the patch, skb_over_panic
no longer shows up with the same input.

Attached is the panic report from fuzzing.
skbuff: skb_over_panic: text:000000003bf1b5fa
 len:2048 put:4 head:00000000dd6a115b data:000000000a9445d8
 tail:0x844 end:0x840 dev:<NULL>
kernel BUG at net/core/skbuff.c:109!
invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 PID: 198 Comm: in:imklog Not tainted 5.6.0 #60
RIP: 0010:skb_panic+0x15f/0x161
Call Trace:
 <IRQ>
 ? mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb]
 skb_put.cold+0x24/0x24
 mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb]
 __usb_hcd_giveback_urb+0x1e4/0x380
 usb_giveback_urb_bh+0x241/0x4f0
 ? __hrtimer_run_queues+0x316/0x740
 ? __usb_hcd_giveback_urb+0x380/0x380
 tasklet_action_common.isra.0+0x135/0x330
 __do_softirq+0x18c/0x634
 irq_exit+0x114/0x140
 smp_apic_timer_interrupt+0xde/0x380
 apic_timer_interrupt+0xf/0x20
 </IRQ>

Reported-by: Brendan Dolan-Gavitt <[email protected]>
Signed-off-by: Zekun Shen <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
jackpot51 pushed a commit that referenced this pull request Feb 7, 2022
commit 8b59b0a upstream.

arm32 uses software to simulate the instruction replaced
by kprobe. some instructions may be simulated by constructing
assembly functions. therefore, before executing instruction
simulation, it is necessary to construct assembly function
execution environment in C language through binding registers.
after kasan is enabled, the register binding relationship will
be destroyed, resulting in instruction simulation errors and
causing kernel panic.

the kprobe emulate instruction function is distributed in three
files: actions-common.c actions-arm.c actions-thumb.c, so disable
KASAN when compiling these files.

for example, use kprobe insert on cap_capable+20 after kasan
enabled, the cap_capable assembly code is as follows:
<cap_capable>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e1a05000	mov	r5, r0
e280006c	add	r0, r0, #108    ; 0x6c
e1a04001	mov	r4, r1
e1a06002	mov	r6, r2
e59fa090	ldr	sl, [pc, #144]  ;
ebfc7bf8	bl	c03aa4b4 <__asan_load4>
e595706c	ldr	r7, [r5, #108]  ; 0x6c
e2859014	add	r9, r5, #20
......
The emulate_ldr assembly code after enabling kasan is as follows:
c06f1384 <emulate_ldr>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e282803c	add	r8, r2, #60     ; 0x3c
e1a05000	mov	r5, r0
e7e37855	ubfx	r7, r5, #16, #4
e1a00008	mov	r0, r8
e1a09001	mov	r9, r1
e1a04002	mov	r4, r2
ebf35462	bl	c03c6530 <__asan_load4>
e357000f	cmp	r7, #15
e7e36655	ubfx	r6, r5, #12, #4
e205a00f	and	sl, r5, #15
0a000001	beq	c06f13bc <emulate_ldr+0x38>
e0840107	add	r0, r4, r7, lsl #2
ebf3545c	bl	c03c6530 <__asan_load4>
e084010a	add	r0, r4, sl, lsl #2
ebf3545a	bl	c03c6530 <__asan_load4>
e2890010	add	r0, r9, #16
ebf35458	bl	c03c6530 <__asan_load4>
e5990010	ldr	r0, [r9, #16]
e12fff30	blx	r0
e356000f	cm	r6, #15
1a000014	bne	c06f1430 <emulate_ldr+0xac>
e1a06000	mov	r6, r0
e2840040	add	r0, r4, #64     ; 0x40
......

when running in emulate_ldr to simulate the ldr instruction, panic
occurred, and the log is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
00000090
pgd = ecb46400
[00000090] *pgd=2e0fa003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP ARM
PC is at cap_capable+0x14/0xb0
LR is at emulate_ldr+0x50/0xc0
psr: 600d0293 sp : ecd63af8  ip : 00000004  fp : c0a7c30c
r10: 00000000  r9 : c30897f4  r8 : ecd63cd4
r7 : 0000000f  r6 : 0000000a  r5 : e59fa090  r4 : ecd63c98
r3 : c06ae294  r2 : 00000000  r1 : b7611300  r0 : bf4ec008
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 32c5387d  Table: 2d546400  DAC: 55555555
Process bash (pid: 1643, stack limit = 0xecd60190)
(cap_capable) from (kprobe_handler+0x218/0x340)
(kprobe_handler) from (kprobe_trap_handler+0x24/0x48)
(kprobe_trap_handler) from (do_undefinstr+0x13c/0x364)
(do_undefinstr) from (__und_svc_finish+0x0/0x30)
(__und_svc_finish) from (cap_capable+0x18/0xb0)
(cap_capable) from (cap_vm_enough_memory+0x38/0x48)
(cap_vm_enough_memory) from
(security_vm_enough_memory_mm+0x48/0x6c)
(security_vm_enough_memory_mm) from
(copy_process.constprop.5+0x16b4/0x25c8)
(copy_process.constprop.5) from (_do_fork+0xe8/0x55c)
(_do_fork) from (SyS_clone+0x1c/0x24)
(SyS_clone) from (__sys_trace_return+0x0/0x10)
Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7)

Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support")
Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM")
Signed-off-by: huangshaobo <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jackpot51 pushed a commit that referenced this pull request Feb 28, 2022
[ Upstream commit 04d8066 ]

Currently, with an unknown recv_type, mwifiex_usb_recv
just return -1 without restoring the skb. Next time
mwifiex_usb_rx_complete is invoked with the same skb,
calling skb_put causes skb_over_panic.

The bug is triggerable with a compromised/malfunctioning
usb device. After applying the patch, skb_over_panic
no longer shows up with the same input.

Attached is the panic report from fuzzing.
skbuff: skb_over_panic: text:000000003bf1b5fa
 len:2048 put:4 head:00000000dd6a115b data:000000000a9445d8
 tail:0x844 end:0x840 dev:<NULL>
kernel BUG at net/core/skbuff.c:109!
invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 PID: 198 Comm: in:imklog Not tainted 5.6.0 #60
RIP: 0010:skb_panic+0x15f/0x161
Call Trace:
 <IRQ>
 ? mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb]
 skb_put.cold+0x24/0x24
 mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb]
 __usb_hcd_giveback_urb+0x1e4/0x380
 usb_giveback_urb_bh+0x241/0x4f0
 ? __hrtimer_run_queues+0x316/0x740
 ? __usb_hcd_giveback_urb+0x380/0x380
 tasklet_action_common.isra.0+0x135/0x330
 __do_softirq+0x18c/0x634
 irq_exit+0x114/0x140
 smp_apic_timer_interrupt+0xde/0x380
 apic_timer_interrupt+0xf/0x20
 </IRQ>

Reported-by: Brendan Dolan-Gavitt <[email protected]>
Signed-off-by: Zekun Shen <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
jackpot51 pushed a commit that referenced this pull request Feb 28, 2022
commit 8b59b0a upstream.

arm32 uses software to simulate the instruction replaced
by kprobe. some instructions may be simulated by constructing
assembly functions. therefore, before executing instruction
simulation, it is necessary to construct assembly function
execution environment in C language through binding registers.
after kasan is enabled, the register binding relationship will
be destroyed, resulting in instruction simulation errors and
causing kernel panic.

the kprobe emulate instruction function is distributed in three
files: actions-common.c actions-arm.c actions-thumb.c, so disable
KASAN when compiling these files.

for example, use kprobe insert on cap_capable+20 after kasan
enabled, the cap_capable assembly code is as follows:
<cap_capable>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e1a05000	mov	r5, r0
e280006c	add	r0, r0, #108    ; 0x6c
e1a04001	mov	r4, r1
e1a06002	mov	r6, r2
e59fa090	ldr	sl, [pc, #144]  ;
ebfc7bf8	bl	c03aa4b4 <__asan_load4>
e595706c	ldr	r7, [r5, #108]  ; 0x6c
e2859014	add	r9, r5, #20
......
The emulate_ldr assembly code after enabling kasan is as follows:
c06f1384 <emulate_ldr>:
e92d47f0	push	{r4, r5, r6, r7, r8, r9, sl, lr}
e282803c	add	r8, r2, #60     ; 0x3c
e1a05000	mov	r5, r0
e7e37855	ubfx	r7, r5, #16, #4
e1a00008	mov	r0, r8
e1a09001	mov	r9, r1
e1a04002	mov	r4, r2
ebf35462	bl	c03c6530 <__asan_load4>
e357000f	cmp	r7, #15
e7e36655	ubfx	r6, r5, #12, #4
e205a00f	and	sl, r5, #15
0a000001	beq	c06f13bc <emulate_ldr+0x38>
e0840107	add	r0, r4, r7, lsl #2
ebf3545c	bl	c03c6530 <__asan_load4>
e084010a	add	r0, r4, sl, lsl #2
ebf3545a	bl	c03c6530 <__asan_load4>
e2890010	add	r0, r9, #16
ebf35458	bl	c03c6530 <__asan_load4>
e5990010	ldr	r0, [r9, #16]
e12fff30	blx	r0
e356000f	cm	r6, #15
1a000014	bne	c06f1430 <emulate_ldr+0xac>
e1a06000	mov	r6, r0
e2840040	add	r0, r4, #64     ; 0x40
......

when running in emulate_ldr to simulate the ldr instruction, panic
occurred, and the log is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
00000090
pgd = ecb46400
[00000090] *pgd=2e0fa003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP ARM
PC is at cap_capable+0x14/0xb0
LR is at emulate_ldr+0x50/0xc0
psr: 600d0293 sp : ecd63af8  ip : 00000004  fp : c0a7c30c
r10: 00000000  r9 : c30897f4  r8 : ecd63cd4
r7 : 0000000f  r6 : 0000000a  r5 : e59fa090  r4 : ecd63c98
r3 : c06ae294  r2 : 00000000  r1 : b7611300  r0 : bf4ec008
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 32c5387d  Table: 2d546400  DAC: 55555555
Process bash (pid: 1643, stack limit = 0xecd60190)
(cap_capable) from (kprobe_handler+0x218/0x340)
(kprobe_handler) from (kprobe_trap_handler+0x24/0x48)
(kprobe_trap_handler) from (do_undefinstr+0x13c/0x364)
(do_undefinstr) from (__und_svc_finish+0x0/0x30)
(__und_svc_finish) from (cap_capable+0x18/0xb0)
(cap_capable) from (cap_vm_enough_memory+0x38/0x48)
(cap_vm_enough_memory) from
(security_vm_enough_memory_mm+0x48/0x6c)
(security_vm_enough_memory_mm) from
(copy_process.constprop.5+0x16b4/0x25c8)
(copy_process.constprop.5) from (_do_fork+0xe8/0x55c)
(_do_fork) from (SyS_clone+0x1c/0x24)
(SyS_clone) from (__sys_trace_return+0x0/0x10)
Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7)

Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support")
Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM")
Signed-off-by: huangshaobo <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants