kvm: Fix for dead lock while rebooting VM #2191
d628768 to
8f82941
Compare
      /* Only VCPU #0 is going to exit by itself when shutting down */
    - return pthread_join(kvm->cpus[0]->thread, &ret);
    + /* */
    + /* BUT It's not actually true for reboot sequence */
Please add a note saying where the reboot sequence originates (the keyboard device), for future reference.
b56f384 to
7a6200c
Compare
    + /* cpu threads */
    + for (i = 0; i < kvm->nrcpus; i++) {
    + int retval;
    + int *ret = 0;
Better to initialize it with NULL. I'm not sure it's a good idea to initialize a pointer with 0; IMHO that could lead to segfaults (a write to memory at address zero).
7a6200c to
ec26776
Compare
    + die("unable to end KVM VCPU thread");
    +
    + /* Set exit status if one of vcpus returns error code > 0 */
    + if (*ret != 0)
Would it make sense to check that the ret pointer is not NULL here?
Yes, this is checking the thread's exit code.
ec26776 to
ec916c2
Compare
Cherry-pick for testing purposes. It should be merged in pull request rkt#2191. This patch fixes a deadlock in lkvm. When the kernel sends the keyboard reboot signal to lkvm, lkvm sends SIGKVMEXIT to all cpu threads. When the first cpu thread exits, the devices check which cpus are still alive and send them SIGKVMPAUSE (to detach, devices require the threads to be in a paused or exited state). After sending this signal, the devices wait for a response from the threads. The main problem is the situation where a cpu thread has exited and SIGKVMPAUSE has been sent to it. In this situation the PCI device keeps waiting for a notification from the exited thread. To fix this bug, I wait for the exit status of all cpus before shutting down the devices.
ec916c2 to
5e38406
Compare
Review request
    + die("unable to end KVM VCPU thread");
    +
    + /* Set exit status if one of vcpus returns error code > 0 */
    + if ((intptr_t) c_thrstatus != 0)
The pthread docs are confusing about the thread return value, so I wrote a small C example to remind myself how exactly it works:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdint.h>

    void*
    run(void* arg)
    {
      return (void*)(intptr_t)1;
    }

    int
    main(void)
    {
      pthread_t tid;
      int retval = 0;
      void *ptr = &retval;

      pthread_create(&tid, NULL, run, NULL);
      pthread_join(tid, &ptr);
      printf("ptr: %p\nretval: %d\n", ptr, retval);
    }

The result was:

    ptr: 0x1
    retval: 0

So, I think that:

1. It should be `(intptr_t) thrstatus` in both places (in the `if` and in its body).
2. The `c_thrstatus` variable is unnecessary and can be removed.
3. The `c_thrstatus` variable is used to initialize the value of the `thrstatus` variable. I guess we can skip the initialization altogether (or initialize it to something else like `&retval` or `NULL`, but it's not important - the value will be overwritten by `pthread_join` anyway).
4. Another problem I see here is that this patch changes the semantics of the `kvm_cmd_run_work` function's return value - earlier it was returning an `errno` value (`EDEADLK`, `EINVAL`, and so on), now it returns the return value of the thread's function, which is not a typical `errno` value, just `0` or `1`.
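
To make the first three points concrete, here is a minimal sketch of reading a thread's exit status through `pthread_join`. It is not the code from this patch: `vcpu_thread`, `join_status` and `run_and_join` are hypothetical names used only for illustration, and the sketch assumes the thread returns its status encoded in the pointer value, as in the example above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a VCPU thread: returns its argument as exit status. */
static void *vcpu_thread(void *arg)
{
	return arg; /* the status travels in the pointer value itself */
}

/*
 * Joins one thread. Returns pthread_join's error code (0 on success) and,
 * on success, stores the thread's exit status in *status.
 */
static int join_status(pthread_t tid, intptr_t *status)
{
	void *thrstatus = NULL; /* overwritten by pthread_join on success */
	int err;

	assert(status != NULL);
	err = pthread_join(tid, &thrstatus);
	if (err == 0)
		*status = (intptr_t)thrstatus; /* cast the pointer, never dereference it */
	return err;
}

/* Starts a thread that exits with `wanted` and returns the status read back, or -1 on error. */
static intptr_t run_and_join(intptr_t wanted)
{
	pthread_t tid;
	intptr_t status = -1;

	if (pthread_create(&tid, NULL, vcpu_thread, (void *)wanted) != 0)
		return -1;
	if (join_status(tid, &status) != 0)
		return -1;
	return status;
}
```

Keeping the `pthread_join` error code and the thread's exit status in separate variables also avoids mixing `errno`-style values with thread statuses, which is the concern raised in point 4.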
Do we care about the exit status from all joins (in which case the status is overwritten in the loop), or only about the first one that is not 0 (in which case breaking out of the loop is handy)?
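
The options in this question can be compared side by side. The sketch below is only an illustration on an array of already collected statuses; the function names are hypothetical and none of them appear in the patch. Note that whichever policy is chosen for reporting, every thread still has to be joined for the deadlock fix to hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overwriting in the loop: the status of the last joined thread wins. */
static intptr_t last_status(const intptr_t *st, size_t n)
{
	intptr_t status = 0;

	for (size_t i = 0; i < n; i++)
		status = st[i];
	return status;
}

/* First non-zero status, in join order. */
static intptr_t first_nonzero(const intptr_t *st, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (st[i] != 0)
			return st[i];
	return 0;
}

/* Largest ("worst") status seen, assuming larger means more severe. */
static intptr_t worst_status(const intptr_t *st, size_t n)
{
	intptr_t worst = 0;

	for (size_t i = 0; i < n; i++)
		if (st[i] > worst)
			worst = st[i];
	return worst;
}
```

With statuses `{0, 1, 2, 0}` the three policies give different answers, which is why the choice is worth stating explicitly in the code comment.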
@krnowak Thanks for your review!
1, 2 & 3 - sure, I will check that and publish new changes soon.
4 - I know about that, and it is intentional. If one of the pthread_join calls fails, I can't do anything with the VM (the VM is probably dead or its threads cannot be joined), so I want to close the whole lkvm process immediately with die(). Because of that, we can expose the worst exit status from the cpu threads.
4aeb56e to
4b0b47b
Compare
Review request
4b0b47b to
718ff89
Compare
Review request
LFAD.
…ck_fix kvm: Fix for dead lock while rebooting VM
This patch fixes a deadlock in lkvm. When the VM sends a reboot signal to
lkvm, lkvm sends SIGKVMEXIT to all cpu threads. When the first cpu thread
exits, the pci-devices check which cpus are still alive and send them
SIGKVMPAUSE (to detach, pci-devices require the threads to be in a paused
or exited state). After sending this signal, the devices wait for a
response from the threads.
The main problem is the situation where a cpu thread has exited and
SIGKVMPAUSE has been sent to it. In this situation the pci-device keeps
waiting for a notification from the exited thread.
To fix this bug, I wait for the exit status of all cpus before shutting
down the devices.
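
The ordering described in the commit message can be sketched as follows. This is a simplified model, not lkvm code: it does not model SIGKVMEXIT, SIGKVMPAUSE or any device, and `NR_VCPUS`, `vcpus_running`, `vcpu_thread` and `shutdown_vm` are hypothetical names. It only shows that once every vcpu thread has been joined, no thread is left for a device to wait on when teardown starts.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_VCPUS 4 /* hypothetical number of vcpu threads */

static atomic_int vcpus_running;

/* Hypothetical vcpu thread: marks itself as exited and returns status 0. */
static void *vcpu_thread(void *arg)
{
	(void)arg;
	atomic_fetch_sub(&vcpus_running, 1);
	return (void *)(intptr_t)0;
}

/*
 * Starts the vcpu threads, joins ALL of them, and only then reaches the
 * point where device teardown would begin. Returns the number of vcpus
 * still running at that point (expected: 0), or -1 on a thread error.
 */
static int shutdown_vm(void)
{
	pthread_t tids[NR_VCPUS];
	int created = 0;
	int failed = 0;

	atomic_store(&vcpus_running, 0);
	for (int i = 0; i < NR_VCPUS; i++) {
		atomic_fetch_add(&vcpus_running, 1);
		if (pthread_create(&tids[i], NULL, vcpu_thread, NULL) != 0) {
			atomic_fetch_sub(&vcpus_running, 1);
			failed = 1;
			break;
		}
		created++;
	}

	/* Join every thread that was started, not just VCPU #0. */
	for (int i = 0; i < created; i++)
		if (pthread_join(tids[i], NULL) != 0)
			failed = 1;

	if (failed)
		return -1;

	/* Device teardown would start here; no vcpu thread can still be alive. */
	return atomic_load(&vcpus_running);
}
```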