This repository was archived by the owner on Feb 24, 2020. It is now read-only.

Conversation

@squall0gd
Contributor

This patch fixes a deadlock in lkvm. When the VM sends a reboot signal to
lkvm, lkvm sends SIGKVMEXIT to all CPU threads. When the first CPU thread
exits, the PCI devices check which CPUs are still alive and send them the
SIGKVMPAUSE signal (to detach, PCI devices require the threads to be in a
paused or exited state). After sending this signal, the devices wait for a
response from the threads.

The main problem is the case where a CPU thread exits and SIGKVMPAUSE was
sent to it. In that case the PCI device keeps waiting for a notification
from a thread that has already exited.

To fix this bug, I wait for the exit status of all CPUs before shutting
down the devices.
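
The ordering described above can be sketched in isolation. This is a minimal illustration, not the actual lkvm code: join_all_vcpus, vcpu_thread and MAX_VCPUS are hypothetical names, and the thread body is a stub that exits immediately. The point is only that every thread is joined before the caller would start shutting down devices.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_VCPUS 16 /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for a vCPU thread: it simply exits right away. */
static void *vcpu_thread(void *arg)
{
  (void)arg;
  return NULL;
}

/*
 * Start nr threads and join every one of them. Only after this function
 * returns would device teardown begin, so no device can end up waiting
 * on a thread that has already exited. Returns the number of joined
 * threads, or -1 if a pthread call fails.
 */
static int join_all_vcpus(int nr)
{
  pthread_t threads[MAX_VCPUS];
  int i, joined = 0;

  assert(nr > 0 && nr <= MAX_VCPUS);

  for (i = 0; i < nr; i++)
    if (pthread_create(&threads[i], NULL, vcpu_thread, NULL) != 0)
      return -1;

  for (i = 0; i < nr; i++) {
    if (pthread_join(threads[i], NULL) != 0)
      return -1;
    joined++;
  }

  return joined;
}
```

A caller would invoke join_all_vcpus(n) first and run its (hypothetical) device shutdown step only once it has returned n.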

/* Only VCPU #0 is going to exit by itself when shutting down */
- return pthread_join(kvm->cpus[0]->thread, &ret);
+ /* */
+ /* BUT It's not actually true for reboot sequence */
Contributor

Please add a note on where the reboot sequence originates (the keyboard device), for future reference.

@squall0gd squall0gd force-pushed the mstachowski/rkt_lkvm_dead_lock_fix branch 2 times, most recently from b56f384 to 7a6200c Compare February 18, 2016 21:07
+ /* cpu threads */
+ for (i = 0; i < kvm->nrcpus; i++) {
+ int retval;
+ int *ret = 0;
Contributor

Better to set it to NULL. I'm not sure it's a good idea to initialize a pointer with 0; IMHO that could cause segfaults too (a write to memory at address zero).

@squall0gd squall0gd force-pushed the mstachowski/rkt_lkvm_dead_lock_fix branch from 7a6200c to ec26776 Compare February 19, 2016 20:45
+ die("unable to end KVM VCPU thread");
+
+ /* Set exit status if one of vcpus returns error code > 0 */
+ if (*ret != 0)
Contributor

Does checking that the ret pointer is not NULL make sense here?

Contributor Author

Yes, this checks the thread's exit code.

@squall0gd squall0gd force-pushed the mstachowski/rkt_lkvm_dead_lock_fix branch from ec26776 to ec916c2 Compare February 19, 2016 21:28
squall0gd added a commit to intelsdi-x/rkt that referenced this pull request Feb 20, 2016
Cherry-pick for testing purposes. It should be merged in
pull request rkt#2191

This patch fixes a deadlock in lkvm. When the kernel sends the keyboard
reboot signal to lkvm, lkvm sends SIGKVMEXIT to all CPU threads.
When the first CPU thread exits, the devices check which CPUs are still
alive and send them the SIGKVMPAUSE signal (to detach, devices require the
threads to be in a paused or exited state). After sending this signal, the
devices wait for a response from the threads.

The main problem is the case where a CPU thread exits and SIGKVMPAUSE was
sent to it. In that case the PCI device keeps waiting for a notification
from a thread that has already exited.

To fix this bug, I wait for the exit status of all CPUs before shutting
down the devices.
@squall0gd squall0gd force-pushed the mstachowski/rkt_lkvm_dead_lock_fix branch from ec916c2 to 5e38406 Compare February 23, 2016 09:45
squall0gd added a commit to intelsdi-x/rkt that referenced this pull request Feb 23, 2016
@squall0gd
Contributor Author

Review request

+ die("unable to end KVM VCPU thread");
+
+ /* Set exit status if one of vcpus returns error code > 0 */
+ if ((intptr_t) c_thrstatus != 0)
Collaborator

pthread has confusing docs about the thread return value, so I wrote a small C example to remind myself how it works exactly:

#include <pthread.h>
#include <stdio.h>
#include <stdint.h>

void*
run(void* arg)
{
  return (void*)(intptr_t)1;
}

int
main(void)
{
  pthread_t tid;
  int retval = 0;
  void *ptr = &retval;
  pthread_create(&tid, NULL, run, NULL);
  pthread_join(tid, &ptr);
  printf("ptr: %p\nretval: %d\n", ptr, retval);
}

The result was:

ptr: 0x1
retval: 0

So, I think that:

  1. It should be (intptr_t) thrstatus in both places (in the if and in its body).
  2. The c_thrstatus variable is unnecessary and can be removed.
  3. The c_thrstatus variable is used to initialize the value of the thrstatus variable. I guess we can skip the initialization altogether (or initialize it to something else like &retval or NULL, but it's not important - the value will be overwritten by pthread_join anyway).
  4. Another problem I see here is that this patch changes the semantics of the kvm_cmd_run_work function's return value: earlier it returned an errno value (EDEADLK, EINVAL, and so on); now it returns the thread function's return value, which is not a typical errno value, just 0 or 1.
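
Points 1 to 3 could look roughly like the sketch below. It is only an illustration of the pattern, not the patch itself: worker and run_and_decode are hypothetical names, and the thread simply echoes back the code it is given. The status pointer is left uninitialized because pthread_join overwrites it on success, and the value is decoded with an (intptr_t) cast instead of being dereferenced.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical thread body: returns the code it was given as its status. */
static void *worker(void *arg)
{
  return arg;
}

/*
 * Run one thread that exits with `code` and return the decoded status.
 * Returns -1 if a pthread call fails.
 */
static int run_and_decode(int code)
{
  pthread_t tid;
  void *thrstatus; /* no initializer: pthread_join fills it in on success */

  assert(code >= 0);

  if (pthread_create(&tid, NULL, worker, (void *)(intptr_t)code) != 0)
    return -1;
  if (pthread_join(tid, &thrstatus) != 0)
    return -1;

  /* The status is a value stored in a pointer, so cast it; never dereference. */
  return (int)(intptr_t)thrstatus;
}
```

No intermediate variable is needed: the same (intptr_t) thrstatus expression can be used both in a condition and when storing the result.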

Contributor

Do we care about the exit status from all joins (in which case the status is overwritten in the loop), or only the first one that is not 0 (in which case breaking out of the loop is handy)?

Contributor Author

@krnowak Thanks for your review!
1, 2 & 3 - sure, I will check that and publish new changes soon
4 - I know about that, and it is intentional. If one of the pthread_join calls fails, I can't do anything with the VM (the VM is probably dead or its threads cannot be joined), so I want to close the whole lkvm process immediately with die(). Because of that, we can expose the worst exit status from the CPU threads.
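
One possible reading of "worst exit status" is sketched below, assuming "worst" means the largest exit code seen across all joins. The names worst_exit_status, worker and MAX_THREADS are hypothetical, and a plain fprintf plus exit(1) stands in for lkvm's die(). Every thread is joined (the loop does not break early), and the result keeps the highest code rather than whatever the last join returned.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_THREADS 16 /* hypothetical upper bound for this sketch */

/* Hypothetical thread body: returns the code it was given as its status. */
static void *worker(void *arg)
{
  return arg;
}

/*
 * Start one thread per entry in codes[], join all of them, and return
 * the largest exit code. A failing pthread call ends the process,
 * standing in for die() in this sketch.
 */
static int worst_exit_status(const int *codes, int nr)
{
  pthread_t threads[MAX_THREADS];
  int i, worst = 0;

  assert(nr > 0 && nr <= MAX_THREADS);

  for (i = 0; i < nr; i++) {
    if (pthread_create(&threads[i], NULL, worker,
                       (void *)(intptr_t)codes[i]) != 0) {
      fprintf(stderr, "unable to start thread\n");
      exit(1);
    }
  }

  for (i = 0; i < nr; i++) {
    void *thrstatus;

    if (pthread_join(threads[i], &thrstatus) != 0) {
      fprintf(stderr, "unable to join thread\n");
      exit(1);
    }
    /* Keep the highest code instead of overwriting it on every pass. */
    if ((int)(intptr_t)thrstatus > worst)
      worst = (int)(intptr_t)thrstatus;
  }

  return worst;
}
```

Joining every thread (instead of breaking on the first non-zero status) matches the goal of the patch, since all CPU threads should have exited before devices are shut down.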

@squall0gd squall0gd force-pushed the mstachowski/rkt_lkvm_dead_lock_fix branch 3 times, most recently from 4aeb56e to 4b0b47b Compare February 24, 2016 18:36
squall0gd added a commit to intelsdi-x/rkt that referenced this pull request Feb 24, 2016
squall0gd added a commit to intelsdi-x/rkt that referenced this pull request Feb 25, 2016
squall0gd added a commit to intelsdi-x/rkt that referenced this pull request Feb 25, 2016
@squall0gd
Contributor Author

Review request

@squall0gd squall0gd force-pushed the mstachowski/rkt_lkvm_dead_lock_fix branch from 4b0b47b to 718ff89 Compare February 25, 2016 13:17
squall0gd added a commit to intelsdi-x/rkt that referenced this pull request Feb 26, 2016
@squall0gd
Contributor Author

Review request

@krnowak
Collaborator

krnowak commented Mar 1, 2016

LFAD.

krnowak added a commit that referenced this pull request Mar 1, 2016
…ck_fix

kvm: Fix for dead lock while rebooting VM
@krnowak krnowak merged commit f2ab94e into rkt:master Mar 1, 2016
@squall0gd squall0gd deleted the mstachowski/rkt_lkvm_dead_lock_fix branch April 11, 2016 08:38