fix std::system_error in TimeoutNode #549

Merged

Conversation

alsora
Contributor

@alsora alsora commented Apr 20, 2023

This PR fixes a strange crash I have with the TimeoutNode.
I'm using v4.0.1

I haven't been able to create a minimal reproducible example yet, but it happens 100% of the time in my application.

The scenario is the following:

  • There's a BT::Tree with a TimeoutNode and an asynchronous action inside it
  • I start ticking the behavior tree (e.g. in a while loop with some delay between ticks)
  • I "cancel" the execution of the behavior tree, while it was still returning RUNNING (e.g. I exit from the loop), and then I call halt and destroy the tree
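while (!cancelled && status == BT::NodeStatus::RUNNING) {
   status = tree->tickOnce();
   // Some delay between ticks; the exact duration is arbitrary here.
   std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
tree->haltTree();
tree.reset();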

P.S. Note that I'm not using tree->sleep(), but this shouldn't affect the problem.

This results in the following error

terminate called after throwing an instance of 'std::system_error'
  what():  Invalid argument
Aborted (core dumped)
--Type <RET> for more, q to quit, c to continue without paging--

Thread 73 "test" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff49ffb700 (LWP 580947)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) 
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fffd828a859 in __GI_abort () at abort.c:79
#2  0x00007fffd8664911 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fffd867038c in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fffd86703f7 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fffd86706a9 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fffd866773f in std::__throw_system_error(int) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x000055555668d7c2 in std::mutex::lock (this=<optimised out>) at /usr/include/c++/8/bits/std_mutex.h:107
#8  std::unique_lock<std::mutex>::lock (this=0x7fff49fede80, this=0x7fff49fede80) at /usr/include/c++/8/bits/std_mutex.h:267
#9  std::unique_lock<std::mutex>::unique_lock (__m=..., this=0x7fff49fede80) at /usr/include/c++/8/bits/std_mutex.h:197
#10 BT::TimeoutNode<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::tick()::{lambda(bool)#1}::operator()(bool) const (aborted=true, 
    this=<optimised out>) at my-test/timeout-node.h:85
#11 std::_Function_handler<void (bool), BT::TimeoutNode<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::tick()::{lambda(bool)#1}>::_M_invoke(std::_Any_data const&, bool&&) (__functor=..., __args#0=<optimised out>) at /usr/include/c++/8/bits/std_function.h:297
#12 0x0000555556699439 in std::function<void (bool)>::operator()(bool) const (__args#0=<optimised out>, this=0x7fff49fedf10) at /usr/include/c++/8/bits/std_function.h:682
#13 BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::checkWork (this=0x7fff780064d0)
    at my-test/timer_queue.h:233
#14 BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::run (this=0x7fff780064d0)
    at my-test/timer_queue.h:193
#15 BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::TimerQueue()::{lambda()#1}::operator()() const (
    this=<optimised out>) at my-test/timer_queue.h:72
#16 std::__invoke_impl<void, BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::TimerQueue()::{lambda()#1}>(std::__invoke_other, BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::TimerQueue()::{lambda()#1}&&) (__f=...)
    at /usr/include/c++/8/bits/invoke.h:60
#17 std::__invoke<BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::TimerQueue()::{lambda()#1}>(std::__invoke_result&&, (BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::TimerQueue()::{lambda()#1}&&)...) (__fn=...)
    at /usr/include/c++/8/bits/invoke.h:95
#18 std::thread::_Invoker<std::tuple<BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::TimerQueue()::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=<optimised out>) at /usr/include/c++/8/thread:244
#19 std::thread::_Invoker<std::tuple<BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::TimerQueue()::{lambda()#1}> >::operator()() (this=<optimised out>) at /usr/include/c++/8/thread:253
#20 std::thread::_State_impl<std::thread::_Invoker<std::tuple<BT::TimerQueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::TimerQueue()::{lambda()#1}> > >::_M_run() (this=<optimised out>) at /usr/include/c++/8/thread:196
#21 0x00007fffd869cde4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#22 0x00007fffd87ba609 in start_thread (arg=<optimised out>) at pthread_create.c:477
#23 0x00007fffd8387133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The "thread" mentioned in the backtrace is the one created here: https://github.com/BehaviorTree/BehaviorTree.CPP/blob/master/include/behaviortree_cpp/decorators/timer_queue.h#L71

The exception happens while trying to lock the mutex inside that lambda function.

I'm not 100% sure why we get an exception, because the order of declaration of the members in the TimeoutNode class should guarantee that the TimerQueue is destroyed before the mutex.
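The C++ rule behind this reasoning is that non-static data members are destroyed in reverse order of declaration, so a member is destroyed before another only if it is declared after it. A minimal sketch of that rule follows; Tracer, Holder and destructionOrder are hypothetical names used for illustration and do not reflect the actual member layout of TimeoutNode.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Appends its name to a shared log when it is destroyed.
struct Tracer {
  Tracer(std::vector<std::string>* log, std::string name)
    : log_(log), name_(std::move(name)) {
    assert(log_ != nullptr);
  }
  ~Tracer() { log_->push_back(name_); }

  std::vector<std::string>* log_;
  std::string name_;
};

// Hypothetical stand-in for a node that owns a mutex-like and a timer-like member.
struct Holder {
  explicit Holder(std::vector<std::string>* log)
    : mutex_like(log, "mutex"), timer_like(log, "timer") {}

  Tracer mutex_like;  // declared first, therefore destroyed last
  Tracer timer_like;  // declared last, therefore destroyed first
};

// Builds and destroys a Holder, then returns the observed destruction order.
std::vector<std::string> destructionOrder() {
  std::vector<std::string> log;
  {
    Holder holder(&log);
  }  // holder is destroyed here
  return log;
}
```

Calling destructionOrder() from a test or from main shows the member declared last being destroyed first. Swapping the two declarations in Holder reverses the result.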

Anyhow, this PR fixes the issue for me, seemingly with no side effects.

Note that the error doesn't happen if I tick the tree until it succeeds (in that case it looks like the TimeoutNode lambda handler is called during haltTree and is not called during the destruction of the TimerQueue).
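The difference between the two cases can be illustrated with a simplified model. The sketch below assumes a queue that invokes any handler still pending with aborted == true when it is cancelled or destroyed. MiniTimerQueue, Calls and runScenario are hypothetical names, and the model is single-threaded. It is not the library's TimerQueue, which runs handlers on its own thread.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, single-threaded stand-in for a timer queue.
class MiniTimerQueue {
public:
  using Handler = std::function<void(bool aborted)>;

  void add(Handler handler) {
    assert(handler != nullptr);
    pending_.push_back(std::move(handler));
  }

  // Invoke every pending handler with aborted == true, then forget it.
  void cancelAll() {
    std::vector<Handler> handlers;
    handlers.swap(pending_);
    for (auto& handler : handlers) {
      handler(true);
    }
  }

  // Handlers that are still pending are aborted during destruction.
  ~MiniTimerQueue() { cancelAll(); }

private:
  std::vector<Handler> pending_;
};

struct Calls {
  int before_destruction = 0;
  int during_destruction = 0;
};

// Records whether the handler ran before or during the queue's destruction.
Calls runScenario(bool cancel_first) {
  Calls calls;
  bool destroying = false;
  {
    MiniTimerQueue queue;
    queue.add([&calls, &destroying](bool /*aborted*/) {
      if (destroying) {
        ++calls.during_destruction;
      } else {
        ++calls.before_destruction;
      }
    });
    if (cancel_first) {
      queue.cancelAll();  // handler is consumed while its owner is fully alive
    }
    destroying = true;
  }  // queue is destroyed here
  return calls;
}
```

In this model, runScenario(true) runs the handler once before destruction and never during it, while runScenario(false) runs it only during destruction. That second path is where a handler has to avoid touching state owned by the object being torn down.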

@facontidavide
Collaborator

The problem I see is that this completely changes the meaning of "aborted", because before, when it was true, we would call:

child_halted_ = true;
haltChild();
emitWakeUpSignal();

But we don't anymore. I need to think of a better way to do this.

If I had a reproducible example, that would be great.

@facontidavide facontidavide self-assigned this Apr 28, 2023
@facontidavide facontidavide self-requested a review April 28, 2023 08:27
@alsora
Contributor Author

alsora commented Apr 28, 2023

It looks to me that when "aborted" was true we were not calling those functions.
This is the current code:

        timer_id_ = timer_.add(std::chrono::milliseconds(msec_), [this](bool aborted) {
          std::unique_lock<std::mutex> lk(timeout_mutex_);
          if (!aborted && child()->status() == NodeStatus::RUNNING)
          {
            child_halted_ = true;
            haltChild();
            emitWakeUpSignal();
          }
        });

Adding the if (aborted) {return;} at the top of the lambda should be functionally equivalent.
The only difference is that with my PR we don't even lock the mutex when we are not going to need it.
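The equivalence claim can be checked with a small model. The sketch below assumes the early-return shape described in this comment and replaces haltChild() and emitWakeUpSignal() with a single flag. FakeNode, handlerBefore, handlerAfter and the helper functions are hypothetical names, not code from the PR.

```cpp
#include <cassert>
#include <initializer_list>
#include <mutex>

// Hypothetical stand-in for the state that the timeout handler touches.
struct FakeNode {
  std::mutex timeout_mutex;
  bool child_running = true;
  bool child_halted = false;
  int lock_count = 0;  // how many times the mutex was taken

  // Shape of the handler quoted above: lock first, then test `aborted`.
  void handlerBefore(bool aborted) {
    std::unique_lock<std::mutex> lk(timeout_mutex);
    ++lock_count;
    if (!aborted && child_running) {
      child_halted = true;  // stands in for haltChild() and emitWakeUpSignal()
    }
  }

  // Early-return shape: never touch the mutex when aborted.
  void handlerAfter(bool aborted) {
    if (aborted) {
      return;
    }
    std::unique_lock<std::mutex> lk(timeout_mutex);
    ++lock_count;
    if (child_running) {
      child_halted = true;
    }
  }
};

// True if both shapes leave child_halted in the same state for every input.
bool handlersAgree() {
  for (bool aborted : {false, true}) {
    for (bool running : {false, true}) {
      FakeNode before;
      FakeNode after;
      before.child_running = running;
      after.child_running = running;
      before.handlerBefore(aborted);
      after.handlerAfter(aborted);
      if (before.child_halted != after.child_halted) {
        return false;
      }
    }
  }
  return true;
}

// Number of mutex acquisitions made by the early-return shape when aborted.
int locksWhenAborted() {
  FakeNode node;
  node.handlerAfter(true);
  return node.lock_count;
}

// Convenience check that can be called from main or from a test.
void selfCheck() {
  assert(handlersAgree());
  assert(locksWhenAborted() == 0);
}
```

In this model the two shapes produce the same observable state for all four input combinations, and only the early-return shape avoids the mutex on the aborted path, which is the path shown in the backtrace (aborted=true).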

@facontidavide facontidavide merged commit 801a2e7 into BehaviorTree:master Apr 28, 2023
@facontidavide
Collaborator

Sorry, you are right. Thanks a lot.

@alsora alsora deleted the asoragna/fix-timeout-crash branch April 28, 2023 11:21