Description
Hi, I found an issue while running a session with multiple groups. When the connection of the most preferred group goes down,
the next preferred group starts and connects. When the connection of the most preferred group is restored, the session of the
less preferred group is closed and the system continues working with the most preferred group. I found that in some cases,
when the connection goes down and is restored, or when the whole session is terminated by calling "rtr_mgr_stop()", we get a mutex deadlock.
I created a session with the following 8 groups:
1. existing local server address: 10.1.68.191, port: 8282, preference: 1
2. existing local server address: 10.1.68.134, port: 8282, preference: 2
3. dummy server address: 1.1.1.1, port: 1111, preference: 3
4. dummy server address: 2.2.2.2, port: 2222, preference: 4
5. dummy server address: 3.3.3.3, port: 3333, preference: 5
6. dummy server address: 4.4.4.4, port: 4444, preference: 6
7. dummy server address: 5.5.5.5, port: 5555, preference: 7
8. dummy server address: 6.6.6.6, port: 6666, preference: 8
The reason I added the extra dummy servers is to make the issue appear more often.
I added more prints to the code for the mutex lock and unlock operations.
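For readers who want to reproduce this kind of tracing, here is a minimal sketch of lock/unlock debug prints. It is not the code from the attached rtr_mgr.c; the names `dbg_mutex_lock`, `dbg_mutex_unlock`, `DBG_MUTEX_LOCK` and `DBG_MUTEX_UNLOCK` are hypothetical, and printing the thread ID as a number assumes `pthread_t` is an integer type (as on Linux/glibc).

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: log before and after taking the mutex, so a thread
 * that blocks shows a "waiting" line with no matching "acquired" line. */
static int dbg_mutex_lock(pthread_mutex_t *m, const char *file, int line)
{
    /* Assumption: pthread_t can be cast to an integer (true on Linux/glibc). */
    unsigned long tid = (unsigned long)pthread_self();
    int rc;

    fprintf(stderr, "[thread %lu] %s:%d waiting for mutex %p\n",
            tid, file, line, (void *)m);
    rc = pthread_mutex_lock(m);
    fprintf(stderr, "[thread %lu] %s:%d acquired mutex %p (rc=%d)\n",
            tid, file, line, (void *)m, rc);
    return rc;
}

/* Hypothetical helper: log the release of the mutex. */
static int dbg_mutex_unlock(pthread_mutex_t *m, const char *file, int line)
{
    unsigned long tid = (unsigned long)pthread_self();
    int rc = pthread_mutex_unlock(m);

    fprintf(stderr, "[thread %lu] %s:%d released mutex %p (rc=%d)\n",
            tid, file, line, (void *)m, rc);
    return rc;
}

/* Convenience macros that record the call site automatically. */
#define DBG_MUTEX_LOCK(m)   dbg_mutex_lock((m), __FILE__, __LINE__)
#define DBG_MUTEX_UNLOCK(m) dbg_mutex_unlock((m), __FILE__, __LINE__)
```

Replacing each `pthread_mutex_lock(&mtx)` / `pthread_mutex_unlock(&mtx)` pair with `DBG_MUTEX_LOCK(&mtx)` / `DBG_MUTEX_UNLOCK(&mtx)` gives a log where a blocked thread is easy to spot.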
The attached log file shows the scenario that puts the system into a mutex deadlock. In this case,
servers 1 and 2 were up, then both went down and were restored. I added the following comments to the log file (mutex-deadlock-issue-remarks.log):
- Thread 1 - Group 1 connection is established, close all the other groups.
- Thread 1 - Closing of group 2 starts.
- Thread 7 - Group 7 attempts to connect and fails, runs "is_some_rtr_mgr_group_established" and tries to lock the mutex.
  Thread 7 now sleeps until the mutex is unlocked.
- Thread 8 - Group 8 attempts to connect and fails, same as group 7, runs "is_some_rtr_mgr_group_established" and tries to lock
  the mutex. Thread 8 now sleeps until the mutex is unlocked.
- Thread 1 - Continues closing group 2.
- Thread 1 - Group 2 is successfully closed, closing group 3.
- Thread 1 - Group 3 is successfully closed, closing group 4.
- Thread 1 - Group 4 is successfully closed, closing group 5.
- Thread 1 - Group 5 is successfully closed, closing group 6.
- Thread 1 - Group 6 is successfully closed, closing group 7.
- Thread 1 - Runs "pthread_cancel" and "pthread_join" for thread 7 (group 7).
  In normal operation, the main state machine of thread 7 should reach the RTR_SHUTDOWN state, but thread 7 is sleeping
  because the mutex is locked by thread 1. So thread 1 waits in pthread_join while thread 7 sleeps on the mutex.
- The system is in deadlock.
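The pattern in the remarks above can be sketched in isolation. This is not rtrlib code: `group_mutex`, `worker` and `cancel_is_deferred_while_blocked` are hypothetical names. It relies on the fact that, under POSIX, `pthread_mutex_lock` is not a cancellation point, so a thread blocked on a mutex does not act on `pthread_cancel` until it gets the mutex and reaches a cancellation point. To keep the example terminating, the canceller releases the mutex before calling `pthread_join`; a comment marks where a join would hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t group_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool worker_exited;

/* Cleanup handler: records that the worker has terminated. */
static void mark_exited(void *arg)
{
    (void)arg;
    atomic_store(&worker_exited, true);
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Stands in for the thread of a group that failed to connect. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(mark_exited, NULL);
    pthread_mutex_lock(&group_mutex);   /* blocks here; not a cancellation point */
    pthread_mutex_unlock(&group_mutex);
    pthread_testcancel();               /* a pending cancel takes effect here */
    pthread_cleanup_pop(1);
    return NULL;
}

/* Returns 1 if the worker ignored the cancel while blocked on the mutex
 * and was cancelled only after the mutex was released; 0 otherwise;
 * -1 if the thread could not be created. */
int cancel_is_deferred_while_blocked(void)
{
    pthread_t tid;
    void *result = NULL;
    bool exited_while_locked;

    atomic_store(&worker_exited, false);
    pthread_mutex_lock(&group_mutex);          /* "thread 1" holds the mutex */
    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        pthread_mutex_unlock(&group_mutex);
        return -1;
    }
    sleep_ms(100);                             /* let the worker block on the mutex */
    pthread_cancel(tid);
    sleep_ms(100);
    exited_while_locked = atomic_load(&worker_exited);

    /* Calling pthread_join(tid, ...) at this point, with group_mutex still
     * held, would never return: that is the deadlock described above. */
    pthread_mutex_unlock(&group_mutex);
    pthread_join(tid, &result);

    return (!exited_while_locked && result == PTHREAD_CANCELED) ? 1 : 0;
}
```

Calling `cancel_is_deferred_while_blocked()` from a small `main` (compiled with `-pthread`) should return 1. One possible way to avoid the hang, under these assumptions, is to never hold the shared mutex across `pthread_join`.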
I attached the following files:
- rtr_static.c - Our example with all the servers above.
- rtr_mgr.c - Modified file with the mutex lock/unlock debug prints.
- mutex-deadlock-issue.log - Original log.
- mutex-deadlock-issue-remarks.log - Log with the above remarks (lines starting with the symbol: >>+).
Can you verify this issue?
Regards, Yakov S.
mutex-deadlock-issue.log
mutex-deadlock-issue-remarks.log
rtr_mgr.c.txt
rtr_static.c.txt