This repository was archived by the owner on Aug 2, 2023. It is now read-only.

@adrysn adrysn commented Mar 6, 2021

In some cases, sub containers remain RUNNING even though the main container is TERMINATED; we call this a broken session. When a scaling group contains a broken session, its scheduler cannot schedule any PENDING sessions at all. To work around this edge case, we exclude broken sessions (those without a main container) when fetching the list of existing sessions.
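The filtering idea can be illustrated with a minimal, standalone sketch. This is not the code changed in `dispatcher.py`: the `Kernel` class, its `role` and `status` fields, and the `live_session_ids` helper are hypothetical names chosen for illustration, and the real scheduler reads sessions from the database rather than from an in-memory list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Kernel:
    # Hypothetical record for one container of a session (assumed fields).
    session_id: str
    role: str    # "main" or "sub"
    status: str  # e.g. "RUNNING", "TERMINATED"


def live_session_ids(kernels: List[Kernel]) -> List[str]:
    """Return IDs of sessions whose main container is not TERMINATED.

    Sessions that only have sub containers left ("broken sessions")
    are skipped, so they never reach the scheduler.
    """
    result: List[str] = []
    for kernel in kernels:
        is_live_main = kernel.role == "main" and kernel.status != "TERMINATED"
        if is_live_main and kernel.session_id not in result:
            result.append(kernel.session_id)
    return result


kernels = [
    Kernel("s1", "main", "RUNNING"),
    Kernel("s1", "sub", "RUNNING"),
    Kernel("s2", "main", "TERMINATED"),  # broken: only the sub container is alive
    Kernel("s2", "sub", "RUNNING"),
]
print(live_session_ids(kernels))  # ['s1']
```

Here `s2` is dropped because its only running container is a sub container, which is the situation described above.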

@adrysn adrysn self-assigned this Mar 6, 2021

codecov bot commented Mar 6, 2021

Codecov Report

Merging #403 (0f1e98c) into main (b0303c9) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #403      +/-   ##
==========================================
- Coverage   46.09%   46.07%   -0.02%     
==========================================
  Files          49       49              
  Lines        7684     7687       +3     
==========================================
  Hits         3542     3542              
- Misses       4142     4145       +3     
| Impacted Files | Coverage Δ |
|---|---|
| src/ai/backend/manager/scheduler/dispatcher.py | 19.48% <0.00%> (-0.17%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b0303c9...0f1e98c.

@adrysn adrysn added this to the 20.09 milestone Mar 7, 2021
@adrysn adrysn added the bug label Mar 7, 2021
@adrysn adrysn requested a review from achimnol March 7, 2021 12:33

@achimnol achimnol left a comment

LGTM!

@achimnol achimnol merged commit 41b5c38 into main Mar 8, 2021
@achimnol achimnol deleted the fix/prevent-sgroup-scheduler-fail-when-only-sub-containers-are-running branch March 8, 2021 01:52
achimnol pushed a commit that referenced this pull request Mar 8, 2021
…ainer is TERMINATED (#403)

Backported-From: main
Backported-To: 20.09