This repository was archived by the owner on Aug 2, 2023. It is now read-only.

@achimnol
Member

No description provided.

@achimnol achimnol added this to the 20.09 milestone Apr 26, 2021

codecov bot commented Apr 26, 2021

Codecov Report

Merging #425 (1a76611) into main (cc6a37e) will increase coverage by 0.03%.
The diff coverage is 25.00%.

❗ Current head 1a76611 differs from pull request most recent head 9c0cb6a. Consider uploading reports for the commit 9c0cb6a to get more accurate results

@@            Coverage Diff             @@
##             main     #425      +/-   ##
==========================================
+ Coverage   48.80%   48.83%   +0.03%     
==========================================
  Files          52       52              
  Lines        8364     8442      +78     
==========================================
+ Hits         4082     4123      +41     
- Misses       4282     4319      +37     
| Impacted Files | Coverage Δ |
|---|---|
| src/ai/backend/manager/idle.py | 42.07% <0.00%> (ø) |
| src/ai/backend/manager/scheduler/predicates.py | 29.41% <8.33%> (+0.49%) ⬆️ |
| src/ai/backend/manager/scheduler/dispatcher.py | 19.93% <8.77%> (-0.32%) ⬇️ |
| src/ai/backend/manager/registry.py | 17.59% <17.47%> (+0.16%) ⬆️ |
| src/ai/backend/manager/models/scaling_group.py | 66.37% <20.00%> (+0.14%) ⬆️ |
| src/ai/backend/manager/api/auth.py | 49.86% <23.07%> (-1.23%) ⬇️ |
| src/ai/backend/manager/models/kernel.py | 45.73% <25.00%> (+0.01%) ⬆️ |
| src/ai/backend/manager/models/agent.py | 50.35% <33.33%> (+0.35%) ⬆️ |
| src/ai/backend/manager/models/keypair.py | 48.04% <33.33%> (+0.20%) ⬆️ |
| src/ai/backend/manager/models/utils.py | 46.75% <47.54%> (+1.75%) ⬆️ |

... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cc6a37e...9c0cb6a.

achimnol added 25 commits April 26, 2021 18:15
* SCHEDULED: the kernel occupies agent resources in the DB, but the actual container has not been created yet
* Use pg_advisory_lock to protect critical sections in the dispatcher module.
  - TODO: we need a mechanism to coalesce burst firing of DoScheduleEvent and
    DoPrepareEvent, to prevent exhaustion of the DB connection pool!
* Use the "REPEATABLE READ" transaction isolation level by default,
  and retry update queries as necessary.
* Remove several duplicate kernels.status updates.
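
As a rough illustration of the advisory-lock item above, the following is a minimal sketch and not the code from this pull request. It assumes an asyncpg-style connection whose `execute()` accepts `$1` placeholders; `advisory_lock` and `RecordingConnection` are hypothetical names, and the recording connection only stands in for a real one so the sketch runs without PostgreSQL. `pg_advisory_lock()` and `pg_advisory_unlock()` are the actual PostgreSQL functions.

```python
import asyncio
from contextlib import asynccontextmanager


class RecordingConnection:
    """Hypothetical stand-in for a DB connection; it only records statements."""

    def __init__(self):
        self.statements = []

    async def execute(self, sql, *params):
        self.statements.append((sql, params))


@asynccontextmanager
async def advisory_lock(conn, lock_id: int):
    """Hold a session-level PostgreSQL advisory lock around a critical section."""
    # Blocks until no other session holds the same lock ID.
    await conn.execute("SELECT pg_advisory_lock($1)", lock_id)
    try:
        yield
    finally:
        # Always release, even if the critical section raised.
        await conn.execute("SELECT pg_advisory_unlock($1)", lock_id)
```

A caller would wrap the scheduling decision in `async with advisory_lock(conn, some_id):`. Because the lock is held on a connection for the whole section, a burst of events can tie up many pooled connections at once, which is the concern the TODO above raises.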
@achimnol achimnol marked this pull request as ready for review May 10, 2021 07:22
@achimnol achimnol merged commit 05f21a0 into main May 10, 2021
@achimnol achimnol deleted the fix/potential-transaction-hangs branch May 10, 2021 08:33
achimnol added a commit that referenced this pull request May 10, 2021
* Now all DB transactions use the "SERIALIZABLE" isolation level with explicit retries.
* Now DB transactions that include only SELECT queries are marked as "read-only" so that
  the PostgreSQL engine can optimize concurrent access under the new isolation level.
  All future code should use the `begin_readonly()` method from our own subclassed SQLAlchemy
  engine instance replacing all existing `db` context variables.
* Remove excessive database updates due to keypair API query counts and kernel API query counts.
  The keypair API query count is rewritten to use Redis with one-month retention. (#421)
  Now just calling an API does not trigger updates in the PostgreSQL database.
* Fix unnecessary database updates for agent heartbeats.
* Split many update-only DB transactions into smaller units, such as resource recalculation.
* Use PostgreSQL advisory locks to make the scheduling decision process a critical section.
* Fix some variable binding issues with nested functions inside loops.
* Apply event message coalescing to prevent event bursts (e.g., `DoScheduleEvent` fired after
  enqueueing new session requests), which hurt database performance and potentially
  break the transaction isolation guarantees.

Backported-From: main
Backported-To: 21.03
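
The "explicit retries" mentioned in the commit message above are needed because PostgreSQL may abort a SERIALIZABLE transaction with a serialization failure (SQLSTATE 40001), and the application is expected to run the whole transaction again. The sketch below shows one possible shape of such a retry loop; it is an assumption-laden illustration, not the helper used in this repository. `SerializationFailure` and `execute_with_retry` are hypothetical names, and the exception class stands in for whatever error the DB driver actually raises.

```python
import asyncio
import random


class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver error carrying SQLSTATE 40001."""


async def execute_with_retry(txn_func, max_retries: int = 5, base_delay: float = 0.01):
    """Run txn_func (one whole transaction) again when it fails to serialize."""
    for attempt in range(1, max_retries + 1):
        try:
            return await txn_func()
        except SerializationFailure:
            if attempt == max_retries:
                raise  # give up and surface the error to the caller
            # Randomized, growing delay so competing transactions do not
            # immediately collide again.
            await asyncio.sleep(base_delay * attempt * random.random())
```

The retried unit has to be the entire transaction body, not a single statement, since the earlier reads of an aborted transaction are no longer valid.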
achimnol added a commit that referenced this pull request May 10, 2021
* Now all DB transactions use the "SERIALIZABLE" isolation level with explicit retries.
* Now DB transactions that include only SELECT queries are marked as "read-only" so that
  the PostgreSQL engine can optimize concurrent access under the new isolation level.
  All future code should use the `begin_readonly()` method from our own subclassed SQLAlchemy
  engine instance replacing all existing `db` context variables.
* Remove excessive database updates due to keypair API query counts and kernel API query counts.
  The keypair API query count is rewritten to use Redis with one-month retention. (#421)
  Now just calling an API does not trigger updates in the PostgreSQL database.
* Fix unnecessary database updates for agent heartbeats.
* Split many update-only DB transactions into smaller units, such as resource recalculation.
* Use PostgreSQL advisory locks to make the scheduling decision process a critical section.
* Fix some variable binding issues with nested functions inside loops.
* Apply event message coalescing to prevent event bursts (e.g., `DoScheduleEvent` fired after
  enqueueing new session requests), which hurt database performance and potentially
  break the transaction isolation guarantees.

Backported-From: main
Backported-To: 20.09
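
The event coalescing described in the commit message above can be pictured with a small asyncio sketch. This is only one way to do it, written under assumptions: `EventCoalescer` and its `delay` parameter are hypothetical and are not taken from the Backend.AI code base. The idea is that the first event of a burst starts a short timer, later events within that window are absorbed, and the handler runs once when the timer expires.

```python
import asyncio


class EventCoalescer:
    """Hypothetical helper: collapse a burst of events into one handler call."""

    def __init__(self, handler, delay: float = 0.05):
        self._handler = handler
        self._delay = delay
        self._pending = None  # timer task for the current burst, if any

    def fire(self) -> None:
        # Only the first event of a burst schedules work; the rest are absorbed.
        if self._pending is None or self._pending.done():
            self._pending = asyncio.get_running_loop().create_task(self._flush())

    async def _flush(self) -> None:
        await asyncio.sleep(self._delay)
        # Clear the marker before handling so that events arriving while the
        # handler runs start a new burst instead of being lost.
        self._pending = None
        await self._handler()
```

With this shape, ten `fire()` calls issued back to back lead to a single handler invocation, so a burst of scheduling events would open one DB transaction instead of ten.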