@phueper (Contributor) commented Nov 7, 2025

to wait and notify when signal is caught

fixes #925

(cherry picked from commit 90be483)

Important

  1. We strictly follow an issue-first approach; please open an issue related to this pull request first.
  2. PR name follows conventional commit format: feat: ... or fix: ....

#925:

Change description:

Instead of burning 100% CPU in an endless loop, use threading.Event to wait and be notified when a signal is caught.
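A minimal sketch of the pattern, assuming a module-level Event plus handlers for SIGINT and SIGTERM. The names `stop_event` and `install_signal_handlers` and the logger name are illustrative (the PR itself keeps the name `keep_running`), and the GPU collection setup is only a placeholder comment:

```python
import logging
import signal
import threading

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("otel-gpu-collector")  # illustrative logger name

# Set once a termination signal arrives. The PR calls this keep_running;
# stop_event (the reviewer's suggested name) says what setting it means.
stop_event = threading.Event()


def signal_handler(sig, frame):
    logger.info("Received termination signal %s", signal.Signals(sig).name)
    stop_event.set()  # wakes the main thread blocked in stop_event.wait()


def install_signal_handlers():
    # Python-level signal handlers can only be installed from the main thread.
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)


def main():
    install_signal_handlers()
    # Placeholder: GPU metric collection would be started here.
    stop_event.wait()  # sleeps without using CPU until signal_handler runs
    logger.info("Shutting down")
```

A real entry point would call `main()` from an `if __name__ == "__main__":` guard. Python runs signal handlers in the main thread, and on POSIX systems a blocked lock acquisition (which `Event.wait()` uses internally) is interrupted so the handler can run and call `set()`.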

Checklist

If an item doesn't apply to your change, please leave it unchecked.

  • PR name follows conventional commit format: feat: ... or fix: ....
  • I have reviewed the contributing guidelines
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

Summary by Sourcery

Implement efficient shutdown signaling in the GPU collector and optimize its Docker image build

Bug Fixes:

  • Replace busy-wait infinite loop with threading.Event.wait to eliminate 100% CPU spin on termination

Enhancements:

  • Update Dockerfile to use python:3-slim base, add build-essential and AMD SMI libraries, simplify pip install commands, and relocate source COPY for more efficient builds

@phueper requested a review from a team as a code owner November 7, 2025 08:53

@sourcery-ai (bot, Contributor) commented Nov 7, 2025

Reviewer's Guide

This PR replaces a CPU-intensive busy-wait loop with threading.Event for graceful signal handling and updates the otel-gpu-collector Dockerfile to use a newer base image, install additional dependencies, reorder build steps, and simplify package installation.

Sequence diagram for signal handling with threading.Event

```mermaid
sequenceDiagram
    participant Collector as "collector.py"
    participant OS as "Operating System"
    participant Logger
    Collector->>OS: Wait for termination signal
    OS-->>Collector: Send termination signal (SIGINT/SIGTERM)
    Collector->>Logger: Log "Received termination signal"
    Collector->>Collector: Set keep_running Event
    Collector->>Collector: keep_running.wait() returns, script exits
```
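One way to exercise this sequence locally is sketched below, assuming a POSIX system (`signal.pthread_kill` is Unix-only). A `threading.Timer` stands in for the operating system or container runtime and delivers SIGTERM to the main thread while it is blocked in `wait()`; `run_demo` and the logger name are hypothetical:

```python
import logging
import signal
import threading
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("collector-demo")  # hypothetical logger name

stop_event = threading.Event()


def signal_handler(sig, frame):
    logger.info("Received termination signal %s", signal.Signals(sig).name)
    stop_event.set()


def run_demo(delay=0.2):
    """Block on stop_event while a helper thread sends SIGTERM after `delay` s.

    Returns the wall-clock seconds elapsed until the sender thread finished.
    """
    signal.signal(signal.SIGTERM, signal_handler)  # must run in the main thread
    main_id = threading.main_thread().ident
    # pthread_kill targets the main thread directly, so the demo does not
    # depend on which thread the kernel would pick for a process signal.
    sender = threading.Timer(delay, signal.pthread_kill, args=(main_id, signal.SIGTERM))
    start = time.monotonic()
    sender.start()
    stop_event.wait(timeout=5)  # returns once the handler sets the event (5 s safety net)
    sender.join()
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"woke up after {run_demo():.2f} s")
```

Targeting the main thread keeps the demo deterministic; in a container, `docker stop` sends SIGTERM to the container's main process instead.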

Class diagram for signal handling logic update

```mermaid
classDiagram
    class collector_py {
        - keep_running: Event
        + signal_handler(sig, frame)
        + main()
    }
    collector_py : signal_handler sets keep_running Event
    collector_py : main() waits on keep_running Event
```

File-Level Changes

Change: Replace the CPU-intensive infinite loop with threading.Event for proper waiting on signals
Files: otel-gpu-collector/collector.py
  • Converted the keep_running boolean flag to a threading.Event
  • Updated signal_handler to call keep_running.set() to signal shutdown
  • Replaced the busy-wait `while keep_running: pass` with keep_running.wait()

Change: Refine the Dockerfile base image, dependencies, and installation steps
Files: otel-gpu-collector/dockerfile
  • Switched the base image from python:3.9-slim to python:3-slim
  • Added build-essential, amd-smi, and libamd-smi-dev to the apt-get dependencies
  • Moved COPY . . after pip install to improve layer caching
  • Simplified pip install by removing the version pins for psutil, nvidia-ml-py, and openlit

Assessment against linked issues

Objectives from issue #925:
  • Prevent the GPU collector from burning 100% CPU while idle by removing the uninterrupted endless loop.
  • Ensure the GPU collector uses CPU only when actually collecting GPU data and stays idle otherwise.
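To illustrate why the old loop kept a core busy while the Event-based version idles, the sketch below compares the CPU time consumed by both patterns over the same wall-clock interval. It is illustrative only: here the busy loop is bounded by a deadline so it terminates, whereas the original `while keep_running: pass` spun until a signal arrived:

```python
import threading
import time


def busy_wait(seconds):
    # Old pattern (bounded here): re-check a condition as fast as possible,
    # which keeps one core fully busy for the whole interval.
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def event_wait(seconds):
    # New pattern: the thread sleeps inside wait() until set() or the timeout.
    threading.Event().wait(timeout=seconds)


def cpu_seconds(wait_fn, seconds):
    # process_time() measures CPU time consumed, not wall-clock time.
    start = time.process_time()
    wait_fn(seconds)
    return time.process_time() - start


if __name__ == "__main__":
    for fn in (busy_wait, event_wait):
        print(f"{fn.__name__}: {cpu_seconds(fn, 0.5):.3f} s CPU for 0.5 s of waiting")
```

The busy loop should report roughly the full interval of CPU time while `event_wait` reports close to zero; exact numbers depend on the machine.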

@sourcery-ai (bot) left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • In the Dockerfile, pin your Python dependencies (or use a requirements.txt) instead of installing latest for reproducible builds and to avoid unexpected breakage.
  • Reorder your Dockerfile to leverage layer caching: COPY only your requirements first, run pip install, then COPY the rest of your source.
  • In collector.py, explicitly import threading.Event at the top, consider renaming keep_running to something like stop_event, and drop the unnecessary global statement: the handler only calls a method on the Event and never rebinds the name.
## Individual Comments

### Comment 1
<location> `otel-gpu-collector/dockerfile:8-17` </location>
<code_context>
 # Install build dependencies and necessary libraries
 RUN apt-get update && apt-get install -y --no-install-recommends \
     gcc \
+    build-essential \
     python3-dev \
     libffi-dev \
</code_context>

<issue_to_address>
**suggestion:** Evaluate whether the image needs both 'gcc' and 'build-essential'.

Since build-essential already pulls in gcc, listing 'gcc' separately is redundant. Consider removing it to keep the package list minimal.

```suggestion
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    python3-dev \
    libffi-dev \
    libssl-dev \
    amd-smi \
    libamd-smi-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
```
</issue_to_address>


… wait and notify when signal is caught

(cherry picked from commit 90be483)
@phueper force-pushed the update_gpu_collector branch from b1bb881 to 08b42ec November 7, 2025 09:04
@patcher9 merged commit bb7e5e1 into openlit:main Nov 7, 2025
1 check passed
@patcher9 (Contributor) commented Nov 7, 2025

Thanks @phueper for the fix!

@phueper deleted the update_gpu_collector branch November 7, 2025 11:39

Development: successfully merging this pull request may close the following issue:

Bug: GPU collector burns 100% CPU