Conversation

jj10306 (Contributor) commented Oct 15, 2025

Summary:

The current state of `initProfilers` leads to data races in several different places during kineto's initialization.

```cpp
static void initProfilers() {
  if (!initialized) {
    libkineto::api().initProfilerIfRegistered(); // spawns thread that calls `updateBaseConfig`
    libkineto::api().configLoader().initBaseConfig(); // calls `updateBaseConfig`
    initialized = true;
    VLOG(0) << "libkineto profilers activated";
  }
}
```

The root cause: `libkineto::api().initProfilerIfRegistered()` spawns `updateConfigThread`, which periodically calls `updateBaseConfig`, while `libkineto::api().configLoader().initBaseConfig()` on the main thread also calls `updateBaseConfig` (that is basically all it does, as far as I can tell).

This can race on the lazy initialization of "singletons" in multiple spots; the most relevant I've seen so far (see the sketch after this list):

1. `DaemonConfigLoader::getConfigClient()` (cause of [this reported crash](pytorch/pytorch#163545))
2. `ConfigLoader::daemonConfigLoader()`
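
For context, this is the classic unsynchronized check-then-construct pattern. The snippet below is a minimal sketch, not kineto code: `ConfigClient`, the `client` member, and this `getConfigClient` are hypothetical stand-ins illustrating how two threads (standing in for `updateConfigThread` and the main thread) can race on a lazily initialized singleton.

```cpp
#include <memory>
#include <thread>

struct ConfigClient {};  // stand-in for the real config client

// Hypothetical unguarded, lazily initialized singleton.
static std::unique_ptr<ConfigClient> client;

ConfigClient* getConfigClient() {
  // Check-then-construct with no lock or std::call_once: if two threads
  // arrive here concurrently, both may see a null pointer and both may
  // assign `client`, which is a data race (undefined behavior).
  if (!client) {
    client = std::make_unique<ConfigClient>();
  }
  return client.get();
}

int main() {
  // Analogous to updateConfigThread and the main thread both reaching the
  // loader's getters during initialization.
  std::thread t([] { getConfigClient(); });
  getConfigClient();
  t.join();
  return 0;
}
```

Guarding these getters with `std::call_once` would also remove the race, but as described below the simpler fix is to remove the redundant caller.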

It appears `initBaseConfig` was added more than 4 years ago as a temporary workaround for an event profiler bug ([ref](pytorch@61a7083)).

Given that the event profiler is no longer supported ([ref](pytorch@61a7083)), we should be able to remove this call entirely; the sketch below shows the resulting shape of `initProfilers`.
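
To be concrete, this is presumably what the function reduces to after the change (a sketch, assuming the fix is simply dropping the `initBaseConfig()` call and leaving `updateBaseConfig` to the thread spawned by `initProfilerIfRegistered()`):

```cpp
// Sketch only: assumes the fix is just removing the redundant
// initBaseConfig() call, so updateBaseConfig is driven solely by the
// config-update thread spawned by initProfilerIfRegistered().
static void initProfilers() {
  if (!initialized) {
    libkineto::api().initProfilerIfRegistered();
    initialized = true;
    VLOG(0) << "libkineto profilers activated";
  }
}
```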

Differential Revision: D84663094

Test Plan

```
> cat example.py
import torch
> KINETO_USE_DAEMON=1 python example.py
INFO:2025-10-14 23:39:49 1639604:1639604 init.cpp:139] Registering daemon config loader, cpuOnly =  0
INFO:2025-10-14 23:39:49 1639604:1639604 CuptiActivityProfiler.cpp:243] CUDA versions. CUPTI: 26; Runtime: 12080; Driver: 12080
```

Observe the process registering in dynolog's logs.

I ran this 100 times and observed 0 crashes (previously, crashes occurred roughly 80-90% of the time).

meta-cla bot added the `cla signed` label Oct 15, 2025
meta-codesync bot commented Oct 15, 2025

@jj10306 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84663094.

jj10306 (Contributor, Author) commented Oct 15, 2025

cc: @sanrise @sraikund16 @briancoutinho @staugust from discussion on pytorch/pytorch#163545

jj10306 changed the title from "Fix racy OSS initialization" to "Fix racy initialization with KINETO_USE_DAEMON=1" on Oct 15, 2025
meta-codesync bot commented Oct 17, 2025

This pull request has been merged in 6fcbc53.
