Segmentation error for torch==2.2.1 on MacOs #121101
Comments
Is this reproducible if one uses Apple Silicon M1 runners? (Though Torch-2.2 is the last release to support Intel Macs, per #114602.) At least, I cannot reproduce it on M1 when running in x86 Rosetta mode.
Nor can I repro in GitHub CI: https://github.com/malfet/deleteme/actions/runs/8150940508/job/22278030319?pr=79 |
I can reproduce in GitHub CI (over in the shap repo) with a slightly different setup: I'll see if I can identify the relevant difference between that job and your run above; perhaps it's related to having different dependencies installed. |
Reproducing the issue on GitHub Actions
I can reproduce the minimal reproducible example above on GitHub Actions, with the environment below.
The test snippet passes in an environment created with
The examples below ran on GitHub Actions with
Reproducible example
As above:
import time
import torch
from sklearn.datasets import fetch_california_housing

def test_something():
    X, y = fetch_california_housing(return_X_y=True)
    torch.tensor(X)
    time.sleep(3)
Passing run
Example passing run: https://github.com/shap/shap/actions/runs/8248044359/job/22557508223
Failing run
Example failing run: https://github.com/shap/shap/actions/runs/8248015803/job/22557423230
|
any news on that issue ? We are having the same problem. |
Over at the "shap" project we are still seeing the issue on CI, and it's preventing us from testing against the latest pytorch on macOS. Example failing run here. We still see the issue with
@malfet, to help the investigation progress, here's a full minimal GitHub Actions workflow to reproduce the error:
# run_tests.yml
jobs:
  run_tests:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.11
      - run: brew install libomp
      - run: pip install pytest torch scikit-learn lightgbm
      - run: pip list
      - run: pytest --noconftest test_bug.py

# test_bug.py
import time
import lightgbm
import torch
from sklearn.datasets import fetch_california_housing

def test_something():
    X, y = fetch_california_housing(return_X_y=True)
    torch.tensor(X)
    time.sleep(3)

Leads to
Result of
|
@connortann thank you for the reproducer. Crash is due to multiple OpenMP runtimes loaded into the process address space:
|
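To check whether more than one OpenMP runtime ends up in a process, a small helper along these lines may be useful. This is only a sketch: the name hints (`libomp`, `libiomp`, `libgomp`) are an assumed, non-exhaustive list, the function names are hypothetical, and the second function relies on the third-party `threadpoolctl` package being installed.

```python
import os

# Substrings that commonly identify OpenMP runtime libraries. This is an
# assumption, not an exhaustive list: LLVM libomp, Intel libiomp5, GNU libgomp.
OPENMP_NAME_HINTS = ("libomp", "libiomp", "libgomp")


def find_openmp_runtimes(library_paths):
    """Return the distinct paths whose file name looks like an OpenMP runtime."""
    found = []
    for path in library_paths:
        name = os.path.basename(path).lower()
        if any(hint in name for hint in OPENMP_NAME_HINTS) and path not in found:
            found.append(path)
    return found


def loaded_openmp_runtimes():
    """List OpenMP runtimes loaded in this process (needs threadpoolctl)."""
    from threadpoolctl import threadpool_info  # third-party, optional

    paths = [info["filepath"] for info in threadpool_info()]
    return find_openmp_runtimes(paths)
```

Calling `loaded_openmp_runtimes()` after importing the libraries under suspicion and getting back more than one path would suggest that several copies of the runtime are loaded side by side.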
If I comment out the Full traceback if
|
To be frank, I'm unsure whether the problem lies solely with PyTorch at this point, as two other runtimes are importing libomp, and there isn't much one can do short of disabling OpenMP (which one can do programmatically by calling |
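One generic way to restrict OpenMP threading, independent of any particular library call, is the standard `OMP_NUM_THREADS` environment variable. The sketch below sets it from Python; the helper name is hypothetical, and whether this avoids the crash discussed here is an assumption that the thread does not confirm.

```python
import os


def limit_openmp_threads(num_threads=1):
    """Set OMP_NUM_THREADS; call before importing any OpenMP-backed library."""
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    return os.environ["OMP_NUM_THREADS"]


limit_openmp_threads(1)
# Imports of OpenMP-backed libraries (e.g. torch, lightgbm) would follow here,
# so that each runtime reads the variable when it initialises.
```

The variable has to be set before the runtimes are loaded, which is why the call sits above the imports.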
@connortann can you please try adding |
Yep certainly: the tests do indeed pass with
Indeed, as the segfault only occurs when lightgbm is imported first. Possibly relevant: we had a separate segfault issue when torch is imported before lightgbm, as described in this comment: shap/shap#3092 (comment). I hope that collectively we can find a fix, as torch and lightgbm are both extremely popular libraries, so it's quite common for them to be installed in the same environment. |
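Since the outcome depends on import order, it can help to try each order in a fresh interpreter so that a crash in one does not take down the other. The following is a minimal sketch using only the standard library; the function names and the `extra_code` parameter are hypothetical, and import order alone may not trigger the crash, since the reproducer in this thread also constructs a tensor.

```python
import signal
import subprocess
import sys


def run_import_order(modules, extra_code="", timeout=120):
    """Import `modules` in order in a fresh interpreter; return its exit code."""
    lines = [f"import {name}" for name in modules]
    if extra_code:
        lines.append(extra_code)
    result = subprocess.run(
        [sys.executable, "-c", "\n".join(lines)],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode


def describe_exit(returncode):
    """On POSIX, a negative return code means the child died from a signal."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"
```

For example, `describe_exit(run_import_order(["lightgbm", "torch"]))` and `describe_exit(run_import_order(["torch", "lightgbm"]))` could be compared in an environment where both packages are installed.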
I cross-posted to LightGBM, because as you say the problem doesn't seem to lie solely with pytorch: microsoft/LightGBM#6595 |
I'm going to add that this pytorch segmentation fault on macOS does not necessarily require LightGBM. Other packages, such as vapoursynth, can cause the same problem. |
As this issue requires a community effort, it is maybe best to centralize the discussion. |
I am having this problem as well. My objective is to run https://github.com/black-forest-labs/flux demo with PyTorch 2.4.1 on Intel MacBook Pro's Radeon 5500M. What I've done so far:
After all that the segfault wouldn't go away. I'm ready to dig into the issue, but I need some guidance/fresh ideas to facilitate the investigation. |
@gchanan @dzhulgakov @ezyang @malfet If you could have a look and participate in the discussion in microsoft/LightGBM#6595, that would be highly appreciated. I consider those kinds of bugs among the worst for users. This issue is mainly caused by pytorch, the short summary of microsoft/LightGBM#6595 (comment) is:
|
Any progress here? With python3.12 the error seems to be thrown with torch |
🐛 Describe the bug
At shap, we have run into problems with our CI jobs on macOS, e.g. see here. I tracked this down to an issue with torch==2.2.1. Here is code to reproduce the issue (this works on torch==2.2.0):
(execute with python -m pytest <filename>)
Stacktrace:
Versions
cc @malfet @albanD @frank-wei @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10