Description
Describe the bug
This is perhaps not a bug but an opportunity for improvement. I've noticed that scikit-learn runs considerably faster if I happen to have import torch before any sklearn imports.
This first block of code runs much slower:
from sklearn.ensemble import HistGradientBoostingRegressor
import numpy as np
X = np.random.random(size=(50, 10000))
y = np.random.random(size=50)
estimator = HistGradientBoostingRegressor(verbose=True)
estimator.fit(X, y)
than this second block of code:
import torch # The only difference
from sklearn.ensemble import HistGradientBoostingRegressor
import numpy as np
X = np.random.random(size=(50, 10000))
y = np.random.random(size=50)
estimator = HistGradientBoostingRegressor(verbose=True)
estimator.fit(X, y)
Here are the run times over 6 runs each on my actual code, the only difference being an import of torch.
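Roughly, the comparison can be reproduced with a harness like this (a sketch, not my exact benchmarking code), timing each variant in a fresh interpreter so the import order is controlled every run:

# Sketch of a timing harness (not my original benchmark): run each variant
# in a separate interpreter so only the presence of the torch import differs.
import subprocess
import sys

SNIPPET = """
{maybe_torch}
import time
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

X = np.random.random(size=(50, 10000))
y = np.random.random(size=50)

start = time.perf_counter()
HistGradientBoostingRegressor().fit(X, y)
print(time.perf_counter() - start)
"""

for label, maybe_torch in [("without torch", ""), ("with torch", "import torch")]:
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET.format(maybe_torch=maybe_torch)],
        capture_output=True, text=True, check=True,
    )
    print(label, result.stdout.strip(), "seconds")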
I know it's confusing that I'm importing torch but not using it, so to be clear: I don't use the torch module in any way in the script. I just happened to stumble across the performance improvement at one point when I imported torch for some other purpose. It's literally just sitting there as an unused import, making my code run much faster.
I've tested with a few other regressors, including RandomForestRegressor and GradientBoostingRegressor, and I don't see any difference.
I compared os.environ in both cases and they're the same. I looked at sklearn.base.get_config() and it's identical in both cases too. I notice that torch sets OMP_NUM_THREADS to 10, while without the torch import this value is set to 20 (on my machine with 20 cores). But even manually setting it to 10 doesn't bridge the gap.
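For anyone debugging this, the loaded OpenMP/BLAS runtimes and their thread counts can be compared directly with threadpoolctl rather than via environment variables; something like this (just an inspection sketch) shows which libgomp ends up loaded before and after the torch import:

# Inspection sketch: compare thread-pool state before and after importing torch.
import os
from pprint import pprint

from threadpoolctl import threadpool_info

print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS"))
pprint(threadpool_info())  # one dict per loaded BLAS/OpenMP runtime

import torch  # noqa: F401  -- torch's bundled libgomp gets loaded here

pprint(threadpool_info())  # torch's libgomp now appears with its own num_threads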
I don't know enough about torch or sklearn to be able to work out what else is going on, but I'm guessing someone who's worked on HistGradientBoostingRegressor might know? It seems like there's a nice performance gain to be found somewhere in here.
Steps/Code to Reproduce
As above
Expected Results
Training should run at full speed regardless of whether torch has been imported.
Actual Results
Training is not at full speed unless I import torch first.
Also, as a general point, it would be nice to be able to pass n_jobs to the constructor; having the estimator use all 20 cores is not always the fastest option.
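In the meantime, as a workaround rather than a real n_jobs parameter, my understanding is that the OpenMP thread count used during fit can be capped with threadpoolctl; a sketch of what I mean (not an official sklearn API for this):

# Workaround sketch: cap the OpenMP threads used during fit() via threadpoolctl.
import numpy as np
from threadpoolctl import threadpool_limits
from sklearn.ensemble import HistGradientBoostingRegressor

X = np.random.random(size=(50, 10000))
y = np.random.random(size=50)

with threadpool_limits(limits=10, user_api="openmp"):
    HistGradientBoostingRegressor().fit(X, y)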
Versions
System:
python: 3.10.8 (main, Oct 12 2022, 19:14:26) [GCC 9.4.0]
executable: /home/davidg/.virtualenvs/learning/bin/python
machine: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Python dependencies:
sklearn: 1.2.2
pip: 23.1.2
setuptools: 59.5.0
numpy: 1.24.3
scipy: 1.10.1
Cython: 0.29.33
pandas: 2.0.1
matplotlib: 3.7.0
joblib: 1.2.0
threadpoolctl: 3.1.0
Built with OpenMP: True
threadpoolctl info:
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: /home/davidg/.virtualenvs/learning/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-15028c96.3.21.so
version: 0.3.21
threading_layer: pthreads
architecture: Haswell
num_threads: 20
user_api: openmp
internal_api: openmp
prefix: libgomp
filepath: /home/davidg/.virtualenvs/learning/lib/python3.10/site-packages/torch/lib/libgomp-a34b3233.so.1
version: None
num_threads: 10
user_api: openmp
internal_api: openmp
prefix: libgomp
filepath: /home/davidg/.virtualenvs/learning/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
version: None
num_threads: 20
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: /home/davidg/.virtualenvs/learning/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
version: 0.3.18
threading_layer: pthreads
architecture: Haswell
num_threads: 20