Closed
Description
Describe the bug
scikit-learn crashes with the current scipy==1.9.2 on Windows (AMD64). With this combination, a joblib subprocess that imports scikit-learn fails. (Tested in Windows containers using the Docker images python:3.9.13 and python:3.9.10.)
Not reproduced on Linux or macOS, even with the same settings (vanilla official Python 3.9 and 3.10).
The previous bugfix release of scipy (1.9.1) works fine.
Steps/Code to Reproduce
On Windows only, first install these packages:
python -m pip install -U pip setuptools scipy==1.9.2 joblib scikit-learn
Then run this code (e.g. interactively):
from joblib import Parallel, delayed
import sklearn

def a():
    from sklearn.model_selection import cross_val_score
    return cross_val_score

data_results = Parallel(n_jobs=4)(delayed(a)() for i in range(10))
The last line (above) fails.
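To narrow down whether the failure is tied to joblib's worker processes or to the import itself, one option is to attempt the same import in a fresh interpreter. The following is a minimal sketch using only the standard library; the helper name `try_import_in_subprocess` is hypothetical and is not part of joblib or scikit-learn.

```python
import subprocess
import sys


def try_import_in_subprocess(statement):
    """Run a Python statement in a fresh interpreter.

    Returns (returncode, stderr) so the caller can see whether the
    import also fails outside of a joblib worker.
    """
    result = subprocess.run(
        [sys.executable, "-c", statement],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # The import that fails inside the joblib workers in this report.
    code, err = try_import_in_subprocess(
        "from sklearn.model_selection import cross_val_score"
    )
    print("exit code:", code)
    print(err)
```

A non-zero exit code here would suggest the problem is not specific to joblib's worker processes; an exit code of 0 would point back at how the workers are started.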
As a workaround, downgrade scipy (e.g. as below) and rerun.
python -m pip install -U pip scipy==1.9.1
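Until the cause is understood, a script could guard against the affected combination at startup. This is only a sketch based on what this report observed (scipy 1.9.2 on Windows fails, 1.9.1 works); the function names are hypothetical and the check says nothing about other versions or platforms.

```python
import platform
from importlib import metadata


def is_reported_combination(scipy_version, system):
    """True only for the combination described in this report."""
    return system == "Windows" and scipy_version == "1.9.2"


def check_environment():
    """Return a warning string if the installed scipy matches the report, else None."""
    try:
        installed = metadata.version("scipy")
    except metadata.PackageNotFoundError:
        # scipy is not installed, so there is nothing to warn about.
        return None
    if is_reported_combination(installed, platform.system()):
        return (
            "scipy 1.9.2 on Windows was reported to break scikit-learn imports "
            "in joblib workers; consider pinning scipy==1.9.1"
        )
    return None


if __name__ == "__main__":
    message = check_environment()
    print(message or "environment does not match the reported combination")
```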
Expected Results
Nothing: if the minimal example above runs OK, no output is displayed.
Actual Results
In [5]: data_results = Parallel(n_jobs=4)(delayed(a)() for i in range(10))
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Python\lib\site-packages\joblib\externals\loky\process_executor.py", line 428, in _process_worker
    r = call_item()
  File "C:\Python\lib\site-packages\joblib\externals\loky\process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Python\lib\site-packages\joblib\_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "C:\Python\lib\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "C:\Python\lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "<ipython-input-1-bf041901e974>", line 2, in a
  File "C:\Python\lib\site-packages\sklearn\model_selection\__init__.py", line 23, in <module>
    from ._validation import cross_val_score
  File "C:\Python\lib\site-packages\sklearn\model_selection\_validation.py", line 32, in <module>
    from ..metrics import check_scoring
  File "C:\Python\lib\site-packages\sklearn\metrics\__init__.py", line 41, in <module>
    from . import cluster
  File "C:\Python\lib\site-packages\sklearn\metrics\cluster\__init__.py", line 22, in <module>
    from ._unsupervised import silhouette_samples
  File "C:\Python\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py", line 16, in <module>
    from ..pairwise import pairwise_distances_chunked
  File "C:\Python\lib\site-packages\sklearn\metrics\pairwise.py", line 33, in <module>
    from ._pairwise_distances_reduction import PairwiseDistancesArgKmin
ImportError: DLL load failed while importing _pairwise_distances_reduction: The specified module could not be found.
"""
The above exception was the direct cause of the following exception:
ImportError Traceback (most recent call last)
Cell In [5], line 1
----> 1 data_results = Parallel(n_jobs=4)(delayed(a)() for i in range(10))
File C:\Python\lib\site-packages\joblib\parallel.py:1098, in Parallel.__call__(self, iterable)
1095 self._iterating = False
1097 with self._backend.retrieval_context():
-> 1098 self.retrieve()
1099 # Make sure that we get a last message telling us we are done
1100 elapsed_time = time.time() - self._start_time
File C:\Python\lib\site-packages\joblib\parallel.py:975, in Parallel.retrieve(self)
973 try:
974 if getattr(self._backend, 'supports_timeout', False):
--> 975 self._output.extend(job.get(timeout=self.timeout))
976 else:
977 self._output.extend(job.get())
File C:\Python\lib\site-packages\joblib\_parallel_backends.py:567, in LokyBackend.wrap_future_result(future, timeout)
564 """Wrapper for Future.result to implement the same behaviour as
565 AsyncResults.get from multiprocessing."""
566 try:
--> 567 return future.result(timeout=timeout)
568 except CfTimeoutError as e:
569 raise TimeoutError from e
File C:\Python\lib\concurrent\futures\_base.py:458, in Future.result(self, timeout)
456 raise CancelledError()
457 elif self._state == FINISHED:
--> 458 return self.__get_result()
459 else:
460 raise TimeoutError()
File C:\Python\lib\concurrent\futures\_base.py:403, in Future.__get_result(self)
401 if self._exception:
402 try:
--> 403 raise self._exception
404 finally:
405 # Break a reference cycle with the exception in self._exception
406 self = None
ImportError: DLL load failed while importing _pairwise_distances_reduction: The specified module could not be found.
Versions
System:
    python: 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
    executable: C:\Python\python.exe
    machine: Windows-10-10.0.17763-SP0

Python dependencies:
    sklearn: 1.1.2
    pip: 22.2.2
    setuptools: 65.4.1
    numpy: 1.23.4
    scipy: 1.9.2
    Cython: None
    pandas: None
    matplotlib: None
    joblib: 1.2.0
    threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: C:\Python\Lib\site-packages\numpy\.libs\libopenblas.FB5AE2TYXYH2IJRDKGDGQ3XBKLKTF43H.gfortran-win_amd64.dll
    version: 0.3.20
    threading_layer: pthreads
    architecture: Haswell
    num_threads: 2

    user_api: openmp
    internal_api: openmp
    prefix: vcomp
    filepath: C:\Python\Lib\site-packages\sklearn\.libs\vcomp140.dll
    version: None
    num_threads: 2

    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: C:\Python\Lib\site-packages\scipy\.libs\libopenblas.PZA5WNOTOH6FZLB2KBVKAURAKVTFSNNU.gfortran-win_amd64.dll
    version: 0.3.18
    threading_layer: pthreads
    architecture: Haswell
    num_threads: 2
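
The threadpoolctl info above lists two OpenBLAS copies, one bundled with numpy (0.3.20) and one with scipy (0.3.18). Whether this is related to the DLL load failure is not established by this report. For anyone comparing environments, the sketch below groups such entries by `internal_api` and flags libraries loaded more than once. The helper names are hypothetical; the sample data is copied from the output above, and the same shape of data (a list of dicts) is what `threadpoolctl.threadpool_info()` returns.

```python
from collections import defaultdict


def group_by_internal_api(pool_info):
    """Map each internal_api to a list of (version, filepath) pairs."""
    groups = defaultdict(list)
    for entry in pool_info:
        groups[entry["internal_api"]].append((entry["version"], entry["filepath"]))
    return dict(groups)


def duplicated_libraries(pool_info):
    """Keep only the internal_api values that appear more than once."""
    return {
        api: libs
        for api, libs in group_by_internal_api(pool_info).items()
        if len(libs) > 1
    }


# Entries copied from the report (only the fields used by the helpers).
REPORTED_INFO = [
    {
        "internal_api": "openblas",
        "version": "0.3.20",
        "filepath": r"C:\Python\Lib\site-packages\numpy\.libs\libopenblas.FB5AE2TYXYH2IJRDKGDGQ3XBKLKTF43H.gfortran-win_amd64.dll",
    },
    {
        "internal_api": "openmp",
        "version": None,
        "filepath": r"C:\Python\Lib\site-packages\sklearn\.libs\vcomp140.dll",
    },
    {
        "internal_api": "openblas",
        "version": "0.3.18",
        "filepath": r"C:\Python\Lib\site-packages\scipy\.libs\libopenblas.PZA5WNOTOH6FZLB2KBVKAURAKVTFSNNU.gfortran-win_amd64.dll",
    },
]

if __name__ == "__main__":
    for api, libs in duplicated_libraries(REPORTED_INFO).items():
        print(api, [version for version, _ in libs])
```

On a live system, the list could instead come from `threadpoolctl.threadpool_info()` after importing numpy and scipy.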