Description
Describe the bug
On Windows, if fetch_openml is run concurrently in two processes, for instance when running the tests with pytest-xdist, one sometimes gets errors such as:
[...]
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x1439D0F0>
gzip_response = True
@pytest.mark.parametrize("gzip_response", [True, False])
version = 'active'
C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\sklearn\datasets\_openml.py:449: in _get_data_description_by_id
url, error_message, data_home=data_home
data_home = 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml'
data_id = 2
error_message = 'Dataset with data_id 2 not found.'
url = 'api/v1/json/data/2'
C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\sklearn\datasets\_openml.py:172: in _get_json_content_from_openml_api
return _load_json()
_load_json = <function _get_json_content_from_openml_api.<locals>._load_json at 0x14167C00>
data_home = 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml'
error_message = 'Dataset with data_id 2 not found.'
url = 'api/v1/json/data/2'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (), kw = {}
local_path = 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml\\openml.org\\api/v1/json/data/2.gz'
    @wraps(f)
    def wrapper(*args, **kw):
        if data_home is None:
            return f(*args, **kw)
        try:
            return f(*args, **kw)
        except HTTPError:
            raise
        except Exception:
            warn("Invalid cache, redownloading file", RuntimeWarning)
            local_path = _get_local_path(openml_path, data_home)
            if os.path.exists(local_path):
>               os.unlink(local_path)
E PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml\\openml.org\\api/v1/json/data/2.gz'
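The wrapper in the traceback checks os.path.exists and then calls os.unlink, which leaves a window in which another process can delete or open the same cache file. A minimal sketch of a more tolerant cleanup step follows; the helper name remove_cache_file is hypothetical and is not part of scikit-learn, and returning False on PermissionError is only one possible policy.

```python
import os


def remove_cache_file(local_path):
    """Try to delete a cache file that several processes may share.

    Hypothetical helper (not a scikit-learn function). Returns True if the
    file is gone afterwards, False if it could not be removed.
    """
    try:
        # No os.path.exists() check first: the file may vanish between
        # the check and the unlink when another process cleans it up.
        os.unlink(local_path)
    except FileNotFoundError:
        # Another process already removed the file; nothing left to do.
        return True
    except PermissionError:
        # On Windows this is raised (WinError 32) while another process
        # still has the file open; the caller decides how to proceed.
        return False
    return True
```

With a helper like this, the caller could redownload when the function returns True and simply retry reading the existing file when it returns False, instead of crashing.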
Full error log:
Steps/Code to Reproduce
Run pytest -x -n 4 --pyargs sklearn many times.
Expected Results
No crash: fetch_openml should be safe to call concurrently from several processes.
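One common way to make a shared on-disk cache safer under concurrency is to write each download to a temporary file in the same directory and then rename it into place, so readers never see a partially written file. The sketch below illustrates that idea only; write_cache_atomically is a hypothetical name, and this is not necessarily how scikit-learn handles its cache. Note that on Windows os.replace can still raise PermissionError if another process has the destination file open.

```python
import gzip
import os
import tempfile


def write_cache_atomically(local_path, payload):
    """Write gzip-compressed bytes to local_path via a temporary file.

    Hypothetical helper (not a scikit-learn function).
    """
    dir_name = os.path.dirname(local_path) or "."
    os.makedirs(dir_name, exist_ok=True)
    # Create the temporary file next to the destination so that the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as raw, gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            gz.write(payload)
        # Both file objects are closed here; move the complete file into place.
        os.replace(tmp_path, local_path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Each process writes to its own uniquely named temporary file, so two concurrent downloads of the same dataset do not write into the same file; the last rename wins.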
Actual Results
See error report above.
Versions
Python dependencies:
pip: 21.3.1
setuptools: 47.1.0
sklearn: 1.1.dev0
numpy: 1.21.4
scipy: 1.7.3
Cython: 0.29.24
pandas: None
matplotlib: None
joblib: 1.1.0
threadpoolctl: 3.0.0
Built with OpenMP: True
threadpoolctl info:
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\numpy\.libs\libopenblas.VTYUM5MXKVFE4PZZER3L7PNO6YB4XFF3.gfortran-win32.dll
version: 0.3.17
threading_layer: pthreads
architecture: Nehalem
num_threads: 2

user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\scipy\.libs\libopenblas.VTYUM5MXKVFE4PZZER3L7PNO6YB4XFF3.gfortran-win32.dll
version: 0.3.17
threading_layer: pthreads
architecture: Nehalem
num_threads: 2

user_api: openmp
internal_api: openmp
prefix: vcomp
filepath: C:\Windows\SYSTEM32\VCOMP140.DLL
version: None
num_threads: 2