fetch_openml can raise "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process" #21798

Closed
@ogrisel

Describe the bug

On Windows, if fetch_openml is run concurrently in two processes, for instance when running the tests with pytest-xdist, one sometimes gets errors such as:

[...]
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x1439D0F0>
gzip_response = True

    @pytest.mark.parametrize("gzip_response", [True, False])
        version    = 'active'
C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\sklearn\datasets\_openml.py:449: in _get_data_description_by_id
    url, error_message, data_home=data_home
        data_home  = 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml'
        data_id    = 2
        error_message = 'Dataset with data_id 2 not found.'
        url        = 'api/v1/json/data/2'
C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\sklearn\datasets\_openml.py:172: in _get_json_content_from_openml_api
    return _load_json()
        _load_json = <function _get_json_content_from_openml_api.<locals>._load_json at 0x14167C00>
        data_home  = 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml'
        error_message = 'Dataset with data_id 2 not found.'
        url        = 'api/v1/json/data/2'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (), kw = {}
local_path = 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml\\openml.org\\api/v1/json/data/2.gz'

    @wraps(f)
    def wrapper(*args, **kw):
        if data_home is None:
            return f(*args, **kw)
        try:
            return f(*args, **kw)
        except HTTPError:
            raise
        except Exception:
            warn("Invalid cache, redownloading file", RuntimeWarning)
            local_path = _get_local_path(openml_path, data_home)
            if os.path.exists(local_path):
>               os.unlink(local_path)
E               PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml\\openml.org\\api/v1/json/data/2.gz'
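
The failing step is the os.unlink call in the cache-invalidation wrapper above: on Windows, a file that another process still has open cannot be deleted. One possible mitigation, sketched here under assumptions and not taken from scikit-learn, is to treat "already deleted" as success and to retry briefly when the file is locked. The helper name `unlink_with_retry` and its `attempts` and `delay` parameters are hypothetical.

```python
import os
import time


def unlink_with_retry(path, attempts=5, delay=0.1):
    """Hypothetical helper: delete ``path`` while tolerating concurrent access.

    Returns True if the file no longer exists afterwards, False if it is
    still present (for example because another process keeps it open).
    """
    for attempt in range(attempts):
        try:
            os.unlink(path)
            return True
        except FileNotFoundError:
            # Another process already removed the stale cache entry.
            return True
        except PermissionError:
            # On Windows this is raised while another process holds the file
            # open; wait a little longer on each attempt before retrying.
            time.sleep(delay * (attempt + 1))
    return not os.path.exists(path)
```

A retry loop only narrows the race window; it does not make the cache safe by itself, since the other process may hold the file for longer than the total wait.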

Full error log:

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=35377&view=logs&j=18b0749f-dd9a-5274-d197-77895e43d4e4&t=ba53dc33-2c0b-592b-6f69-b1c7af7ca977

Steps/Code to Reproduce

Run pytest -x -n 4 --pyargs sklearn repeatedly on Windows; the failure is intermittent.

Expected Results

No crash: fetch_openml should be safe to call concurrently from several processes.
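
As an illustration of what concurrency-safe caching could look like, here is a minimal sketch that writes the downloaded bytes to a temporary file in the same directory and then publishes it with os.replace, so readers never observe a partially written cache entry. This is an assumption about one possible approach, not the scikit-learn implementation; `write_cache_atomically` is a hypothetical name.

```python
import os
import tempfile


def write_cache_atomically(local_path, data):
    """Hypothetical helper: publish ``data`` (bytes) at ``local_path`` in one step."""
    directory = os.path.dirname(local_path) or "."
    os.makedirs(directory, exist_ok=True)
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        # Replace the destination in a single rename operation.
        os.replace(tmp_path, local_path)
    except BaseException:
        # Do not leave temporary files behind if anything failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that on Windows os.replace can itself raise PermissionError when the destination is open in another process, so a caller would probably still need to handle that case, for example by keeping the existing cached file.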

Actual Results

See error report above.

Versions

Python dependencies:
          pip: 21.3.1
   setuptools: 47.1.0
      sklearn: 1.1.dev0
        numpy: 1.21.4
        scipy: 1.7.3
       Cython: 0.29.24
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\numpy\.libs\libopenblas.VTYUM5MXKVFE4PZZER3L7PNO6YB4XFF3.gfortran-win32.dll
        version: 0.3.17
threading_layer: pthreads
   architecture: Nehalem
    num_threads: 2

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\scipy\.libs\libopenblas.VTYUM5MXKVFE4PZZER3L7PNO6YB4XFF3.gfortran-win32.dll
        version: 0.3.17
threading_layer: pthreads
   architecture: Nehalem
    num_threads: 2

       user_api: openmp
   internal_api: openmp
         prefix: vcomp
       filepath: C:\Windows\SYSTEM32\VCOMP140.DLL
        version: None
    num_threads: 2
