Stale scikit-learn datasets can throw pickling errors #14328

Closed as not planned
@deniederhut

Description

Loading a dataset that was downloaded to ~/scikit_learn_data some time ago can raise unpickling errors after scikit-learn is updated. Possible solutions include:

  • clearing the dataset cache after updating
  • storing the scikit-learn version used to serialize the dataset objects alongside the cached data
  • serializing with something besides pickle

Deleting the data folder (~/scikit_learn_data) works around the issue.
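
For illustration, the version-stamping idea could look roughly like the sketch below. It is a minimal standard-library example, not scikit-learn's actual caching code: `save_cached`, `load_cached`, the `rebuild` callback, and `LIBRARY_VERSION` are hypothetical names. The version string is written as its own pickle record ahead of the payload, so a stale payload is never unpickled.

```python
import pickle
from pathlib import Path

# Hypothetical stand-in for the installed library version.
LIBRARY_VERSION = "0.22.dev0"


def save_cached(path, data, version=LIBRARY_VERSION):
    """Write the version as its own pickle record, then the payload."""
    with open(path, "wb") as f:
        pickle.dump(version, f)  # a plain str, readable by any version
        pickle.dump(data, f)


def load_cached(path, rebuild, version=LIBRARY_VERSION):
    """Return the cached data, rebuilding it if the cache is missing or stale."""
    path = Path(path)
    if path.exists():
        try:
            with open(path, "rb") as f:
                # Only unpickle the payload when the stamped version matches.
                if pickle.load(f) == version:
                    return pickle.load(f)
        except Exception:
            # Unreadable or corrupt cache: fall through and rebuild.
            pass
    data = rebuild()  # e.g. download and parse the dataset again
    save_cached(path, data, version)
    return data
```

With this layout, a cache written under one version is silently rebuilt the first time it is read under another, at the cost of one extra download after each upgrade.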

Steps/Code to Reproduce

python examples/impute/plot_iterative_imputer_variants_comparison.py

Expected Results

No error is thrown.

Actual Results

Unpickling fails with a ModuleNotFoundError: the cached file references the module sklearn.externals._joblib, which can no longer be imported.

Traceback (most recent call last):
  File "examples/impute/plot_iterative_imputer_variants_comparison.py", line 61, in <module>
    X_full, y_full = fetch_california_housing(return_X_y=True)
  File "/Users/dillon/githubpackages/scikit-learn/sklearn/datasets/california_housing.py", line 133, in fetch_california_housing
    cal_housing = _refresh_cache([filepath], 6)
  File "/Users/dillon/githubpackages/scikit-learn/sklearn/datasets/base.py", line 930, in _refresh_cache
    data = tuple([joblib.load(f) for f in files])
  File "/Users/dillon/githubpackages/scikit-learn/sklearn/datasets/base.py", line 930, in <listcomp>
    data = tuple([joblib.load(f) for f in files])
  File "/Users/dillon/anaconda/envs/sklearn-dev/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 598, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/Users/dillon/anaconda/envs/sklearn-dev/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 526, in _unpickle
    obj = unpickler.load()
  File "/Users/dillon/anaconda/envs/sklearn-dev/lib/python3.7/pickle.py", line 1085, in load
    dispatch[key[0]](self)
  File "/Users/dillon/anaconda/envs/sklearn-dev/lib/python3.7/pickle.py", line 1373, in load_global
    klass = self.find_class(module, name)
  File "/Users/dillon/anaconda/envs/sklearn-dev/lib/python3.7/pickle.py", line 1423, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.externals._joblib'
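
The failure mode can be reproduced without scikit-learn. pickle stores classes by their import path rather than by value, so a pickle written while a module existed cannot be loaded once that module is gone. The sketch below uses a throwaway module with the hypothetical name `_stale_vendored_helper_demo` as a stand-in for `sklearn.externals._joblib`; presumably the cached dataset hit the same mechanism.

```python
import pickle
import sys
import types

MODULE_NAME = "_stale_vendored_helper_demo"  # hypothetical stand-in module


class Payload:
    def __init__(self, value):
        self.value = value


# Register the class under the throwaway module so pickle records
# "<MODULE_NAME>.Payload" as its import path.
module = types.ModuleType(MODULE_NAME)
Payload.__module__ = MODULE_NAME
Payload.__qualname__ = "Payload"
module.Payload = Payload
sys.modules[MODULE_NAME] = module

blob = pickle.dumps(Payload(42))  # the class is stored by reference only

# Simulate an upgrade that removes the module.
del sys.modules[MODULE_NAME]

try:
    pickle.loads(blob)
    error = None
except ModuleNotFoundError as exc:
    error = exc

print(type(error).__name__, error)
```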

Versions

System:
    python: 3.7.3 (default, Mar 27 2019, 16:54:48)  [Clang 4.0.1 (tags/RELEASE_401/final)]
executable: /Users/dillon/anaconda/envs/sklearn-dev/bin/python
   machine: Darwin-17.7.0-x86_64-i386-64bit

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /Users/dillon/anaconda/envs/sklearn-dev/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 19.1.1
setuptools: 41.0.1
   sklearn: 0.22.dev0
     numpy: 1.16.4
     scipy: 1.2.1
    Cython: 0.29.11
    pandas: 0.24.2
matplotlib: 3.1.0
    joblib: 0.13.2
