Read only buffer in cross_val_score with sparse matrix. #25333

Closed
@tomMoral

Describe the bug

When calling cross_val_score with a sparse data matrix X and a RandomForestClassifier with n_jobs=-1, an interaction between joblib and memmapping makes the buffers of X read-only, which breaks the Cython code for tree construction. The odd part is that the error only appears through the cross-validation function, not when fitting the classifier alone. The cross-validation function itself runs with n_jobs=1, so joblib should not come into play at that level.
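
For readers unfamiliar with read-only buffers, here is a minimal NumPy-only sketch of how one can arise. It assumes the mechanism is similar to what joblib does when it hands large arrays to workers as memory-mapped files opened in read-only mode; the helper name `reopen_read_only` is made up for illustration and is not part of joblib or scikit-learn.

```python
import os
import tempfile

import numpy as np


def reopen_read_only(arr, directory):
    """Hypothetical helper: save arr to a .npy file and reopen it as a read-only memmap."""
    path = os.path.join(directory, "arr.npy")
    np.save(path, arr)
    return np.load(path, mmap_mode="r")


# The temporary directory is left in place so the memmap stays valid.
tmpdir = tempfile.mkdtemp()
ro = reopen_read_only(np.arange(5, dtype=np.float64), tmpdir)

write_error = None
try:
    ro[0] = 1.0  # writing through a read-only memmap is refused by NumPy
except ValueError as exc:
    write_error = str(exc)

rw = np.array(ro)  # an in-memory copy owns its data and is writable again

print(ro.flags.writeable, rw.flags.writeable, write_error)
```

Code that requests a writable view of such an array, as a non-const Cython memoryview does, is refused in the same way.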

Steps/Code to Reproduce

from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(10000, n_features=200)
X = csr_matrix(X, copy=True)

clf = RandomForestClassifier(n_jobs=-1)

cross_val_score(clf, X, y)
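
A quick way to check whether a CSR matrix has ended up with read-only buffers is to inspect the `writeable` flag of its three underlying arrays. The sketch below is only a diagnostic aid, not a fix; `csr_buffer_flags` is a hypothetical helper name, and the read-only state is simulated with `setflags` rather than produced by joblib.

```python
import numpy as np
from scipy.sparse import csr_matrix


def csr_buffer_flags(X):
    """Hypothetical helper: report the writeable flag of each CSR buffer."""
    return {
        name: bool(getattr(X, name).flags.writeable)
        for name in ("data", "indices", "indptr")
    }


X_small = csr_matrix(np.eye(4))
flags_before = csr_buffer_flags(X_small)

# Simulate a read-only buffer (assumption: this mimics what the worker receives).
X_small.data.setflags(write=False)
flags_after = csr_buffer_flags(X_small)

print(flags_before)
print(flags_after)
```

Calling the helper on the X seen by the estimator, for example from a small wrapper around `fit`, would show which of the buffers is affected.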

Expected Results

The code runs without error.

Actual Results

ValueError: 
All the 5 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/sklearn/utils/fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/sklearn/ensemble/_forest.py", line 185, in _parallel_build_trees
    tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/sklearn/tree/_classes.py", line 889, in fit
    super().fit(
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/sklearn/tree/_classes.py", line 379, in fit
    builder.build(self.tree_, X, y, sample_weight)
  File "sklearn/tree/_tree.pyx", line 147, in sklearn.tree._tree.DepthFirstTreeBuilder.build
  File "sklearn/tree/_tree.pyx", line 173, in sklearn.tree._tree.DepthFirstTreeBuilder.build
  File "sklearn/tree/_splitter.pyx", line 789, in sklearn.tree._splitter.BaseSparseSplitter.init
  File "stringsource", line 660, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 350, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/sklearn/ensemble/_forest.py", line 474, in fit
    trees = Parallel(
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 1098, in __call__
    self.retrieve()
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/temp/.local/miniconda/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/temp/.local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/temp/.local/miniconda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
ValueError: buffer source array is read-only
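
The error message suggests setting error_score='raise'. As a sketch of what that changes, the snippet below uses a stand-in estimator whose fit always raises the same message as in the traceback; `AlwaysFails` is hypothetical and only there so the example does not depend on the bug being present in the installed versions. With error_score='raise', the original exception propagates out of cross_val_score instead of being summarised.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import cross_val_score


class AlwaysFails(ClassifierMixin, BaseEstimator):
    """Hypothetical estimator standing in for the failing forest."""

    def fit(self, X, y):
        raise ValueError("buffer source array is read-only")


X_demo = np.zeros((20, 2))
y_demo = np.array([0, 1] * 10)  # 10 samples per class, enough for 5 stratified folds

caught = None
try:
    # error_score="raise" re-raises the first failing fit directly
    cross_val_score(AlwaysFails(), X_demo, y_demo, error_score="raise")
except ValueError as exc:
    caught = exc

print(type(caught).__name__, caught)
```

Applied to the reproduction script, the same keyword would be passed as cross_val_score(clf, X, y, error_score='raise').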

Versions

System:
    python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
executable: /home/temp/.local/miniconda/bin/python3.10
   machine: Linux-5.14.0-1054-oem-x86_64-with-glibc2.31

Python dependencies:
      sklearn: 1.2.0
          pip: 22.3
   setuptools: 65.5.0
        numpy: 1.23.5
        scipy: 1.9.3
       Cython: 0.29.32
       pandas: 1.5.2
   matplotlib: 3.6.2
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: mkl
         prefix: libmkl_rt
       filepath: /home/temp/.local/miniconda/lib/libmkl_rt.so.2
        version: 2022.1-Product
threading_layer: intel
    num_threads: 4

       user_api: openmp
   internal_api: openmp
         prefix: libomp
       filepath: /home/temp/.local/miniconda/lib/libomp.so
        version: None
    num_threads: 8
