Describe the bug
Hi,
When training a RandomForestClassifier on multiple cores (n_jobs=-1), I get the following error (full traceback below):
ValueError: buffer source array is read-only
The error does not occur when using a single core, or when training on a small dataset (obtained by subsampling the large one).
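One mechanism that would be consistent with this dependence on core count and data size, stated here as an assumption rather than something confirmed in this report: process-based joblib workers can receive large arrays as read-only memory maps, and code that asks for a writable buffer then fails. The sketch below illustrates only the read-only versus writable distinction, using the standard-library mmap module. It does not call joblib or scikit-learn, and probe_mapping is a hypothetical helper.

```python
import mmap
import tempfile


def probe_mapping(payload: bytes, access: int) -> dict:
    """Map payload from a temporary file and report whether writes succeed."""
    with tempfile.TemporaryFile() as fh:
        fh.write(payload)
        fh.flush()
        with mmap.mmap(fh.fileno(), 0, access=access) as mapped:
            view = memoryview(mapped)
            readonly = view.readonly
            view.release()  # release the view before the mapping is closed
            try:
                mapped[0] = mapped[0]  # write the same byte back in place
                write_ok = True
            except TypeError:
                # A mapping opened with ACCESS_READ rejects item assignment.
                write_ok = False
    return {"readonly": readonly, "write_ok": write_ok}


# Read-only mapping: the buffer is flagged read-only and the write fails.
read_only = probe_mapping(b"training data", mmap.ACCESS_READ)
# Copy-on-write mapping: the buffer is writable, the file is left untouched.
copy_on_write = probe_mapping(b"training data", mmap.ACCESS_COPY)
```

If this is indeed the cause, it would also explain why a subsampled dataset works: small inputs would be passed to the workers as ordinary in-memory copies, which are writable.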
Providing a reproducible example is not straightforward, because the error only appears with a fairly large dataset (~100K training records).
The code is running on a MacBook Pro (6-Core Intel Core i7, macOS Monterey 12.5.1) under Python 3.9 (see version info below).
Note: A similar error is reported in issues #15851 and #16331, but it appears the problem has not been fully fixed.
Thanks,
Ron
Steps/Code to Reproduce
Here are the relevant lines of code:
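from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.pipeline import Pipeline

# Per the traceback below, this pipeline is fitted as one estimator of a
# voting ensemble (sklearn/ensemble/_voting.py); that wrapper is not shown.
clf = Pipeline([
    ('tfidf', TfidfVectorizer(ngram_range=(1, 1),
                              use_idf=True,
                              max_df=1.0,
                              max_features=None)),
    ('chi2p', SelectPercentile(chi2, percentile=100)),
    ('clf', CalibratedClassifierCV(
        RandomForestClassifier(random_state=None,
                               max_depth=50,
                               class_weight='balanced',
                               n_jobs=-1))),
])

# data_train (raw text documents) and target_train (labels) come from the
# private dataset of ~100K records and are not defined in this snippet.
clf.fit(data_train, target_train)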
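Since the original dataset cannot be shared, a synthetic text corpus of comparable size could stand in for data_train and target_train when attempting a reproducer. The following is a minimal sketch using only the standard library. The function name, its parameters and the tokN pseudo-words are all hypothetical, and it is untested whether such data triggers the same error.

```python
import random


def make_synthetic_corpus(n_docs, n_classes=5, vocab_size=2000,
                          doc_len=40, seed=0):
    """Return (documents, labels) built from random pseudo-words."""
    rng = random.Random(seed)
    vocab = [f"tok{i}" for i in range(vocab_size)]
    span = vocab_size // n_classes
    documents, labels = [], []
    for _ in range(n_docs):
        label = rng.randrange(n_classes)
        # Each class favours its own slice of the vocabulary so that the
        # labels are learnable from the text.
        favoured = vocab[label * span:(label + 1) * span]
        words = [
            rng.choice(favoured if rng.random() < 0.7 else vocab)
            for _ in range(doc_len)
        ]
        documents.append(" ".join(words))
        labels.append(label)
    return documents, labels


docs, labels = make_synthetic_corpus(n_docs=1000)
```

Raising n_docs toward 100_000 would approximate the size reported above, and the result can be passed to clf.fit(docs, labels) in place of the private data.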
Expected Results
No error is thrown.
Actual Results
Traceback (most recent call last):
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/DRG/sklearn_sgdclassifier_autocoder.py", line 365, in <module>
    clf_drg_code = trainClassifier(df_train, target_column, num_estimators, model_filename)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/DRG/sklearn_sgdclassifier_autocoder.py", line 233, in trainClassifier
    clf = ensembleClassifier(df_train, target_column, num_estimators)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/DRG/sklearn_sgdclassifier_autocoder.py", line 155, in ensembleClassifier
    clf.fit(data_train, target_train)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/sklearn/ensemble/_voting.py", line 347, in fit
    return super().fit(X, transformed_y, sample_weight)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/sklearn/ensemble/_voting.py", line 83, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 1085, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 597, in __init__
    self.results = batch()
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/sklearn/ensemble/_base.py", line 46, in _fit_single_estimator
    estimator.fit(X, y)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/sklearn/pipeline.py", line 406, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/sklearn/calibration.py", line 396, in fit
    self.calibrated_classifiers_ = parallel(
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 1085, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 597, in __init__
    self.results = batch()
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/sklearn/calibration.py", line 578, in _fit_classifier_calibrator_pair
    estimator.fit(X_train, y_train, **fit_params_train)
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py", line 474, in fit
    trees = Parallel(
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 1098, in __call__
    self.retrieve()
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/ron.katriel/PycharmProjects/Classifier/COST/venv/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 440, in result
    return self.__get_result()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
ValueError: buffer source array is read-only

Process finished with exit code 1
Versions
System:
    python: 3.9.0 (v3.9.0:9cf6752276, Oct 5 2020, 11:29:23) [Clang 6.0 (clang-600.0.57)]
    executable: /Users/ron.katriel/PycharmProjects/Classifier/COST/venv/bin/python
    machine: macOS-10.16-x86_64-i386-64bit

Python dependencies:
    sklearn: 1.2.0
    pip: 22.3.1
    setuptools: 65.6.3
    numpy: 1.23.3
    scipy: 1.9.1
    Cython: None
    pandas: 1.5.2
    matplotlib: 3.5.3
    joblib: 1.2.0
    threadpoolctl: 3.1.0

Built with OpenMP: True