Permutation Importance fails if dataset is large enough #15810
Comments
Thanks for reporting this issue @andersbogsnes !
Do we have a fix for this?
As I mentioned in the report, one potential fix is to turn off the memmapping by setting max_nbytes to None. This would fix it, but I don't know what the wider-ranging consequences are. Alternatively, the implementation would have to assign to a copy of the DataFrame on each pass - which sounds expensive memory-wise... Not sure why the difference between Numpy and the DataFrame, though - that could be a clue. Finally, if there is some method that allows the user to reach into Parallel and choose to turn off memmapping, that would work too - unfortunately I haven't found a way to do it in my cursory read-through of the joblib docs.
Happy to see a patch that turns off max_nbytes for now.
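The workaround discussed above can be demonstrated outside of scikit-learn. This is a minimal sketch of joblib's behavior, not scikit-learn's actual code: with the default max_nbytes ("1M"), large arrays passed to workers are memmapped read-only, so in-place assignment in the worker raises; with max_nbytes=None, memmapping is disabled and each worker receives a writable pickled copy.

```python
import numpy as np
from joblib import Parallel, delayed


def mutate_first(arr):
    # Assigning into a read-only memmap raises ValueError;
    # assigning into a regular (writable) array succeeds.
    arr[0] = 0.0
    return arr[0]


# ~16 MB, well above joblib's default 1M memmapping threshold.
X = np.ones(2_000_000)

# With max_nbytes=None joblib never memmaps, so each worker gets a
# writable copy of X and the assignment succeeds.
result = Parallel(n_jobs=2, max_nbytes=None)(
    delayed(mutate_first)(X) for _ in range(2)
)
print(result)  # [0.0, 0.0]
```

Dropping max_nbytes=None (leaving the default) in the call above reproduces the read-only failure mode, which is why forwarding this option through permutation_importance's Parallel call is the patch suggested here.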
Would like to take this up.
Let me know, otherwise I'd be happy to contribute a patch.
I tried to turn off the memmapping by setting max_nbytes to None, but it didn't work.
Description

When using permutation_importance with a large enough pandas DataFrame and n_jobs > 0, joblib switches to read-only memmap mode, which proceeds to raise, as permutation_importance tries to assign to the DataFrame. The error does not occur when passing a similarly sized Numpy array.

In previous, similar implementations, we fixed the bug by setting max_nbytes to None in the Parallel init, though I don't know what the broader consequences of that are.

Steps/Code to Reproduce
Expected Results
We expect no exception to be raised
Actual Results
Versions
System:
python: 3.7.4 (default, Oct 14 2019, 12:42:45) [GCC 7.4.0]
executable: /home/anders/.pyenv/versions/3.7.4/envs/ml_tooling_env/bin/python3.7
machine: Linux-5.0.0-37-generic-x86_64-with-debian-buster-sid
Python dependencies:
pip: 19.3.1
setuptools: 40.8.0
sklearn: 0.22
numpy: 1.17.4
scipy: 1.3.3
Cython: None
pandas: 0.25.3
matplotlib: 3.1.2
joblib: 0.14.0
Built with OpenMP: True