ENH add n_jobs to mutual_info_regression and mutual_info_classif #28085
…earn into mutual_info_parallel
We will need an entry in |
Accept suggestions Co-authored-by: Guillaume Lemaitre <[email protected]>
…earn into mutual_info_parallel
@glemaitre thanks for the suggestions. I have implemented them.
It almost looks good. Here are a couple of suggestions.
Co-authored-by: Guillaume Lemaitre <[email protected]>
@glemaitre I have applied the suggestions, thanks. One of the CircleCI tests is failing, but I don't know why.
@@ -252,3 +253,18 @@ def test_mutual_info_regression_X_int_dtype(global_random_seed):
     expected = mutual_info_regression(X_float, y, random_state=global_random_seed)
     result = mutual_info_regression(X, y, random_state=global_random_seed)
     assert_allclose(result, expected)


 @pytest.mark.parametrize(
@netomenoci I pushed a piece of code that shows how to do the parallelization, if you are interested.
awesome, thanks :)
Hi bary Wery good nice good luck bary
LGTM. We will need a second review.
It was because of a Git conflict.
Here is a quick benchmark, loosely adapted from #27795 (comment) (which, unless I am mistaken, was not really a benchmark of the parallelization). I get at most a 2.5x speed improvement on my laptop, which has 8 logical cores (4 physical cores + hyper-threading).
from sklearn.datasets import make_sparse_uncorrelated
from sklearn.feature_selection import mutual_info_regression
n_features = 100
n_samples = int(1e4)
X, y = make_sparse_uncorrelated(random_state=0, n_features=n_features, n_samples=n_samples)
print('n_jobs=1')
%timeit mutual_info_regression(X, y, n_jobs=1)
print('n_jobs=4')
%timeit mutual_info_regression(X, y, n_jobs=4)
print('n_jobs=8')
%timeit mutual_info_regression(X, y, n_jobs=8)
I get:
@@ -201,11 +202,13 @@ def _iterate_columns(X, columns=None):
 def _estimate_mi(
     X,
     y,
     *,
I used keyword arguments to make the _estimate_mi function call more readable, and also made the arguments keyword-only in the _estimate_mi definition. I guess that's fine since _estimate_mi is private. @glemaitre do you agree?
Yes I do agree.
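For readers unfamiliar with the bare `*` in the signature above, here is a minimal sketch of what keyword-only arguments do. The function `estimate_scores` and its parameters are hypothetical stand-ins chosen for illustration; this is not the actual `_estimate_mi` signature.

```python
def estimate_scores(X, y, *, n_neighbors=3, n_jobs=None):
    """Hypothetical stand-in: everything after the bare * is keyword-only."""
    return {"n_rows": len(X), "n_neighbors": n_neighbors, "n_jobs": n_jobs}


# Call sites must name the options, which keeps them readable.
result = estimate_scores([[0.0], [1.0]], [0, 1], n_neighbors=5, n_jobs=2)

# Passing the same options positionally is rejected with a TypeError.
try:
    estimate_scores([[0.0], [1.0]], [0, 1], 5, 2)
except TypeError as exc:
    positional_error = str(exc)
else:
    positional_error = None
```

Because the real function is private, tightening its signature this way cannot break user code that goes through the public API.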
Let's merge this one, thanks @netomenoci!
Using the sklearn wrapper of Parallel slows it down because there is some extra overhead on top of joblib (e.g. copies). If you think it's useful, we can change the import of Parallel to joblib instead of sklearn utils (a one-line change), and it should be faster and closer to the previously mentioned benchmark.
We cannot do that because we would have issues with the global config. That is the reason the wrapper exists.
…kit-learn#28085) Co-authored-by: Guillaume Lemaitre <[email protected]> Co-authored-by: Loïc Estève <[email protected]>
Reference Issues/PRs
Feat: #27795 (comment)
What does this implement/fix? Explain your changes.
This implements the addition of the n_jobs parameter to sklearn.feature_selection.mutual_info_regression and sklearn.feature_selection.mutual_info_classif.