FIX Support F-contiguous arrays for `PairwiseDistancesReductions`-backed estimators #23990
With this approach, an estimator that doesn't convert its input to a C-contiguous array will fall back to the non-optimized path on Fortran arrays. I think we should instead do the conversion at
I removed the "no changelog needed" tag because I think this change does need a changelog entry.
Yes indeed, I think this fix is independent of changing the validation steps, which can be addressed in other PRs. What do you think? Should we instead address all the changes in this PR?
I'm OK with merging this fix first so as not to delay a bug-fix release. But once all concerned estimators convert appropriately, this check will always be True. I think you can add a comment to remove this check once all estimators provide C-contiguous arrays.
Actually, I've made all the changes in this PR, since many estimators are affected and the changes are relatively straightforward. A really simple common test has been added, and we might want to extend tests to F-contiguous arrays in the future, as is done in such generic tests.
Is there a trade-off in terms of memory between using the new pairwise back-end and the old one? Specifically, the new back-end requires C-ordered arrays, which may require creating a copy, while the old back-end can work with any array.
Yes, I think we should weigh that properly (#23988 (comment)). If the provided F-contiguous array is large, this fix can make users' workflows impracticable, as a copy of the array needs to be made. An alternative is to maintain F-contiguity (hence falling back to the old back-end in this case) but warn users that providing C-contiguous arrays allows getting the best performance, and document it properly somewhere. (What's a bit sad is that `pandas.DataFrame.values` arrays generally are F-contiguous.) If needed, I am fine with reverting the last changes in this PR so that it only properly falls back on the old back-end for F-contiguous arrays, until we choose which solution to pick. What are your thoughts and preferences?
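To make the memory concern concrete, here is a minimal NumPy-only sketch. The array shape is an arbitrary assumption for illustration and is not taken from the issue. It shows that converting an F-contiguous array to C order allocates a second buffer of the same size, whereas converting an array that is already C-contiguous does not.

```python
import numpy as np

# An F-contiguous (column-major) array; the shape is an arbitrary choice.
rng = np.random.default_rng(0)
X_f = np.asfortranarray(rng.random((1000, 50)))

# Converting to C order has to allocate and fill a new buffer.
X_c = np.ascontiguousarray(X_f)

print(X_f.flags.f_contiguous, X_f.flags.c_contiguous)  # True False
print(X_c.flags.c_contiguous)                          # True
print(np.shares_memory(X_f, X_c))                      # False: the data was copied
print(X_c.nbytes)                                      # 400000 bytes for the copy

# An array that is already C-contiguous is not copied again.
X_same = np.ascontiguousarray(X_c)
print(np.shares_memory(X_c, X_same))                   # True
```

For large inputs, the extra buffer is what can make the conversion impractical, which is the motivation for falling back on the old back-end instead.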
I answered your question in #23988 (comment). TLDR:
OK, so let's revert this PR to this simpler fix and potentially make arrays C-contiguous after cost analysis.
LGTM
LGTM
Thanks @jjerphan
You're welcome, @glemaitre.
Reference Issues/PRs
Fixes #23988.
Fixes #24013.
What does this implement/fix? Explain your changes.
Only C-contiguous arrays are supported by `PairwiseDistancesReductions`. Yet, this is not specified, which makes user-facing estimators fail when used with F-contiguous arrays.
This PR makes `PairwiseDistancesReductions` specify that they only support C-contiguous arrays.
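The fallback behavior discussed in this PR can be sketched as follows. This is an illustration under stated assumptions, not scikit-learn's actual implementation: `is_c_contiguous_ndarray` and `pairwise_argmin` are hypothetical names, and both branches compute distances with plain NumPy so that the example stays self-contained.

```python
import numpy as np


def is_c_contiguous_ndarray(X):
    """Hypothetical helper: True if X is a NumPy array stored in C order."""
    return isinstance(X, np.ndarray) and X.flags.c_contiguous


def pairwise_argmin(X, Y):
    """Hypothetical dispatcher returning (backend_name, nearest_indices).

    The optimized path is only selected when both inputs are C-contiguous;
    otherwise the generic path is used, without copying the inputs.
    """
    if is_c_contiguous_ndarray(X) and is_c_contiguous_ndarray(Y):
        backend = "optimized"
    else:
        backend = "fallback"
    # Stand-in computation shared by both branches: squared Euclidean
    # distances, then the index of the closest row of Y for each row of X.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return backend, sq_dists.argmin(axis=1)


rng = np.random.default_rng(0)
X = rng.random((5, 3))
Y = rng.random((4, 3))

backend_c, idx_c = pairwise_argmin(X, Y)
backend_f, idx_f = pairwise_argmin(np.asfortranarray(X), Y)

print(backend_c)  # optimized
print(backend_f)  # fallback
```

With this design the F-contiguous input is never copied: it only loses access to the optimized path, while the results stay the same.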