ENH Pairwise Kernels Cython Routines #21218

Closed
wants to merge 235 commits into from

jjerphan
Member

@jjerphan jjerphan commented Oct 1, 2021

Reference Issues/PRs

Comes after #20254.

What does this implement/fix? Explain your changes.

Build pairwise kernels' computations on top of PairwiseDistancesReduction's template.

Any other comments?

Still WIP; the class hierarchy might be redesigned.

jjerphan added 30 commits June 11, 2021 18:47
Also perform some renaming.
As this test aims at identifying contexts of numerical
instability, parametrising on more parameters makes
sense.

Hypothesis:

Numerical stability is influenced by:

 - the range of the data (given by ``translation``)
 - the number of dimensions (given by ``d``)
 - the ``chunk_size``

But not by:
 - the parallelisation ``strategy``
 - the number of neighbors (given by ``n_neighbors``)
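
The hypothesis above can be illustrated with a small NumPy sketch. It is not the PR's test: the function names, the float32 cast and the sample sizes are assumptions made for the illustration. It compares the expanded form ||x||^2 - 2 x.y + ||y||^2 (the decomposition that a "fast" squared Euclidean strategy usually relies on) against the direct form, and shows the error growing with ``translation``.

```python
import numpy as np


def sqeuclidean_exact(X, Y):
    # Direct formulation: sum over features of (x - y) ** 2.
    diff = X[:, None, :] - Y[None, :, :]
    return np.einsum("ijk,ijk->ij", diff, diff)


def sqeuclidean_expanded(X, Y):
    # Expanded formulation: ||x||^2 - 2 x.y + ||y||^2.
    # Prone to cancellation when the norms are large compared
    # with the distances themselves.
    X_sq = np.einsum("ij,ij->i", X, X)[:, None]
    Y_sq = np.einsum("ij,ij->i", Y, Y)[None, :]
    return X_sq - 2.0 * (X @ Y.T) + Y_sq


def max_abs_error(translation, d, n=50, seed=0):
    # Hypothetical helper: largest absolute error of the expanded form
    # computed in float32, measured against a float64 reference built
    # from the same (already rounded) float32 inputs.
    rng = np.random.default_rng(seed)
    X32 = (rng.random((n, d)) + translation).astype(np.float32)
    Y32 = (rng.random((n, d)) + translation).astype(np.float32)
    reference = sqeuclidean_exact(X32.astype(np.float64), Y32.astype(np.float64))
    approx = sqeuclidean_expanded(X32, Y32)
    return float(np.max(np.abs(approx - reference)))


error_centered = max_abs_error(translation=0.0, d=10)
error_shifted = max_abs_error(translation=1e4, d=10)
```

With data shifted far from the origin, the squared norms dominate the result and the subtraction discards most of the significant digits, which is why the range of the data is a natural parameter for such a test.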
This sets up the new black code formatting.
Also remove unused import and apply style with black
TODO: Think about the best way of supporting
``only_physical_cores``.
Move neighbors.NeighborsHeap's code and _typedefs under
sklearn.utils as cyclic imports are currently happening
between sklearn.neighbors and sklearn.metrics.

Also, using integral in some cases gave unexpected results.
Occurrences were changed to use np.intp, as exposed by
utils._typedefs.ITYPE_t (we don't need signed integers).
So that the range corresponds to actual datasets and not
to datasets whose marginal spreads are in [0, 1].
This test segfaults:

test_neighbors.py::test_fast_sqeuclidean_correctness[1-10-5-1000]
The segfault was due to reallocation
on the same pointers, causing multiple
frees of the same reference and memory leaks.

To resolve this, arrays of pointers for local
datastructures are allocated at the initialisation
of the interface so that they can be handled
separately in threads with proper allocation
and deallocation.

The memory management will be wrapped in
subsequent private template methods for
each type of reduction and parallelisation
strategy. This is one of the next iterations.
Refactor ArgKmin._reduce_on_chunks to pave the
way for a general interface for reductions.

Private datastructures will have to be accessed via
the implementation of this private method.
Introduce ParallelReduction as an abstract class,
and extend it using ArgKmin.

FastSquaredEuclideanArgKmin extends ArgKmin
for the "fast_sqeuclidean" strategy.
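
A minimal pure-Python sketch of what such a hierarchy could look like follows. Only the class names and ``_reduce_on_chunks`` come from the commit messages; the constructor signatures, ``compute``, ``_finalize``, ``_chunk_distances`` and the sequential chunk loop are assumptions made for the illustration, and the actual Cython implementation parallelises the loop and manages its own buffers.

```python
from abc import ABC, abstractmethod

import numpy as np


class ParallelReduction(ABC):
    """Hypothetical base class: iterates over pairs of chunks of X and Y."""

    def __init__(self, X, Y, chunk_size=256):
        self.X, self.Y, self.chunk_size = X, Y, chunk_size

    @abstractmethod
    def _reduce_on_chunks(self, X_start, X_end, Y_start, Y_end):
        """Update the private result structures for one pair of chunks."""

    @abstractmethod
    def _finalize(self):
        """Return the result once every pair of chunks has been reduced."""

    def compute(self):
        # Template method: subclasses only define the reduction itself.
        n_X, n_Y = self.X.shape[0], self.Y.shape[0]
        for X_start in range(0, n_X, self.chunk_size):
            X_end = min(X_start + self.chunk_size, n_X)
            for Y_start in range(0, n_Y, self.chunk_size):
                Y_end = min(Y_start + self.chunk_size, n_Y)
                self._reduce_on_chunks(X_start, X_end, Y_start, Y_end)
        return self._finalize()


class ArgKmin(ParallelReduction):
    """Indices of the k closest rows of Y for each row of X (k <= len(Y))."""

    def __init__(self, X, Y, k, chunk_size=256):
        super().__init__(X, Y, chunk_size)
        self.k = k
        self._best_dist = np.full((X.shape[0], k), np.inf)
        self._best_idx = np.full((X.shape[0], k), -1, dtype=np.intp)

    def _chunk_distances(self, X_c, Y_c):
        # Direct squared Euclidean distances between the two chunks.
        diff = X_c[:, None, :] - Y_c[None, :, :]
        return np.einsum("ijk,ijk->ij", diff, diff)

    def _reduce_on_chunks(self, X_start, X_end, Y_start, Y_end):
        dist = self._chunk_distances(self.X[X_start:X_end], self.Y[Y_start:Y_end])
        idx = np.broadcast_to(np.arange(Y_start, Y_end, dtype=np.intp), dist.shape)
        # Merge the current best candidates with the new chunk, keep k smallest.
        all_dist = np.hstack([self._best_dist[X_start:X_end], dist])
        all_idx = np.hstack([self._best_idx[X_start:X_end], idx])
        keep = np.argsort(all_dist, axis=1, kind="stable")[:, : self.k]
        self._best_dist[X_start:X_end] = np.take_along_axis(all_dist, keep, axis=1)
        self._best_idx[X_start:X_end] = np.take_along_axis(all_idx, keep, axis=1)

    def _finalize(self):
        return self._best_idx


class FastSquaredEuclideanArgKmin(ArgKmin):
    """Same reduction, with distances from ||x||^2 - 2 x.y + ||y||^2."""

    def _chunk_distances(self, X_c, Y_c):
        X_sq = np.einsum("ij,ij->i", X_c, X_c)[:, None]
        Y_sq = np.einsum("ij,ij->i", Y_c, Y_c)[None, :]
        return X_sq - 2.0 * (X_c @ Y_c.T) + Y_sq
```

In this sketch the subclass only overrides how the distances of one chunk pair are obtained, while the chunking logic and the k-smallest bookkeeping stay in the parent classes.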
The associated _typedefs.pyx file has been
moved to utils to avoid circular dependencies
as it is being used in neighbors.
jjerphan and others added 20 commits September 23, 2021 17:31
np.einsum('ij,ij->i') is handy but is single-threaded.

This new interface makes use of OpenMP and BLAS dot
for parallelized and vectorized computations.
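
For context, the einsum expression mentioned above computes the squared Euclidean norm of each row. The short NumPy sketch below (array sizes are arbitrary, chosen for the illustration) shows two equivalent formulations; the per-row dot product is presumably the operation the new interface hands to BLAS inside an OpenMP loop, though the Cython code itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 3))

# 'ij,ij->i' multiplies X with itself element-wise and sums over j,
# giving the squared Euclidean norm of each row.
sq_norms_einsum = np.einsum("ij,ij->i", X, X)

# Equivalent element-wise formulation.
sq_norms_sum = (X * X).sum(axis=1)

# Equivalent row-by-row dot products; each iteration is independent,
# so the loop could be distributed over threads.
sq_norms_dot = np.array([np.dot(row, row) for row in X])
```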
This edge case was observed on Windows 32bit:

    > np.testing.assert_allclose(distances, fsq_distances, rtol=1e-5)
    E AssertionError:
    E Not equal to tolerance rtol=1e-05, atol=0
    E
    E Mismatched elements: 1 / 10000 (0.01%)
    E Max absolute difference: 0.00020249
    E Max relative difference: 1.05109993e-05
    E x: array([40.123604, 30.522007, 49.364288, ...,
    41.741158, 41.340405, 36.132567])
    E y: array([40.123588, 30.522021, 49.364299, ...,
    41.741134, 41.340408, 36.132622])
Co-authored-by: Thomas J. Fan <[email protected]>
Co-authored-by: Jérémie du Boisberranger <[email protected]>
Co-authored-by: Jérémie du Boisberranger <[email protected]>
This should increase code coverage.
@github-actions github-actions bot added the cython label Oct 1, 2021
@jjerphan
Member Author

jjerphan commented Feb 9, 2022

Closing due to diversion, this might be reopened in the future.

@jjerphan jjerphan closed this Feb 9, 2022
@jjerphan jjerphan deleted the pairwise-kernels branch October 21, 2022 14:04