

Introduce SIMD intrinsics for _dist_metrics.pyx #26010

Open
@Micky774


Context

Pairwise distance computation is an essential part of many estimators in scikit-learn, and can take up a significant portion of run time in certain workflows. I believe that we may achieve significant performance gains in several (perhaps most) distance metric implementations by leveraging SIMD intrinsics.

Proof of Concept

I built a quick proof of concept to see what kinds of performance gains we could observe with a potentially naive implementation using SIMD intrinsics. I chose to optimize the ManhattanDistance.dist function. This implementation uses intrinsics from SSE{1,2,3}. To ensure that the instructions are supported, it checks for the presence of the SSE3 instruction set (SSE3 implies SSE{1,2}) and provides the optimized implementation if so. Otherwise it provides a dummy implementation just to appease Cython, and the top-level dist function falls back to the implementation currently on main. Note that support for SSE3 is a reasonable expectation on most modern hardware (indeed, NumPy assumes it is always present when optimization is enabled). For the specific implementation referred to here, please take a look at this PR: Micky774#11

Note that the full benefit of the intrinsics is realized when compiling with -march="native"; however, the benefit is still significant when compiling with -march="nocona", which is often the default (e.g. when following the scikit-learn development instructions on Linux).
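To make the dispatch described above more concrete, here is a minimal C sketch for float64 inputs. It is not the code from Micky774#11: the function names (manhattan_dist, manhattan_dist_sse3, manhattan_dist_scalar) are hypothetical, and it assumes the SSE3 check happens at compile time through the compiler-defined __SSE3__ macro (which GCC and Clang set when targeting, for example, -march=nocona, or -march=native on an SSE3-capable CPU). When the macro is absent, the preprocessor simply drops the SIMD function rather than compiling a dummy, and the scalar loop is used.

```c
#include <stddef.h>

#if defined(__SSE3__)
#include <pmmintrin.h> /* SSE3 (_mm_hadd_pd); also includes the SSE/SSE2 headers */
#endif

/* Hypothetical names throughout: a sketch, not the implementation from the linked PR. */

/* Portable scalar loop, analogous to a plain (non-SIMD) dist implementation. */
static double manhattan_dist_scalar(const double *x1, const double *x2, size_t size)
{
    double d = 0.0;
    for (size_t j = 0; j < size; ++j) {
        double diff = x1[j] - x2[j];
        d += diff < 0.0 ? -diff : diff;
    }
    return d;
}

#if defined(__SSE3__)
/* SIMD path: processes two doubles per iteration in a 128-bit register. */
static double manhattan_dist_sse3(const double *x1, const double *x2, size_t size)
{
    /* -0.0 has only the sign bit set; andnot(mask, v) clears it, giving |v|. */
    const __m128d sign_mask = _mm_set1_pd(-0.0);
    __m128d acc = _mm_setzero_pd();
    size_t j = 0;
    for (; j + 2 <= size; j += 2) {
        __m128d a = _mm_loadu_pd(x1 + j); /* unaligned loads: no alignment assumed */
        __m128d b = _mm_loadu_pd(x2 + j);
        __m128d diff = _mm_sub_pd(a, b);
        acc = _mm_add_pd(acc, _mm_andnot_pd(sign_mask, diff));
    }
    /* SSE3 horizontal add folds the two lanes; take the low lane as a double. */
    acc = _mm_hadd_pd(acc, acc);
    double d = _mm_cvtsd_f64(acc);
    /* Handle a trailing odd element with the scalar loop. */
    d += manhattan_dist_scalar(x1 + j, x2 + j, size - j);
    return d;
}
#endif

/* Compile-time dispatch: use the SIMD path only when the target supports SSE3. */
double manhattan_dist(const double *x1, const double *x2, size_t size)
{
#if defined(__SSE3__)
    return manhattan_dist_sse3(x1, x2, size);
#else
    return manhattan_dist_scalar(x1, x2, size);
#endif
}
```

Because the SIMD path keeps two partial sums and combines them at the end, its rounding can differ from the sequential scalar loop in the last bits for general inputs, so comparisons against the current implementation should use a tolerance rather than exact equality.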

Benchmarks

The following benchmarks were produced by this gist: https://gist.github.com/Micky774/567a5fa199c05d90c4c08625b077840e

Summary: The SIMD implementations are ~2x faster than the current implementation for float32 and ~1.5x faster for float64.

Plots

[Benchmark plot image not reproduced here]

Discussion

I haven't looked too deeply into this yet, since I first wanted to see whether there was interest in the venture. I would love to hear the other maintainers' thoughts on exploring this route in more detail. SIMD implementations will obviously bring added complexity, but the performance gains are pretty compelling. In my opinion, the tradeoff is worth it.

CC: @scikit-learn/core-devs
