@athurdekoos athurdekoos commented Nov 14, 2025

This PR is a continuation of #29528.

Description

This change ports einsum_sumprod from the generated einsum_sumprod.c.src file to a C++ template source file (einsum_sumprod.cpp). The goal is to improve readability and maintainability while preserving the existing behavior and performance characteristics.

There are no intended changes to the public API.

Summary of changes:

  • Replacement of einsum_sumprod.c.src with einsum_sumprod.cpp.
  • Original function structure was preserved as closely as possible.
    • Function patterns remain the same.
    • Function names have been preserved where possible.
    • Variable names have been preserved where possible.
  • Logic was maintained as close to the original as possible.
  • Some larger functions have been refactored into smaller discrete implementations.
  • Helper functions added.
  • Where possible, macro-based optimizations have been preserved.
  • Introduced new macro implementations primarily to handle NPY_SIMD_F32 and NPY_SIMD_F64.
  • Optimizations that were only present in some code paths in the original implementation have been ported to the corresponding functions in the new structure when possible.
  • Expanded and clarified some comments.
  • Some template parameters from the original code were removed where they were not implemented or did not materially improve performance or clarity.

Notes:

  • This port prioritizes readability over aggressive abstraction.
  • When possible, templating is resolved at compile time.
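
As a rough illustration of what compile-time resolution can look like (a hypothetical helper written for this note, not code taken from this PR), a boolean template parameter combined with `if constexpr` lets the compiler discard the unused branch in each instantiation. The name `sum_of_products`, the `Contiguous` flag, and the use of `std::ptrdiff_t` in place of `npy_intp` are all assumptions made so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not from the PR: multiply-accumulate over two arrays.
// The Contiguous flag is a template parameter, so the branch below is
// resolved at compile time and only one indexing scheme is emitted per
// instantiation.
template <typename T, bool Contiguous>
T sum_of_products(const T *a, const T *b,
                  std::ptrdiff_t stride, std::ptrdiff_t count) noexcept
{
    assert(count >= 0);
    T accum = 0;
    for (std::ptrdiff_t i = 0; i < count; ++i) {
        if constexpr (Contiguous) {
            // Unit-stride path: stride is ignored.
            accum += a[i] * b[i];
        }
        else {
            // Strided path: stride is measured in elements, not bytes.
            accum += a[i * stride] * b[i * stride];
        }
    }
    return accum;
}
```

A caller would pick the instantiation once, outside the hot loop, for example `sum_of_products<float, true>(a, b, 1, n)` for contiguous inputs.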

Considerations not implemented:

  • Considered introducing C++ namespaces but did not do so to keep in line with the overall C-API style.
    • As a result, some function names are long but descriptive.
  • Considered using templates as a replacement for some macros, but opted against this where it hurt clarity.

Benchmarks

  • Across multiple machines, no major performance changes were observed.
  • However, on my personal machine, benchmarks seem to improve dramatically in several cases, with the exception of one small, inconsistent regression:

| Change | Before [fabf184] | After [fad2105] <einsum_sumprod_to_cpp> | Ratio | Benchmark (Parameter) |
|--------|------------------|------------------------------------------|-------|-----------------------|
| + | 194±7μs | 220±10μs | 1.13 | bench_linalg.Eindot.time_inner_trans_a_a |
| - | 25.5±1μs | 23.5±0.3μs | 0.92 | bench_linalg.Eindot.time_dot_d_dot_b_c |
| - | 1.10±0.1ms | 987±30μs | 0.89 | bench_linalg.Eindot.time_dot_trans_at_a |
| - | 106±10μs | 70.8±7μs | 0.66 | bench_linalg.Eindot.time_dot_trans_a_atc |
| - | 1.44±0.02ms | 899±30μs | 0.62 | bench_linalg.Eindot.time_einsum_i_ij_j |
| - | 110±5ms | 33.1±3ms | 0.30 | bench_linalg.Eindot.time_einsum_ijk_jil_kl |
| - | 114±6ms | 6.11±0.08ms | 0.05 | bench_linalg.Eindot.time_einsum_ij_jk_a_b |

Please let me know if you'd like me to adjust naming or structure.

@athurdekoos athurdekoos force-pushed the einsum_sumprod_to_cpp branch from 523e2df to bec2370 Compare November 15, 2025 00:04
@athurdekoos
Contributor Author

@inakleinbottle quick fyi ping to keep you apprised of my current status

* and SIMDF64 */
template <>
struct SumSIMD<npy_float, npy_float> {
using SimdType = NpySIMDF32;

I think this, and a great deal of the other supporting structure, will be useful elsewhere and should probably be moved to a header file that can be included in all these replacement files as they are added. It might also be worth discussing the SIMD parts with the optimization team to avoid duplicating work.

/* Template where (npy_double, npy_double) will allow the SIMD
* capable version.*/
template <typename T, typename Temptype,
typename std::enable_if<std::is_same<T, Temptype>::value &&

The enable_if construct is a little odd here. First, since we have access to C++17, you might want to use the _t and _v variants (std::enable_if_t, std::is_same_v), which make this a bit more readable. However, I think simple overloading would serve the same purpose here. When a non-template function and a function template are equally good matches, overload resolution picks the non-template function, and enable_if constructions have a large instantiation cost whereas overloading has a much smaller one. I think, though, that what is actually needed here is a partially specializable struct template that can be used as a customization point inside the sum_of_arr function. For instance:

template <typename T, typename TempType, typename SFINAE=void>
struct SumOfArr {
    static TempType eval(T* data, npy_intp size) noexcept(?) {}
};
template <typename T>
static inline TempTypeOf<T> sum_of_arr(T* data, npy_intp count) noexcept(?) {
    using Helper = SumOfArr<T, TempTypeOf<T>>;
    return Helper::eval(data, count);
}

This is just an idea, but it would give you relatively low-overhead control over the implementation based on the type and TempType (which I've shorthanded here as a trait, but could be left as a template argument if multiple choices are necessary).
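
To make the suggestion concrete, here is a minimal, self-contained sketch of such a customization point. Everything in it is an assumption for illustration rather than code from the PR: `index_t` stands in for `npy_intp`, the `TempTypeOf` mapping (including `short` accumulating in `long`) is invented and is not NumPy's, and the unrolled loop only marks the place where a SIMD kernel could go. The `uses_unrolled_path` member is likewise hypothetical and exists only so the selected implementation can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for npy_intp so the sketch builds without NumPy.
using index_t = std::ptrdiff_t;

// Hypothetical trait mapping an element type to its accumulator type.
// The short -> long entry is purely illustrative.
template <typename T>
struct TempTypeOfImpl { using type = T; };
template <>
struct TempTypeOfImpl<short> { using type = long; };
template <typename T>
using TempTypeOf = typename TempTypeOfImpl<T>::type;

// Primary template: plain scalar loop. The defaulted third parameter
// leaves room for enable_if-based partial specializations.
template <typename T, typename TempType, typename SFINAE = void>
struct SumOfArr {
    static constexpr bool uses_unrolled_path = false;
    static TempType eval(const T *data, index_t count) noexcept {
        assert(count >= 0);
        TempType accum = 0;
        for (index_t i = 0; i < count; ++i) {
            accum += static_cast<TempType>(data[i]);
        }
        return accum;
    }
};

// Partial specialization, selected only when T and TempType are the same
// floating-point type. A real SIMD kernel could live here; this sketch
// uses a 4-way unrolled scalar loop as a placeholder.
template <typename T>
struct SumOfArr<T, T, std::enable_if_t<std::is_floating_point_v<T>>> {
    static constexpr bool uses_unrolled_path = true;
    static T eval(const T *data, index_t count) noexcept {
        assert(count >= 0);
        T a0 = 0, a1 = 0, a2 = 0, a3 = 0;
        index_t i = 0;
        for (; i + 4 <= count; i += 4) {
            a0 += data[i];
            a1 += data[i + 1];
            a2 += data[i + 2];
            a3 += data[i + 3];
        }
        T accum = (a0 + a1) + (a2 + a3);
        for (; i < count; ++i) {  // remaining 0..3 elements
            accum += data[i];
        }
        return accum;
    }
};

// Single entry point: the helper struct picks the implementation.
template <typename T>
inline TempTypeOf<T> sum_of_arr(const T *data, index_t count) noexcept {
    using Helper = SumOfArr<T, TempTypeOf<T>>;
    return Helper::eval(data, count);
}
```

With this layout, callers only ever see `sum_of_arr`, and adding a new type-specific path means adding one more partial specialization rather than another constrained overload.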

@inakleinbottle
Contributor

Thanks @athurdekoos for tagging me in. I had a quick look over the code and made a couple of what I hope are helpful comments, or at least things to think about. To be clear, I don't think these are necessary changes, so this isn't a review of any kind. Both comments are "looking ahead" to the other c.src modules too, so establishing some standard patterns and reusable components might be a good idea. Very happy to discuss or explain further.
