In numerical computing and data science, we often treat sequences of indices and coordinates as materialized arrays. While Arithmetic Progressions (AP) and Quadratic Progressions (f(i) = A * i^2 + B * i + C) are mathematically fundamental, they are rarely leveraged as first-class, high-performance generators. Instead, most systems default to allocating memory for index buffers, leading to unnecessary memory pressure and cache misses.

We are developing NumIter, a lazy-evaluation library that leverages Google Highway to turn these progressions into Indexable SIMD Iterators, eliminating the "memory tax" while maximizing instruction-level parallelism.

Why This Matters in Data Science

By treating progressions as lazy generators rather than stored data, we can address several performance bottlenecks:

- Zero-Copy Coordinate Grids: Generate grids for image processing (parabolic interpolation or warping) and spatial indexing without allocating an index buffer.
- Non-Linear Feature Engineering: Efficiently create weighted decay functions or time-series windows where the relationship between the index and the value is quadratic.
- Mathematical Memory Mapping: Calculate complex offsets for high-dimensional tensor slicing and dicing without hitting the memory wall.

NumIter keeps the calculation in the L1 cache or CPU registers, providing a significant speedup over standard "eager" array libraries.

The Architecture: Indexable SIMD Iterators

The core of NumIter is built on iterators that provide a "dual-mode" access pattern. This allows the library to behave like a standard C++ container while performing vectorized math under the hood.

1. Sequential Access (Loop Optimization)

To maximize throughput during iteration, we implement Forward Differencing. By maintaining the first and second differences as state, we eliminate all multiplications from the inner loop.

Given a vector width W and a quadratic f(i), we initialize the following state vectors:

- curr_value: The current SIMD block [f(i), f(i+1), ..., f(i+W-1)].
- delta (Block Velocity): The first difference required to jump to the next block: f(i+W) - f(i) = A * (2 * i * W + W^2) + B * W. This is evaluated per lane, so lane k uses i + k in place of i.
- delta2 (Constant Acceleration): The constant second difference: 2 * A * W^2. It is the same for every lane.

Inside the iterator's next() call, the update logic is reduced to two Highway additions, applied in this order:

curr_value = Add(curr_value, delta)
delta = Add(delta, delta2)

2. Random Access (Indexability)

Unlike traditional functional streams, our iterators are random-access indexable, fulfilling the std::random_access_iterator requirements.

- Nth Term Logic: For single-element access (operator[]) or arbitrary jumps (seek(index)), the iterator uses the direct quadratic formula (A * n^2 + B * n + C) in O(1) time.
- Hybrid Strategy: We use the Nth term formula to "prime" the state at an arbitrary offset and Forward Differencing to "crank" the loop from there, so that both search and bulk evaluation stay fast.

Key Features

- AP to Quadratic Elevation: Multiplying two Arithmetic Progressions (the A = 0 case) automatically elevates the iterator state to a Quadratic generator at compile time.
- Lazy Evaluation: Expression trees are built using template metaprogramming; math is only executed when the iterator is advanced.
- O(1) Reductions: Summation of sequences is handled via closed-form formulas rather than iterative accumulation. For the first n terms (i = 0 .. n-1): Sum = A * [ (n-1) * n * (2n-1) / 6 ] + B * [ (n-1) * n / 2 ] + C * n

Request for Feedback

We are currently using Highway's ScalableTag and Iota for initialization. We would love feedback from the Highway community on:

- Best practices for handling "tails" (masking via LoadN vs. padding) within a custom iterator's SIMD load method.
- Thoughts on the "Indexable Iterator" pattern as a standard way to wrap Highway-based generators for C++ STL and Python (nanobind) compatibility.

Repository: https://github.com/SIE-Libraries/NumIter
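
To make the forward-differencing update and the Nth-term "priming" step described above concrete, here is a minimal scalar sketch. It is not NumIter's actual code: the names QuadraticBlockIter, nth_term, stepping_matches_formula and kLanes are hypothetical, and a std::array of 4 lanes stands in for a Highway vector so the example compiles without Highway. In a real SIMD version, each lane loop in next() would presumably collapse into a single vector Add.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a fixed block width W = 4 and a std::array standing in for a SIMD vector.
constexpr std::size_t kLanes = 4;
using Block = std::array<std::int64_t, kLanes>;

// Direct Nth-term evaluation: f(n) = A*n^2 + B*n + C, O(1) per element.
constexpr std::int64_t nth_term(std::int64_t A, std::int64_t B, std::int64_t C,
                                std::int64_t n) {
  return A * n * n + B * n + C;
}

// Hypothetical iterator state: value block, per-lane first difference, constant second difference.
struct QuadraticBlockIter {
  Block curr_value{};       // [f(i), f(i+1), ..., f(i+W-1)]
  Block delta{};            // lane k holds f(i+k+W) - f(i+k)
  std::int64_t delta2 = 0;  // 2 * A * W^2, identical for every lane

  // "Prime" the state at block start i using the direct formula.
  QuadraticBlockIter(std::int64_t A, std::int64_t B, std::int64_t C, std::int64_t i) {
    const std::int64_t W = static_cast<std::int64_t>(kLanes);
    for (std::size_t k = 0; k < kLanes; ++k) {
      const std::int64_t n = i + static_cast<std::int64_t>(k);
      curr_value[k] = nth_term(A, B, C, n);
      delta[k] = A * (2 * n * W + W * W) + B * W;
    }
    delta2 = 2 * A * W * W;
  }

  // "Crank" to the next block: additions only, no multiplications.
  void next() {
    for (std::size_t k = 0; k < kLanes; ++k) {
      curr_value[k] += delta[k];  // value first, using the current delta
      delta[k] += delta2;         // then advance delta for the following block
    }
  }
};

// Checks that `blocks` stepped blocks agree with the direct formula.
bool stepping_matches_formula(std::int64_t A, std::int64_t B, std::int64_t C,
                              std::int64_t start, std::int64_t blocks) {
  assert(blocks >= 0);
  QuadraticBlockIter it(A, B, C, start);
  const std::int64_t W = static_cast<std::int64_t>(kLanes);
  for (std::int64_t b = 0; b < blocks; ++b) {
    for (std::size_t k = 0; k < kLanes; ++k) {
      const std::int64_t n = start + b * W + static_cast<std::int64_t>(k);
      if (it.curr_value[k] != nth_term(A, B, C, n)) return false;
    }
    it.next();
  }
  return true;
}
```

With A = 1, B = 0, C = 0 and a start index of 0, the state begins as [0, 1, 4, 9] and one next() call yields [16, 25, 36, 49]. The sketch uses 64-bit integers, so repeated additions stay exact; with floating-point lanes, accumulated rounding error would be a reason to re-prime from the direct formula periodically.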
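
The "AP to Quadratic Elevation" and "O(1) Reductions" features can be illustrated the same way. This is a sketch under assumptions, not the library's API: the Arithmetic and Quadratic structs, the operator* overload, and the closed_form_sum and loop_sum helpers are hypothetical names. It shows the product of two APs yielding a distinct quadratic type, so the elevation is decided at compile time, and the closed-form sum over i = 0 .. n-1 compared against a plain loop.

```cpp
#include <cstdint>

// Hypothetical coefficient holders (not NumIter types).
struct Arithmetic { std::int64_t B, C; };    // f(i) = B*i + C
struct Quadratic { std::int64_t A, B, C; };  // f(i) = A*i^2 + B*i + C

// (b1*i + c1) * (b2*i + c2) = b1*b2*i^2 + (b1*c2 + b2*c1)*i + c1*c2.
// The return type differs from the operand type, so the "elevation" happens at compile time.
constexpr Quadratic operator*(Arithmetic x, Arithmetic y) {
  return Quadratic{x.B * y.B, x.B * y.C + y.B * x.C, x.C * y.C};
}

// Closed-form sum of f(i) for i = 0 .. n-1, O(1) regardless of n.
constexpr std::int64_t closed_form_sum(Quadratic q, std::int64_t n) {
  const std::int64_t sum_sq = (n - 1) * n * (2 * n - 1) / 6;  // sum of i^2
  const std::int64_t sum_lin = (n - 1) * n / 2;               // sum of i
  return q.A * sum_sq + q.B * sum_lin + q.C * n;
}

// Reference O(n) accumulation, used only to cross-check the closed form.
constexpr std::int64_t loop_sum(Quadratic q, std::int64_t n) {
  std::int64_t total = 0;
  for (std::int64_t i = 0; i < n; ++i) {
    total += q.A * i * i + q.B * i + q.C;
  }
  return total;
}
```

For example, multiplying the APs 2*i + 1 and 3*i - 4 gives the quadratic 6*i^2 - 5*i - 4, and both sum functions return 1445 for its first 10 terms. The integer divisions by 6 and 2 are exact because the products they divide are always multiples of 6 and 2 respectively.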