Tags: tphakala/simd

v1.0.22

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Merge pull request #14 from tphakala/feature/reverse-addsub

Add Reverse and AddSub SIMD operations for f32
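
The commit title names the operations but not their semantics. The scalar sketch below shows one plausible reading: `Reverse` copies a slice in reverse order, and `AddSub` is *assumed* here to produce the element-wise sum and difference of two slices in one pass (it could instead mean x86-style alternating add/subtract). The names `reverseRef` and `addSubRef` and their argument order are hypothetical, not the library's API.

```go
package main

import "fmt"

// reverseRef writes src into dst in reverse order: dst[i] = src[len(src)-1-i].
// Assumes len(dst) >= len(src) and that dst and src do not overlap.
func reverseRef(dst, src []float32) {
	n := len(src)
	for i, v := range src {
		dst[n-1-i] = v
	}
}

// addSubRef is a hypothetical reading of "AddSub": one pass that produces
// both sum[i] = a[i] + b[i] and diff[i] = a[i] - b[i].
func addSubRef(sum, diff, a, b []float32) {
	for i := range a {
		sum[i] = a[i] + b[i]
		diff[i] = a[i] - b[i]
	}
}

func main() {
	rev := make([]float32, 4)
	reverseRef(rev, []float32{1, 2, 3, 4})
	fmt.Println(rev) // [4 3 2 1]

	sum := make([]float32, 2)
	diff := make([]float32, 2)
	addSubRef(sum, diff, []float32{5, 7}, []float32{1, 2})
	fmt.Println(sum, diff) // [6 9] [4 5]
}
```

Computing the sum and difference together is the shape of work an FFT butterfly needs, which is why a fused form can be worth having.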

v1.0.21

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Merge pull request #12 from tphakala/feature/butterfly-complex

Add ButterflyComplex for fused FFT butterfly with twiddle multiply
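
A radix-2 decimation-in-time FFT butterfly multiplies the second input by a twiddle factor and then forms a sum and a difference; fusing the multiply into the butterfly avoids a separate pass over memory. The scalar sketch below shows that arithmetic using Go's built-in `complex64`. The name `butterflyRef`, the in-place update, and the one-twiddle-per-element layout are assumptions for illustration, not the signature of `ButterflyComplex`.

```go
package main

import "fmt"

// butterflyRef applies a radix-2 DIT butterfly element-wise, in place:
//
//	t    = w[i] * b[i]   (twiddle multiply)
//	a[i] = a[i] + t
//	b[i] = a[i] - t      (using the original a[i])
//
// Hypothetical helper; the real ButterflyComplex layout may differ.
func butterflyRef(a, b, w []complex64) {
	for i := range a {
		t := w[i] * b[i]
		// The right-hand side is evaluated before assignment,
		// so both outputs use the original a[i].
		a[i], b[i] = a[i]+t, a[i]-t
	}
}

func main() {
	// A 2-point DFT is a single butterfly with twiddle w = 1.
	a := []complex64{3}
	b := []complex64{1}
	butterflyRef(a, b, []complex64{1})
	fmt.Println(a[0], b[0]) // (4+0i) (2+0i)
}
```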

v1.0.20

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Merge pull request #11 from tphakala/feature/split-complex-ops

Add c64 package and split-format complex operations
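
"Split format" (also called planar format) usually means the real parts and the imaginary parts of a complex vector are stored in two separate arrays rather than interleaved as re, im, re, im. That layout lets SIMD code load a full vector of real parts and a full vector of imaginary parts without shuffling. The scalar sketch below shows a complex multiply in that layout; the name `mulSplitRef` and its argument order are hypothetical.

```go
package main

import "fmt"

// mulSplitRef multiplies complex vectors held in split (planar) format:
// real parts and imaginary parts live in separate float32 slices.
//
//	(ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
//
// Assumes all slices have at least len(dstRe) elements.
func mulSplitRef(dstRe, dstIm, aRe, aIm, bRe, bIm []float32) {
	for i := range dstRe {
		ar, ai := aRe[i], aIm[i]
		br, bi := bRe[i], bIm[i]
		dstRe[i] = ar*br - ai*bi
		dstIm[i] = ar*bi + ai*br
	}
}

func main() {
	re := make([]float32, 1)
	im := make([]float32, 1)
	// (1+2i)(3+4i) = -5+10i
	mulSplitRef(re, im, []float32{1}, []float32{2}, []float32{3}, []float32{4})
	fmt.Println(re[0], im[0]) // -5 10
}
```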

v1.0.19

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Merge pull request #10 from tphakala/feature/f16-half-precision

Add f16 package for half-precision (FP16) SIMD operations

v1.0.18

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Merge pull request #8 from tphakala/feature/int-to-float-scale

Add Int32ToFloat32Scale for audio PCM conversion
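
For audio, PCM samples stored as integers are typically converted to floats by multiplying each sample by a scale factor such as 1/32768 for 16-bit data. The scalar sketch below assumes that `Int32ToFloat32Scale` computes `dst[i] = float32(src[i]) * scale`; the helper name `int32ToFloat32ScaleRef` and its argument order are hypothetical.

```go
package main

import "fmt"

// int32ToFloat32ScaleRef is a hypothetical scalar reference:
// dst[i] = float32(src[i]) * scale. Assumes len(dst) >= len(src).
func int32ToFloat32ScaleRef(dst []float32, src []int32, scale float32) {
	for i, v := range src {
		dst[i] = float32(v) * scale
	}
}

func main() {
	// 16-bit PCM samples held in int32, mapped to [-1, 1) with scale 1/32768.
	pcm := []int32{-32768, -16384, 0, 16384, 32767}
	out := make([]float32, len(pcm))
	int32ToFloat32ScaleRef(out, pcm, 1.0/32768)
	fmt.Println(out)
}
```

Fusing the conversion and the multiply avoids a second pass over the buffer, which is the likely motivation for a dedicated SIMD routine.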

v1.0.17

Fix ARM64 NEON sigmoid instruction encodings

Corrected multiple wrong instruction encodings in sigmoidNEON:
- FNEG: 0x6EA0F800 → 0x6EA07C00
- FMIN: 0x4E3BF400 → 0x4EBBF400
- FMAX: 0x4E38F400 → 0x4E3CF400
- FRINTN: 0x4EA19822 → 0x4E218C22
- FCVTZS: 0x4EA1A841 → 0x4EA1B841
- SHL: 0x4F575C21 → 0x4F375421
- ADD: 0x4EA18421 → 0x4EB68421

The original encodings caused SIGILL on ARM64 due to invalid
instruction bytes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

v1.0.16

Fix sigmoid to use accurate exp-based computation

Replace the fast but inaccurate soft-sign approximation with a proper sigmoid
computed via range reduction and a polynomial approximation of exp.

Old (wrong): σ(x) ≈ 0.5 + 0.5*x/(1+|x|)  - up to 7.6% error
New (correct): σ(x) = 1/(1+exp(-x))       - float32 precision

Algorithm:
- Clamp input to [-20, 20] to prevent overflow
- Range reduction: exp(-x) = 2^k * exp(r)
- 5-term Taylor polynomial for exp(r)
- Reconstruct via IEEE754 exponent manipulation

Performance: ~17x faster than pure Go (vs ~40x with the inaccurate approximation)
Throughput: 23.8 GB/s on AVX, with accuracy matching the pure Go implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

v1.0.15

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Merge pull request #5 from tphakala/feature/activation-functions

Add neural network activation functions with SIMD optimizations

v1.0.14

Add govet linter to .golangci.yaml and update assembly code for complex128 operations

- Added the govet linter to the GolangCI configuration for improved code analysis.
- Updated assembly code in c128_amd64.s to use more descriptive variable names (s_real and s_imag) for clarity.
- Adjusted frame sizes in various functions to ensure proper alignment and memory usage.

v1.0.13

Update documentation in doc.go to reflect changes in available operations

- Added new arithmetic operations: AddScaled, FMA
- Updated reductions to include DotProductBatch, MinIdx, MaxIdx
- Introduced new statistics functions: StdDev, EuclideanDistance, Normalize
- Enhanced element-wise operations with Reciprocal
- Added AccumulateAdd and CumulativeSum to Audio DSP section
- Improved clarity and organization of the documentation
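
The list above names operations without giving signatures. As a hedged illustration of what three of them are assumed to compute, here are scalar reference versions. The lowercase `...Ref` names, the argument order, and the edge-case behaviour (returning the first index on ties and -1 for empty input, using the shorter length for mismatched slices) are assumptions, not the library's API.

```go
package main

import (
	"fmt"
	"math"
)

// cumulativeSumRef writes running totals: dst[i] = src[0] + ... + src[i].
// Assumes len(dst) >= len(src).
func cumulativeSumRef(dst, src []float32) {
	var acc float32
	for i, v := range src {
		acc += v
		dst[i] = acc
	}
}

// maxIdxRef returns the index of the first maximum element, or -1 if a is empty.
func maxIdxRef(a []float32) int {
	if len(a) == 0 {
		return -1
	}
	best := 0
	for i := 1; i < len(a); i++ {
		if a[i] > a[best] {
			best = i
		}
	}
	return best
}

// euclideanDistanceRef returns sqrt(sum((a[i]-b[i])^2)) over the shorter length,
// accumulating in float64 to limit rounding error.
func euclideanDistanceRef(a, b []float32) float32 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var sum float64
	for i := 0; i < n; i++ {
		d := float64(a[i] - b[i])
		sum += d * d
	}
	return float32(math.Sqrt(sum))
}

func main() {
	x := []float32{3, 1, 4, 1, 5}
	sums := make([]float32, len(x))
	cumulativeSumRef(sums, x)
	fmt.Println(sums)                                                    // [3 4 8 9 14]
	fmt.Println(maxIdxRef(x))                                            // 4
	fmt.Println(euclideanDistanceRef([]float32{0, 0}, []float32{3, 4})) // 5
}
```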