
A C++ DSP library with MATLAB-like syntax for readable and safe signal processing.
This is a library for those who hate this kind of code:
std::transform(x.begin(), x.end(), x.begin(), [](auto& v){
return v * 0.3;
});
auto power = std::accumulate(x.begin() + lb, x.begin() + rb, 0.0, [](double accum, const std::complex<double>& v) {
return accum + (v.real() * v.real() + v.imag() * v.imag());
});
auto r = std::vector<double>(x1.size());
for (int i=0; i < r.size(); ++i) {
r[i] = x1[i] * x2[i];
}
auto p = fftw_plan_dft_1d(N, x.data(), spec.data(), FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(p);
fftw_destroy_plan(p);
and who likes this:
using namespace dsplib;
x *= 0.3;
auto power = sum(abs2(x.slice(lb, rb)));
auto r = x1 * x2;
auto spec = fft(x);
Motivation
dsplib is, essentially, my personal collection of DSP screwdrivers.
It has been slowly assembled over several years and used across many projects. The reason is simple: a lot of C++ DSP code is painful to read, easy to misuse, and surprisingly good at reproducing the same bugs in slightly different forms.
Raw pointers in public APIs, implicit assumptions about buffer sizes, and hand-written loops tend to age poorly.
dsplib is an attempt to build a 1D DSP library with a human face: modern C++ abstractions, explicit intent, and a syntax that does not fight the person writing the code.
The library favors clarity and correctness first — and then works very hard to make it fast.
Usage
Quick Taste
auto x = randn(1024);
x *= window::hann(x.size());
auto spec = pow2db(abs2(fft(x)));
Conceptual Differences
The table below highlights key conceptual differences between dsplib and MATLAB / NumPy, beyond simple syntax similarities.
| dsplib | MATLAB | NumPy |
|---|---|---|
| x * x | x .* x | x * x |
| zeros(20) | zeros(20, 1) | zeros(20) |
| x.slice(0,10) = 1 | x(1:10) = 1 | x[0:10] = 1 |
| x.slice(2,end) = 1 | x(3:end) = 1 | x[2:] = 1 |
| x.slice(2, 4) = {1, 2} | x(3:4) = [1, 2] | x[2:4] = [1, 2] |
| x.slice(0, -1) | x(1:end-1) | x[0:-1] |
Only 1-D arrays with element-wise operations are currently supported.
Scalar operations:
dsplib::real_t v1;
dsplib::cmplx_t v2;
v1 = 10;
v2 = v1;
v2 = 10-10i;
v2 = {10, -10};
v2.re = 10;
v2.im = -10;
v2 = std::complex<double>(10, -10);
Vector operations:
using namespace dsplib;
x2.slice(2, 4) = {1i, 2i};
Slicing
Slices behave as closely as possible to numpy slices. The exception is invalid indexes: dsplib treats them as an error, whereas numpy does not throw an exception.
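arr_real x = {0, 1, 2, 3, 4, 5, 6};  // example array
x.slice(0, 2)       // {0, 1}
x.slice(2, -1)      // {2, 3, 4, 5}
x.slice(-1, 0, -1)  // {6, 5, 4, 3, 2, 1}
x.slice(-1, 0)      // {}
x.slice(0, -1, -1)  // {}
x.slice(-8, 7)      // error: index -8 is out of range (numpy would not throw)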
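The slicing rules above reduce to plain index arithmetic. The sketch below expands a numpy-style [start:stop:step] into explicit element indices. slice_indices is a hypothetical helper, not dsplib's implementation, and it assumes that bounds outside [-n, n] are errors rather than being clamped as numpy does.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of dsplib): expands a numpy-style slice
// [start:stop:step] over an array of length n into explicit element indices.
// Negative start/stop count from the end, as in numpy. Unlike numpy, which
// clamps out-of-range bounds, this sketch rejects them with std::out_of_range.
std::vector<int> slice_indices(int n, int start, int stop, int step = 1) {
    if (step == 0) {
        throw std::invalid_argument("slice step must be non-zero");
    }
    if (start < -n || start > n || stop < -n || stop > n) {
        throw std::out_of_range("slice index out of range");
    }
    if (start < 0) start += n;  // e.g. -1 -> n - 1
    if (stop < 0) stop += n;
    if (step < 0 && start == n) {
        throw std::out_of_range("slice start out of range");  // would read x[n]
    }

    std::vector<int> idx;
    if (step > 0) {
        for (int i = start; i < stop; i += step) idx.push_back(i);
    } else {
        for (int i = start; i > stop; i += step) idx.push_back(i);
    }
    return idx;
}
```

For an array of length 7, slice_indices(7, -1, 0, -1) yields {6, 5, 4, 3, 2, 1}, while slice_indices(7, -8, 7) throws instead of silently returning the whole array.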
Fast Fourier Transform:
The FFT implementation has no restrictions on the transform size: it supports power-of-two, prime, and semiprime lengths.
FFT tables are stored in an LRU cache and may be recalculated if the pipeline uses many different transform sizes. To avoid this, create an FftPlan object once and reuse it.
If your platform has a faster FFT implementation, set the DSPLIB_EXCLUDE_FFT=ON option and implement the get_fft_plan functions yourself (see lib/fft/fftw.cpp for an example). You can also select the FFT backend with the DSPLIB_FFT_BACKEND option: dsplib, fftw, or ne10 (float only).
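// Real-input FFT with a reusable plan
const int n = 512;
arr_real x = randn(n);
arr_cmplx r = complex(zeros(n));
std::shared_ptr<FftPlanR> plan = fft_plan_r(n);
arr_cmplx y3 = plan->solve(make_span(x.data(), n));
plan->solve(x, r);
plan->solve(make_span(x.data(), n), make_span(r.data(), n));

// Real inverse FFT from the first n/2+1 bins
const int n = 512;
arr_cmplx x = fft(randn(n));
arr_real y2 = irfft(x.slice(0, n/2+1), n);

// The same with a reusable inverse plan
const int n = 512;
const auto x = complex(ones(n));
arr_real y2 = zeros(n);
auto plan = ifft_plan_r(n);
plan->solve(make_span(x.data(), n/2+1), make_span(y2.data(), n));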
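Why plan reuse pays off can be shown with a textbook iterative radix-2 FFT: the twiddle factors for one size are computed once in the constructor and reused on every call. Radix2Plan is a hypothetical class that handles power-of-two sizes only; dsplib's own plan classes also support other sizes and may be organized differently.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical plan object (not dsplib's FftPlan): precomputes the twiddle
// factors for one power-of-two size so that repeated transforms of that
// size reuse them instead of recomputing them on every call.
class Radix2Plan {
public:
    explicit Radix2Plan(std::size_t n) : n_(n), twiddles_(n / 2) {
        if (n == 0 || (n & (n - 1)) != 0) {
            throw std::invalid_argument("size must be a power of two");
        }
        const double pi = std::acos(-1.0);
        for (std::size_t k = 0; k < n / 2; ++k) {
            twiddles_[k] = std::polar(1.0, -2.0 * pi * double(k) / double(n));
        }
    }

    // Forward DFT X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n) in O(n log n).
    std::vector<cplx> solve(std::vector<cplx> x) const {
        if (x.size() != n_) {
            throw std::invalid_argument("input size does not match the plan");
        }
        // Reorder the input into bit-reversed index order.
        for (std::size_t i = 1, j = 0; i < n_; ++i) {
            std::size_t bit = n_ >> 1;
            for (; j & bit; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) std::swap(x[i], x[j]);
        }
        // Butterfly stages; a stage of length len uses every (n/len)-th twiddle.
        for (std::size_t len = 2; len <= n_; len <<= 1) {
            const std::size_t stride = n_ / len;
            for (std::size_t i = 0; i < n_; i += len) {
                for (std::size_t k = 0; k < len / 2; ++k) {
                    const cplx u = x[i + k];
                    const cplx v = x[i + k + len / 2] * twiddles_[k * stride];
                    x[i + k] = u + v;
                    x[i + k + len / 2] = u - v;
                }
            }
        }
        return x;
    }

private:
    std::size_t n_;
    std::vector<cplx> twiddles_;
};
```

For real input the result satisfies X[n-k] = conj(X[k]), so bins 0..n/2 (n/2+1 values) carry all the information. This is why the real inverse transform above takes only the first n/2+1 bins.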
FIR filter:
const auto h = fir1(100, 0.1, FilterType::Low);
FFT-based FIR filtering using the overlap-add method is also available (see fir.h).
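The overlap-add method splits the input into blocks, filters each block, and sums the overlapping tails. To keep this sketch short, the per-block convolution is done directly in the time domain; an FFT-based filter would instead multiply the zero-padded spectra of the block and the filter. The function names are hypothetical, not dsplib's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct linear convolution; output length x.size() + h.size() - 1.
std::vector<double> convolve(const std::vector<double>& x, const std::vector<double>& h) {
    if (x.empty() || h.empty()) return {};
    std::vector<double> y(x.size() + h.size() - 1, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < h.size(); ++j)
            y[i + j] += x[i] * h[j];
    return y;
}

// Overlap-add: filter x block by block. Each block's result is
// block + h.size() - 1 samples long, so its tail overlaps the next block's
// output and the two are summed. An FFT-based filter would replace the
// convolve() call with an FFT, a spectral multiply, and an inverse FFT.
std::vector<double> overlap_add(const std::vector<double>& x, const std::vector<double>& h,
                                std::size_t block) {
    assert(block > 0);
    if (x.empty() || h.empty()) return {};
    std::vector<double> y(x.size() + h.size() - 1, 0.0);
    for (std::size_t start = 0; start < x.size(); start += block) {
        const std::size_t len = std::min(block, x.size() - start);
        const auto first = x.begin() + static_cast<std::ptrdiff_t>(start);
        const std::vector<double> seg(first, first + static_cast<std::ptrdiff_t>(len));
        const std::vector<double> part = convolve(seg, h);
        for (std::size_t k = 0; k < part.size(); ++k) y[start + k] += part[k];
    }
    return y;
}
```

The block size only changes how the work is split: overlap_add(x, h, block) returns the same samples as convolve(x, h) for any block > 0.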
Hilbert filter:
dsplib provides a FIR-based Hilbert filter.
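A common way to build such a filter is to truncate and window the ideal Hilbert transformer response, h[m] = 2/(pi*m) for odd m and 0 for even m, where m is the offset from the center tap. The sketch below does this with a Hamming window; the window choice and the hilbert_fir name are assumptions, not dsplib's design.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Windowed-ideal Hilbert transformer with 2*half + 1 taps (odd length,
// antisymmetric). Ideal response: 2 / (pi * m) for odd offsets m from the
// center, 0 for even offsets (including the center). A Hamming window
// tapers the truncated response.
std::vector<double> hilbert_fir(std::size_t half) {
    const double pi = std::acos(-1.0);
    const std::size_t len = 2 * half + 1;
    std::vector<double> h(len, 0.0);
    for (std::size_t i = 0; i < len; ++i) {
        const long m = static_cast<long>(i) - static_cast<long>(half);
        if (m % 2 != 0) {
            const double w = 0.54 - 0.46 * std::cos(2.0 * pi * double(i) / double(len - 1));
            h[i] = w * 2.0 / (pi * double(m));
        }
    }
    return h;
}
```

Convolving a real signal with these taps approximates its 90-degree phase-shifted version, delayed by half samples.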
Add White Gaussian Noise:
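Adding white Gaussian noise at a target SNR amounts to measuring the signal power and choosing the noise variance to match. The add_awgn function below is a hypothetical sketch of this idea (similar to MATLAB's awgn with measured signal power), not dsplib's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Returns x plus white Gaussian noise such that
// (mean signal power) / (noise power) = 10^(snr_db / 10).
// The signal power is measured from x itself.
std::vector<double> add_awgn(const std::vector<double>& x, double snr_db, std::mt19937& rng) {
    assert(!x.empty());
    double power = 0.0;
    for (double v : x) power += v * v;
    power /= static_cast<double>(x.size());
    const double noise_power = power / std::pow(10.0, snr_db / 10.0);
    if (noise_power <= 0.0) return x;  // all-zero input: nothing to scale against
    std::normal_distribution<double> noise(0.0, std::sqrt(noise_power));
    std::vector<double> y(x);
    for (double& v : y) v += noise(rng);
    return y;
}
```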
Cross-correlation:
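For real signals, the full cross-correlation can be computed directly as r[m] = sum over n of x[n+m] * y[n], for every lag m. The xcorr function below is a hypothetical O(nx * ny) sketch of that definition; dsplib's own lag ordering and normalization may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full cross-correlation of real signals:
//   r[m] = sum_n x[n + m] * y[n],  for lags m = -(ny - 1) ... (nx - 1).
// Output element k corresponds to lag m = k - (ny - 1).
std::vector<double> xcorr(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.empty() || y.empty()) return {};
    const long nx = static_cast<long>(x.size());
    const long ny = static_cast<long>(y.size());
    std::vector<double> r(static_cast<std::size_t>(nx + ny - 1), 0.0);
    for (long m = -(ny - 1); m <= nx - 1; ++m) {
        double acc = 0.0;
        for (long n = 0; n < ny; ++n) {
            const long i = n + m;
            if (i >= 0 && i < nx) {
                acc += x[static_cast<std::size_t>(i)] * y[static_cast<std::size_t>(n)];
            }
        }
        r[static_cast<std::size_t>(m + ny - 1)] = acc;
    }
    return r;
}
```

If x contains a copy of y starting at sample d, the largest value appears at lag m = d, i.e. at output index d + ny - 1.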
Delay estimation:
auto d1 = finddelay(x1, x2);
auto [d2, _] = gccphat(x2, x1);
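GCC-PHAT normalizes the cross-spectrum of the two signals to unit magnitude, so only phase (that is, delay) information remains, and then takes the peak of its inverse transform. The sketch below uses a naive O(N^2) DFT and assumes a circular shift between equal-length signals. The function names and the sign convention of the returned delay are assumptions; dsplib's gccphat may order its arguments differently.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive DFT: sign = -1 for the forward transform, +1 for the inverse
// (without 1/N scaling, which does not move the peak).
std::vector<cplx> dft(const std::vector<cplx>& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> y(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            y[k] += x[t] * std::polar(1.0, sign * 2.0 * pi * double((k * t) % n) / double(n));
    return y;
}

// GCC-PHAT delay estimate: returns d such that sig[t] ~= ref[t - d]
// (circularly). Delays above n/2 are reported as negative values.
long gcc_phat_delay(const std::vector<double>& sig, const std::vector<double>& ref) {
    const std::size_t n = sig.size();
    assert(n > 0 && ref.size() == n);
    const std::vector<cplx> a(sig.begin(), sig.end()), b(ref.begin(), ref.end());
    const std::vector<cplx> A = dft(a, -1), B = dft(b, -1);
    std::vector<cplx> R(n);
    for (std::size_t k = 0; k < n; ++k) {
        const cplx c = A[k] * std::conj(B[k]);
        const double mag = std::abs(c);
        R[k] = mag > 1e-12 ? c / mag : cplx(0.0, 0.0);  // PHAT weighting: keep phase only
    }
    const std::vector<cplx> r = dft(R, +1);
    std::size_t best = 0;
    for (std::size_t i = 1; i < n; ++i)
        if (r[i].real() > r[best].real()) best = i;
    return best > n / 2 ? static_cast<long>(best) - static_cast<long>(n) : static_cast<long>(best);
}
```

Because every frequency bin is weighted equally, the peak stays sharp even for signals whose spectrum is dominated by a few strong components, where plain cross-correlation tends to produce broad peaks.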
Spectrum Analyze:
const int n = 1024;
arr_real x = randn(n);
x *= window::hann(n);
arr_cmplx y = fft(x);
y = y.slice(0, n/2+1);    // keep the one-sided spectrum (bins 0..n/2)
auto z = pow2db(abs2(y));
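The window and the dB conversion in the spectrum example are simple formulas. This sketch assumes the symmetric Hann definition used by MATLAB's hann and pow2db(p) = 10*log10(p); whether window::hann uses the symmetric or the periodic form is not specified here. The functions are standalone stand-ins that only share names with the calls above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric Hann window: w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1))).
std::vector<double> hann(std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<double> w(n, 1.0);
    if (n < 2) return w;
    for (std::size_t i = 0; i < n; ++i)
        w[i] = 0.5 * (1.0 - std::cos(2.0 * pi * double(i) / double(n - 1)));
    return w;
}

// Power ratio to decibels: 10 * log10(p).
std::vector<double> pow2db(const std::vector<double>& p) {
    std::vector<double> db(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) db[i] = 10.0 * std::log10(p[i]);
    return db;
}
```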
FIR filter design:
auto x = randn(10000);
auto h = fir1(99, 0.1, 0.2, FilterType::Bandstop);
auto flt = FirFilter(h);
auto y = flt(x);
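MATLAB's fir1 designs filters by windowing an ideal (sinc) impulse response with a Hamming window and scaling the passband to unit gain. The sketch below follows that recipe for the lowpass case only. lowpass_fir is a hypothetical function, and dsplib's fir1 may differ in details.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Windowed-sinc lowpass FIR of the given order (order + 1 taps).
// cutoff is normalized so that 1.0 corresponds to the Nyquist frequency.
// A Hamming window is applied and the taps are scaled for unit gain at DC.
std::vector<double> lowpass_fir(std::size_t order, double cutoff) {
    assert(cutoff > 0.0 && cutoff < 1.0);
    const double pi = std::acos(-1.0);
    const std::size_t len = order + 1;
    const double center = double(order) / 2.0;
    std::vector<double> h(len);
    double sum = 0.0;
    for (std::size_t i = 0; i < len; ++i) {
        const double t = double(i) - center;
        // Ideal lowpass response cutoff * sinc(cutoff * t), written out.
        const double ideal = (t == 0.0) ? cutoff : std::sin(pi * cutoff * t) / (pi * t);
        const double w = (order == 0) ? 1.0 : 0.54 - 0.46 * std::cos(2.0 * pi * double(i) / double(order));
        h[i] = ideal * w;
        sum += h[i];
    }
    for (double& v : h) v /= sum;  // unit DC gain
    return h;
}
```

The taps are symmetric around the center, so the filter has linear phase with a delay of order/2 samples.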
Adaptive filters:
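int M = 50;                                // filter length
auto rir = fir1(M-1, 0.1);                 // impulse response of the "unknown" system
auto flt = FirFilter(rir);                 // FIR model of that system
int L = 100000;
auto x = randn(L);                         // excitation signal
auto n = 0.01 * randn(L);                  // measurement noise
auto d = flt(x) + n;                       // desired signal: system output plus noise
// rls: an RLS adaptive filter with M taps (construction not shown)
auto [y, e] = rls(x, d);                   // adaptive filter output y and error e
ASSERT_LE(nmse(rls.coeffs(), rir), 0.01);  // adapted coefficients should approach rir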
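The recursion behind an RLS adaptive filter fits in a few lines. The sketch below is the textbook exponentially weighted RLS update for a real FIR model. The Rls class, its step() method, and the lambda/delta parameters are hypothetical and not dsplib's interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook exponentially weighted RLS for a real FIR model with `order` taps.
// lambda: forgetting factor (0 < lambda <= 1); delta: P is initialized to I / delta.
class Rls {
public:
    Rls(std::size_t order, double lambda, double delta)
        : m_(order), lambda_(lambda), w_(order, 0.0), u_(order, 0.0), p_(order * order, 0.0) {
        assert(order > 0 && lambda > 0.0 && delta > 0.0);
        for (std::size_t i = 0; i < m_; ++i) p_[i * m_ + i] = 1.0 / delta;
    }

    // Processes one input sample x and desired sample d; returns the
    // a-priori error e = d - w^T u computed before the weights are updated.
    double step(double x, double d) {
        // Regressor u = [x[n], x[n-1], ..., x[n-M+1]].
        for (std::size_t i = m_ - 1; i > 0; --i) u_[i] = u_[i - 1];
        u_[0] = x;

        // pu = P u and the scalar lambda + u^T P u.
        std::vector<double> pu(m_, 0.0);
        for (std::size_t i = 0; i < m_; ++i)
            for (std::size_t j = 0; j < m_; ++j) pu[i] += p_[i * m_ + j] * u_[j];
        double denom = lambda_;
        for (std::size_t i = 0; i < m_; ++i) denom += u_[i] * pu[i];

        // A-priori output and error.
        double y = 0.0;
        for (std::size_t i = 0; i < m_; ++i) y += w_[i] * u_[i];
        const double e = d - y;

        // Gain k = pu / denom; w += k * e; P = (P - k * pu^T) / lambda.
        // (P is symmetric, so u^T P equals pu^T.)
        for (std::size_t i = 0; i < m_; ++i) {
            const double k = pu[i] / denom;
            w_[i] += k * e;
            for (std::size_t j = 0; j < m_; ++j)
                p_[i * m_ + j] = (p_[i * m_ + j] - k * pu[j]) / lambda_;
        }
        return e;
    }

    const std::vector<double>& coeffs() const { return w_; }

private:
    std::size_t m_;
    double lambda_;
    std::vector<double> w_;
    std::vector<double> u_;
    std::vector<double> p_;
};
```

Each step costs O(M^2) operations, compared with O(M) for LMS, in exchange for much faster convergence.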
Resampling:
To process a signal in blocks (for example, in real time), use the FIRDecimator, FIRInterpolator, FIRRateConverter, or FIRResampler classes. The output is delayed, but there are no gaps between consecutive frames.
To process an entire signal at once (for example, one read from a file), use the resample(x, p, q) function. The result is time-aligned with the input.
// 44100 Hz -> 16000 Hz (ratio 160/441). Each group lists alternatives.
// rsmp: a FIRResampler or FIRRateConverter instance
auto out = rsmp.process(in);
auto out = dsplib::resample(in, 16000, 44100);
auto out = dsplib::resample(in, 160, 441);

// 32000 Hz -> 16000 Hz (decimation by 2); decim: a FIRDecimator instance
auto out = decim.process(in);
auto out = dsplib::resample(in, 1, 2);
auto out = dsplib::resample(in, 16000, 32000);

// 16000 Hz -> 32000 Hz (interpolation by 2); interp: a FIRInterpolator instance
auto out = interp.process(in);
auto out = dsplib::resample(in, 2, 1);
auto out = dsplib::resample(in, 32000, 16000);
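Each rate pair above corresponds to a reduced integer ratio p/q, obtained by dividing both rates by their greatest common divisor. This sketch computes that ratio and the output length ceil(n * p / q), which is what MATLAB's resample returns; that dsplib::resample produces the same length is an assumption, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

// Reduced resampling factors: output rate = input rate * p / q.
struct Ratio {
    long p;  // interpolation (upsampling) factor
    long q;  // decimation (downsampling) factor
};

// Reduces fs_out / fs_in to lowest terms, e.g. 16000 / 44100 -> 160 / 441.
Ratio resample_ratio(long fs_out, long fs_in) {
    assert(fs_out > 0 && fs_in > 0);
    const long g = std::gcd(fs_out, fs_in);
    return {fs_out / g, fs_in / g};
}

// Number of output samples for n input samples: ceil(n * p / q).
std::size_t resampled_length(std::size_t n, const Ratio& r) {
    const std::size_t p = static_cast<std::size_t>(r.p);
    const std::size_t q = static_cast<std::size_t>(r.q);
    return (n * p + q - 1) / q;
}
```

Small reduced factors matter for efficiency: a polyphase resampler for 160/441 needs 160 filter phases, whereas the unreduced 16000/44100 would need 16000.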
⚠️ Thread Safety & Memory Notice
The default (built-in) implementation is thread-safe because all of its caches (primarily FFT-related) use thread_local storage.
Memory warning: since every thread keeps its own copy of these caches, memory consumption grows with the number of threads. Avoid spreading processing across hundreds of threads.
The FFTW3 backend is also thread-safe: it is guarded by a static mutex, which is not held during fftw_execute calls.
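The thread_local approach can be sketched as a per-thread table cache keyed by transform size. twiddles() is a hypothetical function; unlike the LRU cache described earlier, it never evicts entries, which makes the per-thread memory cost easy to see.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <map>
#include <thread>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Per-thread cache of twiddle tables keyed by FFT size. Each thread gets its
// own std::map, so lookups and insertions need no locking; the price is that
// every thread holds its own copy of every table it has used.
const std::vector<cplx>& twiddles(std::size_t n) {
    thread_local std::map<std::size_t, std::vector<cplx>> cache;
    auto it = cache.find(n);
    if (it == cache.end()) {
        const double pi = std::acos(-1.0);
        std::vector<cplx> table(n);
        for (std::size_t k = 0; k < n; ++k)
            table[k] = std::polar(1.0, -2.0 * pi * double(k) / double(n));
        it = cache.emplace(n, std::move(table)).first;
    }
    return it->second;  // stays valid until the owning thread exits
}
```

A second call for the same size on the same thread returns the cached table, while another thread builds and keeps its own copy.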
Documentation
Full API documentation is available here:
📖 https://vitalsong.github.io/dsplib/
The documentation is generated using Doxygen and reflects the current public API.
Build
Requires:
- CMake (>=3.15)
- C++17 compiler (exceptions can be disabled)
dsplib is designed with portability in mind and avoids platform-specific dependencies wherever possible.
If your platform has a reasonably complete C++17 compiler and CMake support, there is a good chance dsplib will build there as well.
In practice, it is often easier to list platforms where dsplib does not build. So far, I haven't found many :)
Known to work in production on:
- Android (API 27, 29, ARMv7 / ARMv8)
- Linux (GCC, Clang, MinGW)
- Windows (MSVC, MinGW)
- macOS (Clang)
- WebAssembly (Emscripten)
- Custom Buildroot-based ARM toolchains
Build and install:
# set DSPLIB_USE_FLOAT32=ON to enable float base type (double by default)
# set DSPLIB_NO_EXCEPTIONS=ON to disable exceptions
# set BUILD_SHARED_LIBS=ON to build shared lib
cmake . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target=install
Use CPM manager:
CPMAddPackage(NAME dsplib
    GIT_REPOSITORY "https://github.com/vitalsong/dsplib.git"
    VERSION 1.0.0
    OPTIONS
        "DSPLIB_USE_FLOAT32 OFF"
        "DSPLIB_NO_EXCEPTIONS OFF"
    EXCLUDE_FROM_ALL ON
)
target_link_libraries(${PROJECT_NAME} dsplib)
Performance
To build and run benchmarks:
cmake . -B build -DCMAKE_BUILD_TYPE=Release -DDSPLIB_BUILD_BENCHS=ON
cmake --build build
./build/benchs/dsplib-benchs
FFT
Non-power-of-two FFT sizes are handled by a general factorization algorithm. This is usually slower than the power-of-two case, but rarely critically so.
For prime and semiprime sizes, the chirp Z-transform (CZT) algorithm is used, which can be significantly slower (though still not as slow as a direct DFT).
Use FFT sizes N != 2^K only if you know what you are doing.
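The benchmark sizes below differ mainly in their prime factorizations: 1024, 2048, 4096, 8192 and 16384 are powers of two, 1536 = 2^9 * 3, 1984 = 2^6 * 31, 11200 = 2^6 * 5^2 * 7, 1331 = 11^3, and 11202 = 2 * 3 * 1867. The small helpers below compute such factorizations; they are a hypothetical illustration and do not reproduce dsplib's algorithm selection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Prime factorization of an FFT size by trial division (ascending order).
std::vector<std::size_t> factorize(std::size_t n) {
    std::vector<std::size_t> f;
    for (std::size_t p = 2; p * p <= n; ++p)
        while (n % p == 0) {
            f.push_back(p);
            n /= p;
        }
    if (n > 1) f.push_back(n);  // remaining factor is prime
    return f;
}

// True for sizes of the form 2^K.
bool is_power_of_two(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }
```

A size whose largest prime factor is big (such as 1867 in 11202) cannot be split into small radix stages, which is consistent with the much longer timings for 11202 than for 11200 in the table below.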
Run on (20 X 5100 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x10)
L1 Instruction 32 KiB (x10)
L2 Unified 2048 KiB (x10)
L3 Unified 24576 KiB (x1)
-------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------
BM_FFT_DSPLIB/1024/min_time:5.000 4.61 us 4.61 us 1503885
BM_FFT_DSPLIB/1331/min_time:5.000 37.5 us 37.5 us 185462
BM_FFT_DSPLIB/1536/min_time:5.000 12.7 us 12.7 us 533535
BM_FFT_DSPLIB/1984/min_time:5.000 60.3 us 60.3 us 116035
BM_FFT_DSPLIB/2048/min_time:5.000 10.4 us 10.4 us 672628
BM_FFT_DSPLIB/4096/min_time:5.000 23.0 us 23.0 us 303742
BM_FFT_DSPLIB/8192/min_time:5.000 53.2 us 53.2 us 131683
BM_FFT_DSPLIB/11200/min_time:5.000 266 us 266 us 26324
BM_FFT_DSPLIB/11202/min_time:5.000 511 us 511 us 13702
BM_FFT_DSPLIB/16384/min_time:5.000 113 us 113 us 62225
BM_FFTW3_DOUBLE/1024/min_time:5.000 1.03 us 1.03 us 6563943
BM_FFTW3_DOUBLE/1331/min_time:5.000 4.20 us 4.20 us 1673972
BM_FFTW3_DOUBLE/1536/min_time:5.000 1.89 us 1.89 us 3666687
BM_FFTW3_DOUBLE/1984/min_time:5.000 12.3 us 12.3 us 553932
BM_FFTW3_DOUBLE/2048/min_time:5.000 2.43 us 2.43 us 2814851
BM_FFTW3_DOUBLE/4096/min_time:5.000 6.82 us 6.82 us 1027944
BM_FFTW3_DOUBLE/8192/min_time:5.000 14.7 us 14.7 us 479778
BM_FFTW3_DOUBLE/11200/min_time:5.000 22.2 us 22.2 us 310204
BM_FFTW3_DOUBLE/11202/min_time:5.000 135 us 135 us 51474
BM_FFTW3_DOUBLE/16384/min_time:5.000 30.1 us 30.1 us 231342
BM_KISSFFT/1024/min_time:5.000 4.25 us 4.25 us 1640712
BM_KISSFFT/1331/min_time:5.000 42.2 us 42.2 us 170796
BM_KISSFFT/1536/min_time:5.000 8.47 us 8.47 us 787339
BM_KISSFFT/1984/min_time:5.000 71.1 us 71.1 us 96450
BM_KISSFFT/2048/min_time:5.000 13.0 us 13.0 us 536075
BM_KISSFFT/4096/min_time:5.000 21.1 us 21.1 us 331449
BM_KISSFFT/8192/min_time:5.000 54.6 us 54.6 us 129238
BM_KISSFFT/11200/min_time:5.000 127 us 127 us 54032
BM_KISSFFT/11202/min_time:5.000 27936 us 27935 us 250
BM_KISSFFT/16384/min_time:5.000 98.5 us 98.5 us 69101
TODO (this is a wishlist, not a roadmap):
- Add matrix syntax support;
- Add custom allocator for base_array<T> type;
- Add audioread/audiowrite functions (optional libsndfile?);
- Add chain syntax like fft(x)->abs2()->pow2db();
- SOS filters;
- Multichannel resampler;
- Thread-safe storage for FFT (not thread_local);
- Add chirp, conv, filter, dzt, remez etc.
- Real/Imag slice for arr_cmplx;
License Notes
⚠️ Critical compliance notice:
If you enable the FFTW3 backend via -DDSPLIB_FFT_BACKEND=fftw, your project automatically falls under the GPLv2+ requirements of FFTW.