An innovative approach using convolutional autoencoders to define robust and meaningful distances between temporal signals in a structured latent space.
Measuring distances between signals accurately is critical in many fields such as biomedical engineering, signal processing, and financial analysis. Traditional methods like Euclidean distance or Dynamic Time Warping (DTW) often fail to handle noise, phase shifts, or computational complexity adequately.
This project addresses these issues by developing a convolutional autoencoder (CAE) that learns a latent representation of signals specifically structured to preserve meaningful distances defined in the original parameter space.
A detailed report including theoretical background and extended results is available here for interested readers.
Conventional distance metrics have significant limitations:
- Euclidean distance is intuitive but sensitive to noise and phase shifts (see the short illustration after this list).
- Fourier-based methods are computationally efficient but fail under time shifts.
- DTW handles distortions but is computationally costly (O(n²)).
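The phase-shift issue in particular is easy to demonstrate. The following standalone NumPy sketch (not part of the project code) compares two sinusoids that are identical up to a half-period shift:

```python
import numpy as np

t = np.linspace(0, 1, 500)
s1 = np.sin(2 * np.pi * 5 * t)          # 5 Hz sine
s2 = np.sin(2 * np.pi * 5 * t + np.pi)  # same sine, shifted by half a period

# The signals are identical up to a phase shift, yet their Euclidean
# distance is close to the maximum possible value (~2 * ||s1||).
print(np.linalg.norm(s1 - s2))          # ~31.6
```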
By training an autoencoder to preserve a defined metric in its latent representation, we obtain a robust alternative that overcomes these limitations and supports efficient clustering, classification, and anomaly detection.
The convolutional autoencoder comprises:
- Encoder: Compresses signals into a lower-dimensional latent representation.
- Decoder: Reconstructs signals from latent vectors.
The latent vector $z_i$ produced by the encoder is the compressed representation on which all distance computations are performed.
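A minimal PyTorch sketch of such a 1D convolutional autoencoder is shown below; the layer widths, kernel sizes, and latent dimension are illustrative assumptions rather than the project's exact architecture:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=16, signal_len=512):
        super().__init__()
        # Encoder: strided 1D convolutions followed by a linear projection
        # to the latent vector z.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (signal_len // 4), latent_dim),
        )
        # Decoder: mirror of the encoder, reconstructing the signal from z.
        self.decoder_fc = nn.Linear(latent_dim, 32 * (signal_len // 4))
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=7, stride=2,
                               padding=3, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=7, stride=2,
                               padding=3, output_padding=1),
        )
        self.signal_len = signal_len

    def forward(self, x):              # x: (batch, 1, signal_len)
        z = self.encoder(x)            # (batch, latent_dim)
        h = self.decoder_fc(z).view(-1, 32, self.signal_len // 4)
        return self.decoder(h), z      # reconstruction and latent vector
```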
Two primary distance metrics are considered in the latent space (a short computational sketch follows the list):
- Euclidean distance: $d_E(z_i, z_j) = \lVert z_i - z_j \rVert_2$
- Cosine distance: $d_C(z_i, z_j) = 1 - \dfrac{z_i \cdot z_j}{\lVert z_i \rVert_2 \, \lVert z_j \rVert_2}$
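Both can be computed directly from a batch of latent vectors. The short sketch below uses standard PyTorch operations rather than the project's own utilities:

```python
import torch
import torch.nn.functional as F

z = torch.randn(8, 16)                 # a batch of 8 latent vectors

euclidean = torch.cdist(z, z, p=2)     # (8, 8) pairwise Euclidean distances

z_unit = F.normalize(z, dim=1)         # unit-norm latent vectors
cosine = 1 - z_unit @ z_unit.T         # (8, 8) pairwise cosine distances
```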
The model’s loss combines signal reconstruction accuracy and the preservation of distances defined by the original signal parameters, schematically:

$$\mathcal{L} = \alpha \sum_{i} \lVert s_i - \hat{s}_i \rVert_2^2 \;+\; \beta \sum_{i \neq j} \bigl( d(z_i, z_j) - d(\theta_i, \theta_j) \bigr)^2$$

where:
- $s_i, \hat{s}_i$: original and reconstructed signals
- $z_i$: latent representation
- $\theta_i$: original signal parameters
- $\alpha, \beta$: weighting hyperparameters
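A minimal PyTorch sketch of this combined objective, assuming Euclidean distances in both the latent and parameter spaces (the batching and weighting below are illustrative, not the project's exact implementation):

```python
import torch
import torch.nn.functional as F

def combined_loss(s, s_hat, z, theta, alpha=1.0, beta=0.1):
    """Reconstruction term plus a pairwise distance-preservation term.

    s, s_hat : (batch, length) original and reconstructed signals
    z        : (batch, latent_dim) latent representations
    theta    : (batch, n_params) original signal parameters
    """
    # Reconstruction accuracy (mean squared error).
    recon = F.mse_loss(s_hat, s)

    # Pairwise Euclidean distances within the batch, computed both in
    # latent space and in the original parameter space.
    d_latent = torch.cdist(z, z, p=2)
    d_param = torch.cdist(theta, theta, p=2)

    # Penalize deviations between the two distance matrices.
    dist_pres = F.mse_loss(d_latent, d_param)

    return alpha * recon + beta * dist_pres
```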
Initially, the model was trained on synthetic signals composed of polynomial and cosine terms, each fully described by a small set of parameters $\theta$.
This controlled setting demonstrated the CAE’s capacity to preserve complex parametric relationships in latent space.
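A hypothetical generator for such signals is sketched below; the exact parametric form and parameter ranges are assumptions for illustration, not the project's data-generation code:

```python
import numpy as np

def make_signal(theta, n_points=512):
    """Signal combining a polynomial trend and a cosine term.

    theta = (a, b, c, amplitude, frequency): the parameters whose
    distances the latent space is trained to preserve.
    """
    a, b, c, amp, freq = theta
    t = np.linspace(0, 1, n_points)
    return a * t**2 + b * t + c + amp * np.cos(2 * np.pi * freq * t)

# Sample a small synthetic dataset with known parameters theta.
rng = np.random.default_rng(0)
thetas = rng.uniform(-1, 1, size=(1000, 5))
signals = np.stack([make_signal(th) for th in thetas])
```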
To evaluate the generalization of our approach beyond synthetic signals, we applied the model to real ECG data from the WaveForm Database. These signals are more complex and noisy, with varying frequencies and amplitudes, making them an excellent testbed for the autoencoder's robustness.
The model was trained on 4-second ECG segments and learned a latent representation that simultaneously preserved inter-signal distances and enabled accurate reconstruction.
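Segmenting ECG recordings into 4-second windows can be done with the wfdb Python package; the record name and database below are placeholders, and the preprocessing is a simplified sketch rather than the project's pipeline:

```python
import numpy as np
import wfdb

# Read one ECG record from PhysioNet (record name and database are illustrative).
record = wfdb.rdrecord("100", pn_dir="mitdb")
ecg = record.p_signal[:, 0]            # first channel
fs = int(record.fs)                    # sampling frequency (Hz)

# Cut the recording into non-overlapping 4-second segments and normalize each.
win = 4 * fs
segments = ecg[: len(ecg) // win * win].reshape(-1, win)
segments = (segments - segments.mean(axis=1, keepdims=True)) / (
    segments.std(axis=1, keepdims=True) + 1e-8
)
```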
Reconstruction performance:
The CAE achieves high-fidelity reconstructions, with Mean Squared Errors below 0.0015. Despite the variability and noise in ECG waveforms, the model captures key features such as QRS complexes and denoises irrelevant fluctuations. This makes it well-suited for applications like anomaly detection or compression.
Distance preservation:
To evaluate whether the latent space meaningfully captures proximity, we computed distances between several pairs of signals and visualized the closest, farthest, and a random pair:
The plots illustrate that:
- Closest pair: morphologically near-identical signals yield low latent-space distances (~0.21).
- Farthest pair: highly dissimilar or noisy signals are mapped far apart (distance > 4).
- Random pair: moderate distance reflects mid-range similarity.
This confirms that the learned latent space reflects meaningful signal similarity as intended.
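This kind of pairwise analysis can be reproduced from the encoded test set with a few lines of NumPy; the latent matrix below is a placeholder for the trained encoder's output:

```python
import numpy as np

Z = np.random.randn(200, 16)           # placeholder: (n_signals, latent_dim) latent vectors

# Pairwise Euclidean distances, ignoring self-distances on the diagonal.
dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
np.fill_diagonal(dists, np.nan)

closest = np.unravel_index(np.nanargmin(dists), dists.shape)
farthest = np.unravel_index(np.nanargmax(dists), dists.shape)
print("closest pair:", closest, "farthest pair:", farthest)
```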
Analysis using PCA on latent representations showed clearly structured parameter trajectories, confirming that the latent space preserves the intrinsic geometry of signals.
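The projection itself is a standard PCA, reproducible with scikit-learn (a minimal sketch, again using a placeholder for the encoder output):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

Z = np.random.randn(200, 16)           # placeholder: latent vectors from the encoder

# Project the latent vectors onto their first two principal components.
Z_2d = PCA(n_components=2).fit_transform(Z)

plt.scatter(Z_2d[:, 0], Z_2d[:, 1], s=8)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Latent space projected onto its first two principal components")
plt.show()
```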
Key findings:
- Clear separation of distinct signal classes.
- Robustness of distance preservation even under noisy conditions.
- Effective encoding of nonlinear signal characteristics.
Install dependencies:

```bash
pip install -r requirement.txt
```
If you use this project, please cite:
- Marius Dragic, "Signal Distance Measurement using Auto-encoder Latent Space", CentraleSupélec, 2025.
Future work:
- Generalize and validate the model on diverse temporal signals, including financial data.
- Investigate more complex autoencoder architectures (e.g., U-Net-like structures) for richer latent representations.
- Conduct comprehensive comparative analyses against traditional metrics.
- Marius Dragic – [email protected]
MIT License. See the LICENSE file for details.
This README rigorously documents the theoretical foundations, implementation details, and experimental outcomes of using autoencoders to measure distances between signals.