Description
Describe the bug
When using the Hugging Face datasets Audio feature to decode a local or remote (public HF dataset) audio file inside Google Colab, the notebook kernel crashes with std::bad_alloc (a C++ memory-allocation failure).
The crash happens even with a minimal code example and a valid .wav file that can be read successfully with soundfile.
Here is a sample Colab notebook that reproduces the problem:
https://colab.research.google.com/drive/1nnb-GC5748Tux3xcYRussCGp2x-zM9Id?usp=sharing
code sample:
from datasets import Audio
...
audio_dataset = audio_dataset.cast_column("audio", Audio(sampling_rate=16000))
# Accessing the first element crashes the Colab kernel
print(audio_dataset[0]["audio"])
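For context, the claim above that the .wav itself is valid can be checked without soundfile at all; here is a minimal sketch using Python's stdlib wave module on a synthetic 16 kHz file (the actual file is not attached, so the path and contents here are made up):

```python
import math
import os
import struct
import tempfile
import wave

# Write a synthetic 0.1 s, 440 Hz sine wave: 16 kHz, mono, 16-bit PCM.
path = os.path.join(tempfile.mkdtemp(), "sample.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / 16000)))
        for n in range(1600)
    ))

# Reading it back succeeds, so a crash inside datasets is not caused by a broken file.
with wave.open(path, "rb") as r:
    assert r.getframerate() == 16000
    assert r.getnframes() == 1600
```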
Error log
WARNING what(): std::bad_alloc
terminate called after throwing an instance of 'std::bad_alloc'
Thanks in advance for any help with this error, which I have been getting for about two weeks now after it was working before.
Regards
Steps to reproduce the bug
https://colab.research.google.com/drive/1nnb-GC5748Tux3xcYRussCGp2x-zM9Id?usp=sharing
Expected behavior
The audio should load and decode without crashing.
It should safely return:
{
    "path": "path/filename.wav",
    "array": np.array([...]),
    "sampling_rate": 16000
}
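In other words, the decoded sample is expected to be a plain dict with these three keys; a sketch of that shape with a dummy array (not real decoded audio, and the path is illustrative):

```python
import numpy as np

# Dummy stand-in for the dict that Audio decoding is expected to return.
decoded = {
    "path": "path/filename.wav",                # illustrative path
    "array": np.zeros(1600, dtype=np.float32),  # samples, typically in [-1.0, 1.0]
    "sampling_rate": 16000,
}

assert set(decoded) == {"path", "array", "sampling_rate"}
assert decoded["sampling_rate"] == 16000
```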
Environment info
Platform: Google Colab (Python 3.12.12)
datasets Version: 4.3.0
soundfile Version: 0.13.1
torchaudio Version: 2.8.0+cu126