Audio.cast_column() or Audio.decode_example() causes Colab kernel crash (std::bad_alloc) #7834

@rachidio

Describe the bug

When the Hugging Face datasets.Audio feature is used to decode a local or remote (public HF dataset) audio file inside Google Colab, the notebook kernel crashes with std::bad_alloc (a C++ memory allocation failure).
The crash happens even with a minimal code example and a valid .wav file that soundfile reads successfully.
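
For anyone who wants to rule out a corrupt file before looking at the decoder, here is a minimal sketch of a header sanity check. It is not the check used in the notebook: it swaps in Python's standard-library `wave` module for soundfile so that it runs without extra dependencies. The helper names `write_test_tone` and `describe_wav`, and the generated `tone.wav` file, are hypothetical stand-ins for the real audio file.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def write_test_tone(path, sampling_rate=16000, n_frames=1600, freq=440.0):
    """Write a short mono 16-bit PCM sine tone (hypothetical stand-in file)."""
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / sampling_rate))
        for i in range(n_frames)
    )
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sampling_rate)
        wf.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def describe_wav(path):
    """Read only the WAV header fields and report the basic audio properties."""
    with wave.open(str(path), "rb") as wf:
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sampling_rate": rate,
            "frames": n_frames,
            "duration_s": n_frames / rate,
        }


# Assumption: a generated tone stands in for the real .wav file from the report.
wav_path = Path(tempfile.mkdtemp()) / "tone.wav"
write_test_tone(wav_path)
info = describe_wav(wav_path)
print(info)
```

If the header values look sane (plausible sampling rate, non-zero frame count), the file itself is unlikely to be the cause of the allocation failure. Note that `wave` only handles uncompressed PCM WAV files, so a failure here does not by itself prove that a file is invalid.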

Here is a sample Colab notebook that reproduces the problem:
https://colab.research.google.com/drive/1nnb-GC5748Tux3xcYRussCGp2x-zM9Id?usp=sharing

Code sample:

...
audio_dataset = audio_dataset.cast_column("audio", Audio(sampling_rate=16000))

# Accessing the first element crashes the Colab kernel
print(audio_dataset[0]["audio"])

Error log

WARNING what(): std::bad_alloc
terminate called after throwing an instance of 'std::bad_alloc'

Environment

Platform: Google Colab (Python 3.12.12)
datasets Version: 4.3.0
soundfile Version: 0.13.1
torchaudio Version: 2.8.0+cu126

Thanks in advance for any help with this error. I have been getting it for about two weeks now; the same code worked before that.

Regards

Steps to reproduce the bug

https://colab.research.google.com/drive/1nnb-GC5748Tux3xcYRussCGp2x-zM9Id?usp=sharing

Expected behavior

The audio file should be loaded and decoded without crashing.
Accessing the example should safely return:

{
    "path": "path/filename.wav",
    "array": np.ndarray([...]),
    "sampling_rate": 16000
}

Environment info

Platform: Google Colab (Python 3.12.12)
datasets Version: 4.3.0
soundfile Version: 0.13.1
torchaudio Version: 2.8.0+cu126
