This Google Colab notebook provides a batch audio processing pipeline for bioacoustic analysis using BirdNET (Cornell Lab of Ornithology). The workflow merges multiple audio recordings end-to-end, runs BirdNET species detection with confidence thresholds, generates Audacity-compatible label files for visual inspection, and produces comprehensive visualizations including waveforms, spectrograms, f0 tracks, and formant-like resonance estimates (F1-F3) via Linear Predictive Coding (LPC).
Key Features:
- End-to-end audio merging with timeline index generation
- BirdNET automated bird call detection (CNN-based, covering 3,000+ species)
- Audacity label export for manual verification
- Per-detection waveform and spectrogram visualization
- f0 estimation via librosa pyin (probabilistic YIN)
- Formant estimation via LPC autocorrelation method
- Overview spectrogram with detection overlays
Application: Ornithology research, biodiversity monitoring, acoustic ecology, bioacoustic annotation workflows.
BirdNET (Kahl et al., 2021) is a deep learning model for automated bird species identification from audio recordings. Developed by the Cornell Lab of Ornithology and Chemnitz University of Technology, it uses a convolutional neural network (CNN) trained on millions of bird vocalizations to detect and classify over 3,000 species globally.
Architecture:
- Input: Mel spectrogram (time-frequency representation)
- CNN backbone: Residual blocks with batch normalization
- Output: Softmax classification with confidence scores (0-1)
The pipeline consists of four stages:
- Upload & Merge: Concatenate audio files end-to-end → export merged WAV + Audacity labels + index CSV
- BirdNET Analysis: Run detection on original files → align detections to merged timeline
- Visualize Detections: Generate waveforms + spectrograms with f0/formant overlays
- Overview Spectrogram: Full-file spectrogram with detection spans labeled
YIN (de Cheveigné & Kawahara, 2002) estimates fundamental frequency via autocorrelation:
Difference function: $$ d(\tau) = \sum_{n=0}^{N-\tau-1} (x[n] - x[n+\tau])^2 $$
Cumulative mean normalized difference: $$ d'(\tau) = \begin{cases} 1 & \tau = 0 \\ \dfrac{d(\tau)}{\frac{1}{\tau} \sum_{j=1}^{\tau} d(j)} & \tau > 0 \end{cases} $$
Period estimation: the period lag $\hat{\tau}$ is the first local minimum of $d'(\tau)$ below an absolute threshold, falling back to the global minimum $\hat{\tau} = \arg\min_{\tau} d'(\tau)$ if no dip crosses the threshold.
Fundamental frequency: $$ f_0 = \frac{f_s}{\hat{\tau}} $$
pyin (Mauch & Dixon, 2014) extends YIN with probabilistic voicing detection for more robust f0 tracking.
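As a concrete illustration, here is a minimal sketch of calling librosa's pyin with this notebook's default frequency bounds (the file path is illustrative; `F0_MIN_HZ`/`F0_MAX_HZ` map to `fmin`/`fmax`):

```python
import librosa
import numpy as np

# Load one detection segment (path is illustrative)
y, sr = librosa.load("det_0001.wav", sr=None)

# pyin returns a per-frame f0 track, a boolean voicing flag,
# and the voicing probability for each frame
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=100.0,   # F0_MIN_HZ default
    fmax=8000.0,  # F0_MAX_HZ default
    sr=sr,
)

# Unvoiced frames come back as NaN, so use nan-aware statistics
print("median f0 (Hz):", np.nanmedian(f0))
```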
LPC models the vocal tract as an all-pole filter:
$$ H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} $$
Autocorrelation method (Levinson-Durbin recursion): solve the normal equations
$$ \sum_{k=1}^{p} a_k \, r(|i - k|) = r(i), \quad i = 1, \dots, p $$
for the coefficients $a_k$, where $r(i)$ is the autocorrelation of the windowed frame.
Formant frequencies from LPC roots: each complex root $z_k = r_k e^{j\theta_k}$ of $A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$ with $\theta_k > 0$ yields a candidate
$$ F_k = \frac{\theta_k}{2\pi} f_s $$
where $f_s$ is the sampling rate; roots close to the unit circle ($r_k \approx 1$) correspond to sharp resonances.
Note: For bird vocalizations, "formants" represent spectral resonances rather than true vocal tract formants (as in speech).
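A minimal sketch of this root-finding approach using `librosa.lpc` (the helper name and the candidate selection are illustrative; the order heuristic mirrors the notebook's `max(8, 2 + sr/1000)` rule):

```python
import librosa
import numpy as np

def estimate_resonances(y, sr, order=None, n_peaks=3):
    """Rough spectral-resonance ('formant-like') estimates via LPC roots."""
    if order is None:
        order = max(8, 2 + int(sr / 1000))  # notebook's order heuristic
    a = librosa.lpc(y, order=order)          # coefficients of A(z)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]        # keep one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)  # F_k = theta_k/(2*pi) * f_s
    return np.sort(freqs)[:n_peaks]          # lowest candidates as F1-F3
```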
Mel scale (perceptually motivated frequency scale): $$ m = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right) $$
Mel filterbank: Triangular filters spaced linearly on the Mel scale, i.e. approximately logarithmically in Hz.
BirdNET preprocessing: STFT → Mel filterbank → log magnitude → CNN input.
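For reference, the STFT → Mel → log chain in librosa (the `n_fft`/`hop_length`/`n_mels` values here are illustrative, not BirdNET's actual settings):

```python
import librosa
import numpy as np

y, sr = librosa.load("recording.wav", sr=None)  # illustrative path

# STFT magnitude -> triangular Mel filterbank -> log compression
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                   hop_length=512, n_mels=128)
log_S = librosa.power_to_db(S, ref=np.max)  # CNN-ready log-Mel input
```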
Process:
- Upload audio files to Google Colab
- Sort by filename (alphabetical) or modification time
- Concatenate end-to-end with optional gap (default: 0 ms)
- Normalize loudness (per-file and final mix)
- Export merged WAV + Audacity labels + index CSV
Outputs:
- `merged_no_overlap.wav`: Concatenated audio
- `merged_no_overlap_labels_audacity.txt`: File boundaries as Audacity regions
- `merged_no_overlap_index.csv`: Timeline mapping (filename → start/end times)
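A minimal sketch of the merge step with pydub (file names, the normalization target, and the index layout are assumptions about the notebook's internals, not its exact code):

```python
from pydub import AudioSegment

files = ["rec_001.wav", "rec_002.wav"]   # sorted upload list (illustrative)
GAP_BETWEEN_MS = 0                       # default: no silence between files

merged = AudioSegment.empty()
index = []                               # rows for the timeline CSV
for path in files:
    seg = AudioSegment.from_file(path)
    seg = seg.apply_gain(-20.0 - seg.dBFS)   # rough per-file loudness normalize
    start_s = len(merged) / 1000.0           # pydub lengths are in ms
    merged += seg + AudioSegment.silent(duration=GAP_BETWEEN_MS)
    index.append((path, start_s, start_s + len(seg) / 1000.0))

merged.export("merged_no_overlap.wav", format="wav")
```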
Process:
- Load index CSV from Stage 1
- Run BirdNET on each original file
- Align detections to merged timeline using start offsets
- Filter by confidence threshold (default: 0.25)
- Export detections CSV + Audacity labels
Outputs:
- `merged_no_overlap_birdnet_detections.csv`: All detections with global timestamps
- `merged_no_overlap_birdnet_labels.txt`: Detection regions for Audacity
Configuration:
- `MIN_CONF`: Minimum confidence (0-1)
- `LAT`, `LON`, `DATE`: Optional geographic/temporal context for species filtering
- `SPECIES_FILTER`: Optional list of expected species (reduces false positives)
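A sketch of the timeline alignment in pandas (column names such as `start_s`, `end_s`, and `confidence` are assumptions about the CSV schemas):

```python
import pandas as pd

index = pd.read_csv("merged_no_overlap_index.csv")   # filename -> start/end
dets = pd.read_csv("per_file_detections.csv")        # illustrative per-file results

# Shift each detection by its source file's start offset in the merged timeline
offsets = dict(zip(index["filename"], index["start_s"]))
dets["global_start"] = dets["start_s"] + dets["filename"].map(offsets)
dets["global_end"] = dets["end_s"] + dets["filename"].map(offsets)

# Apply the confidence threshold and export
dets = dets[dets["confidence"] >= 0.25]              # MIN_CONF default
dets.to_csv("merged_no_overlap_birdnet_detections.csv", index=False)
```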
Process (per detection):
- Extract audio segment (with padding: default ±0.2s)
- Compute f0 track via librosa pyin
- Estimate formants (F1-F3) via LPC (order = max(8, 2 + sr/1000))
- Calculate spectral features (centroid, peak frequency)
- Generate waveform plot (time domain)
- Generate spectrogram with f0 overlay (frequency domain)
Outputs:
- `det_XXXX_waveform.png`: Time-domain amplitude plot
- `det_XXXX_spectrogram.png`: STFT spectrogram with f0 track overlay
- `birdnet_detection_summary.csv`: Metrics table (f0, F1-F3, spectral centroid, etc.)
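A sketch of the spectrogram-with-f0-overlay plot (paths and plot styling are illustrative):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("det_0001.wav", sr=None)  # padded detection segment

# STFT spectrogram in dB plus a pyin f0 track on the same time axis
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
f0, _, _ = librosa.pyin(y, fmin=100.0, fmax=8000.0, sr=sr)
times = librosa.times_like(f0, sr=sr)

fig, ax = plt.subplots(figsize=(8, 4))
librosa.display.specshow(D, sr=sr, x_axis="time", y_axis="hz", ax=ax)
ax.plot(times, f0, color="cyan", linewidth=1.5, label="f0 (pyin)")
ax.legend(loc="upper right")
fig.savefig("det_0001_spectrogram.png", dpi=150)
```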
Process:
- Compute full-file STFT spectrogram
- Overlay BirdNET detections as shaded time spans
- Label each detection with species name + confidence
Output:
- `overview_spectrogram.png`: Full timeline spectrogram with annotations
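A sketch of the overlay logic (column names follow the alignment sketch above and are assumptions):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

y, sr = librosa.load("merged_no_overlap.wav", sr=None)
dets = pd.read_csv("merged_no_overlap_birdnet_detections.csv")

D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
fig, ax = plt.subplots(figsize=(16, 6))
librosa.display.specshow(D, sr=sr, x_axis="time", y_axis="hz", ax=ax)

# Shade each detection span and tag it with species name + confidence
for _, d in dets.iterrows():
    ax.axvspan(d["global_start"], d["global_end"], color="white", alpha=0.25)
    ax.text(d["global_start"], 0.9 * ax.get_ylim()[1],
            f"{d['common_name']} ({d['confidence']:.2f})",
            rotation=90, fontsize=7, color="white")

fig.savefig("overview_spectrogram.png", dpi=150)
```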
bird-net-batch-analysis/
├── AudacityTaggerBatchCompiler.ipynb # Google Colab notebook (all stages)
└── README.md
- Open notebook: Click the "Open in Colab" badge in `AudacityTaggerBatchCompiler.ipynb`
- Run setup cells: Installs dependencies (BirdNET, librosa, pydub, graphviz); a representative install cell is sketched after this list
- Upload audio files: Use the file upload widget in Cell 1
- Configure parameters:
  - Stage 1: `NORMALIZE_EACH`, `GAP_BETWEEN_MS`, `FORCE_SAMPLE_RATE`
  - Stage 2: `MIN_CONF`, `LAT`, `LON`, `DATE`
  - Stage 3: `F0_MIN_HZ`, `F0_MAX_HZ`, `LPC_ORDER`, `PAD_BEFORE_S`, `PAD_AFTER_S`
- Run cells sequentially: Press Shift+Enter for each cell
- Download outputs: All files are saved to `/content/` (accessible via the Colab file browser)
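A representative setup cell (package names are assumptions, e.g. the BirdNET dependency may be provided via `birdnetlib`; the notebook's actual install cell may differ or pin versions):

```python
# Install the analysis stack inside Colab (package names assumed; see lead-in)
!pip install birdnetlib librosa pydub graphviz
```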
| Parameter | Default | Description |
|---|---|---|
| `MIN_CONF` | 0.25 | Minimum BirdNET confidence (0-1) |
| `F0_MIN_HZ` | 100.0 | Lower bound for f0 search (Hz) |
| `F0_MAX_HZ` | 8000.0 | Upper bound for f0 search (Hz) |
| `LPC_ORDER` | auto | LPC order for formant estimation (typically 8-16) |
| `PAD_BEFORE_S` | 0.2 | Context before detection (seconds) |
| `PAD_AFTER_S` | 0.2 | Context after detection (seconds) |
| `GAP_BETWEEN_MS` | 0 | Silence between merged files (ms) |
Import merged audio:
- File → Open → Select `merged_no_overlap.wav`

Import labels:
- File → Import → Labels → Select `merged_no_overlap_labels_audacity.txt` (file boundaries)
- File → Import → Labels → Select `merged_no_overlap_birdnet_labels.txt` (detections)
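Audacity label files are plain tab-separated text (`start<TAB>end<TAB>label`, times in seconds); the values below are illustrative:

```
0.000000	12.345000	rec_001.wav
12.345000	27.800000	rec_002.wav
```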
Switch to spectrogram view:
- Click track dropdown → Spectrogram
- Navigate detections using label regions
- Verify BirdNET classifications visually
| File | Description |
|---|---|
| `merged_no_overlap.wav` | Concatenated audio (all files end-to-end) |
| `merged_no_overlap_labels_audacity.txt` | File boundary labels for Audacity |
| `merged_no_overlap_index.csv` | Timeline index (filename → start/end) |
| `merged_no_overlap_birdnet_detections.csv` | BirdNET detections with global timestamps |
| `merged_no_overlap_birdnet_labels.txt` | Detection labels for Audacity |
| `birdnet_detection_summary.csv` | Per-detection metrics (f0, F1-F3, etc.) |
| `det_XXXX_waveform.png` | Waveform plot for detection XXXX |
| `det_XXXX_spectrogram.png` | Spectrogram plot for detection XXXX |
| `overview_spectrogram.png` | Full-file spectrogram with overlays |
- Kahl, S., Wood, C. M., Eibl, M., & Klinck, H. (2021). "BirdNET: A deep learning solution for avian diversity monitoring". Ecological Informatics, 61, 101236. DOI: 10.1016/j.ecoinf.2021.101236
- BirdNET GitHub: https://github.com/kahst/BirdNET-Analyzer
- de Cheveigné, A., & Kawahara, H. (2002). "YIN, a fundamental frequency estimator for speech and music". Journal of the Acoustical Society of America, 111(4), 1917-1930. DOI: 10.1121/1.1458024
- Mauch, M., & Dixon, S. (2014). "pYIN: A fundamental frequency estimator using probabilistic threshold distributions". Proc. ICASSP, 659-663.
- Makhoul, J. (1975). "Linear prediction: A tutorial review". Proceedings of the IEEE, 63(4), 561-580.
- Rabiner, L. R., & Schafer, R. W. (2011). Theory and Applications of Digital Speech Processing. Prentice Hall.
- McFee, B., et al. (2015). "librosa: Audio and Music Signal Analysis in Python". Proc. SciPy, 18-24. DOI: 10.25080/Majora-7b98e3ed-003
MIT License (2025)
Copyright (c) 2025 George Redpath
George Redpath (Ziforge)
GitHub: @Ziforge
Focus: Bioacoustics, ornithology, automated species detection
- Cornell Lab of Ornithology — BirdNET model development
- Google Colab — Cloud computing platform
- Audacity Team — Open-source audio editor
- librosa Contributors — Audio analysis library
@misc{redpath2025birdnet,
author = {Redpath, George},
title = {BirdNET Batch Analysis for Audacity Integration},
year = {2025},
publisher = {GitHub},
url = {https://github.com/Ziforge/bird-net-batch-analysis}
}

Built for bioacoustic research and ornithological field studies.