NeemaJ/BirdCLEF

Bird Audio-Call Identification 🐦

Objective

In this project, we created a convolutional neural network (CNN) to identify bird species in the Western Ghats, a biodiversity hotspot. Our approach included:

  • Implementing preprocessing steps to isolate and clean the audio signal
  • Employing multiprocessing, generators, and disk writing to improve computational efficiency
  • Training a CNN to identify the species of origin for each call

Citation

Sprengel, E., Jaggi, M., Kilcher, Y., & Hofmann, T. (2016). Audio-Based Bird Species Identification using Deep Learning Techniques. Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland. Retrieved from ETH Zurich.

Our approach was inspired by the techniques described in this paper, "Audio-Based Bird Species Identification using Deep Learning Techniques" by Sprengel et al.

Dataset

Our data comes from the BirdCLEF Kaggle competition. The training data consisted of bird calls of varying lengths and quality.

Our primary issue was finding a method to convert audio into an image that could be processed by our CNN. To achieve this, we transformed the audio data into spectrograms while making sure to isolate the bird calls. We also standardized the spectrograms into the same size and amplitude scale. We accounted for background noise and enhanced our model by layering different types of background noise and applying pitch and time shifting to the audio.

More info can be found here:

Kaggle competition: https://www.kaggle.com/competitions/birdclef-2024/overview

Files for each step

Data Pipeline

Data_Pipeline

Preprocessing

To divide each sound file into signal and noise parts, we first computed a spectrogram of the entire file using a short-time Fourier transform (STFT). For the signal, we identified spectrogram pixels whose amplitude exceeded three times both their row median and their column median, applied binary filters (erosion and dilation) to remove spurious pixels, and built an indicator vector marking the signal intervals. A similar process is used for noise, with a slightly lower threshold of 2.5 times the median. The signal and noise segments are then extracted and placed into separate files, and their respective spectrograms are computed.
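The median-thresholding step above can be sketched roughly as follows. The function name, the 4×4 erosion/dilation kernel, and the array layout (frequency bins × time frames) are illustrative assumptions, not our exact implementation:

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def signal_noise_intervals(spec, signal_factor=3.0, noise_factor=2.5):
    """Mark time frames containing signal (pixels above signal_factor times
    both their row and column medians) and frames containing only noise."""
    row_med = np.median(spec, axis=1, keepdims=True)
    col_med = np.median(spec, axis=0, keepdims=True)
    signal_mask = (spec > signal_factor * row_med) & (spec > signal_factor * col_med)
    # Erosion removes isolated pixels; dilation restores and connects regions.
    kernel = np.ones((4, 4), dtype=bool)  # illustrative kernel size
    signal_mask = binary_dilation(binary_erosion(signal_mask, structure=kernel),
                                  structure=kernel)
    # Indicator vector: a time frame (column) is signal if any bin is set.
    signal_cols = signal_mask.any(axis=0)
    # Noise frames: no pixel exceeds the lower 2.5x-median threshold.
    above_noise = (spec > noise_factor * row_med) & (spec > noise_factor * col_med)
    noise_cols = ~above_noise.any(axis=0)
    return signal_cols, noise_cols
```

The two indicator vectors can then be used to slice the original audio into signal-only and noise-only files.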

  • Signal/Noise Separation

    To achieve this we:

    1. Created signal and noise masks

    2. Applied binary erosion and dilation to the masks

    3. Observed how dilation refines the signal regions and the signal presence indicator

  • Creating Spectrograms

This code converts the audio waveforms into spectrograms, which serve as input to the convolutional neural network (CNN). Its functions generate spectrograms using the short-time Fourier transform (STFT), normalize the spectrogram data, and apply log scaling for better frequency representation. The code also visualizes the spectrograms and isolates specific segments based on predefined criteria. The source code is inspired by TensorFlow's audio processing tutorial.

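A minimal sketch of this conversion, using `scipy.signal.stft`; the sample rate, FFT size, and hop length shown here are illustrative defaults, not our exact parameters:

```python
import numpy as np
from scipy.signal import stft

def audio_to_log_spectrogram(samples, sr=32000, n_fft=512, hop=128, eps=1e-10):
    """STFT magnitude spectrogram, log-scaled and normalized to [0, 1]
    so every clip shares the same amplitude scale."""
    _, _, Z = stft(samples, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(Z)                # frequency bins x time frames
    log_spec = np.log(mag + eps)   # log scaling compresses the dynamic range
    log_spec -= log_spec.min()     # shift so the minimum sits at 0
    peak = log_spec.max()
    return log_spec / peak if peak > 0 else log_spec
```

Feeding a pure 2 kHz tone through this function should light up the corresponding frequency bin, which is an easy sanity check for the pipeline.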

Data Augmentation

We performed data augmentation on the spectrograms using four key methods:

  1. Splitting the spectrogram into chunks of a fixed size, padding the end so that all chunks are equal in size.

  2. Shifting the spectrogram along the time axis by a specified amount, simulating a time delay.

  3. Shifting the spectrogram along the frequency axis to simulate a change in pitch.

  4. Lowering the amplitude of the spectrogram by a given factor to resemble quieter audio.

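The four augmentations above can be sketched as follows; the function names and the use of `np.roll` for the time and pitch shifts are illustrative choices, not our exact implementation:

```python
import numpy as np

def chunk_spectrogram(spec, chunk_width):
    """Split along the time axis into fixed-width chunks, zero-padding the last."""
    pad = (-spec.shape[1]) % chunk_width
    spec = np.pad(spec, ((0, 0), (0, pad)))
    return np.split(spec, spec.shape[1] // chunk_width, axis=1)

def time_shift(spec, frames):
    """Roll along the time axis to simulate a delayed call."""
    return np.roll(spec, frames, axis=1)

def pitch_shift(spec, bins):
    """Roll along the frequency axis to simulate a change in pitch."""
    return np.roll(spec, bins, axis=0)

def attenuate(spec, factor):
    """Scale the amplitude down to resemble quieter audio."""
    return spec * factor
```

Each transform returns a new array the same shape as its input (except chunking), so augmented examples can be mixed freely into the training set.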

CNN

To build our convolutional neural network (CNN), we used custom data generators with multiprocessing, which allowed for dynamic data loading and preprocessing. The network was designed with multiple convolutional layers and optimized through hyperparameter tuning to balance accuracy and performance.


Results

During the course of our project, we encountered significant limitations related to computational resources. Due to these constraints, we were unable to run our final model on the full dataset as initially planned. Instead, we opted to develop and test a prototype model using a smaller subset of the data, specifically focusing on just 5 birds. This approach allowed us to manage our computational load effectively while still providing valuable insights into the model’s performance.

Despite the reduced scale, our prototype model achieved an accuracy of 78.12% on both the training and validation datasets. This performance was consistent across different model configurations, which was initially surprising given the variation in parameter counts among them. That the test accuracy was also identical across setups indicates stable performance across the different architectures.

