Problem Statement:
The objective of this project is to identify the correct audio device from a set of audio devices. The dataset
consists of train, development, and evaluation samples. Each sample can have between 2 and 5 possible device options.
The task is to develop a multi-class classifier (or any other suitable model) that can predict the correct device
selection.
Dataset:
The dataset is divided into train, development (dev), and evaluation (eval) sets. The input data comprises audio
recordings from all the devices. To simplify the processing, I will provide pre-extracted features instead of audio
files. There are two types of features available:
1. Single Feature Vector: Each device in each training sample is associated with a 640-dimensional feature
vector. The exact dimensionality may differ from 640, but it is the same for all samples.
2. Time-Series Feature: Alternatively, we can provide a time-series feature matrix for each device in each
training sample. This matrix has dimensions of (feature dimension x time stamps).
The features for all samples will be of the same dimension. To handle scenarios where the number of devices is
less than the maximum of 5, non-existing devices will be represented as zeros or very small values (e.g., 1e-8) in
the feature representation.
Target Device/Class: For each sample in the dataset, the target device for selection will be provided as ground-truth
information. The target device value lies between 0 and 4, representing the device options.
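As a concrete illustration of the single-feature-vector layout and the padding rule described above, here is a minimal sketch. The shapes follow the description (up to 5 devices, 640-dimensional vectors, padding with 1e-8), but the function and variable names are my own and not part of the provided data format:

```python
import numpy as np

MAX_DEVICES = 5   # maximum device options per sample
FEAT_DIM = 640    # per-device feature dimension (from the description)
PAD_VALUE = 1e-8  # value used for non-existing device slots

def pad_sample(device_feats):
    """Stack per-device feature vectors into a fixed (5, 640) array,
    filling missing device slots with a very small constant, and
    return a boolean mask marking which slots hold real devices."""
    n = len(device_feats)
    out = np.full((MAX_DEVICES, FEAT_DIM), PAD_VALUE, dtype=np.float32)
    out[:n] = np.asarray(device_feats, dtype=np.float32)
    mask = np.zeros(MAX_DEVICES, dtype=bool)
    mask[:n] = True  # True where a real device is present
    return out, mask

# Example: a sample with only 3 real devices.
feats, mask = pad_sample([np.random.randn(FEAT_DIM) for _ in range(3)])
print(feats.shape, mask)  # (5, 640) [ True  True  True False False]
```

Keeping the mask alongside the padded features lets a downstream model ignore the non-existing device slots explicitly rather than relying on the near-zero values alone.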
Data Details:
• Train Samples: A dataset containing 120,000+ samples for training the model will be provided.
• Dev Samples: A dataset containing 6,000+ samples will be available for fine-tuning the models.
• Eval Samples: Two evaluation sets will be shared. The easy set comprises 7,000+ samples, while the
difficult set comprises 2,000+ samples.
PyTorch Model:
To achieve the desired accuracy of at least 70% on the evaluation sets, please employ advanced models in
PyTorch, such as recurrent neural networks (RNNs) and transformer models, which have demonstrated success
in audio and sequence classification tasks. These models offer the potential for improved performance compared
to plain CNNs and DNNs. You are welcome to try whatever works best for the dataset; I will leave this to
your expertise.
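One possible architecture along these lines, sketched as a starting point rather than a required design (the class name and hyperparameters are illustrative): treat the five device feature vectors as a length-5 sequence, encode it with a small transformer encoder, and emit one score per device slot, so the prediction is a 5-way choice over slots 0-4.

```python
import torch
import torch.nn as nn

class DeviceSelector(nn.Module):
    """Scores each of the 5 device slots; the highest-scoring slot
    is the predicted target device (class 0-4)."""
    def __init__(self, feat_dim=640, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)  # per-device embedding
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.score = nn.Linear(d_model, 1)  # one logit per device slot

    def forward(self, x, pad_mask=None):
        # x: (batch, 5, feat_dim); pad_mask: (batch, 5), True = padded slot
        h = self.encoder(self.proj(x), src_key_padding_mask=pad_mask)
        logits = self.score(h).squeeze(-1)  # (batch, 5)
        if pad_mask is not None:
            # Padded slots can never be the predicted device.
            logits = logits.masked_fill(pad_mask, float("-inf"))
        return logits

model = DeviceSelector()
x = torch.randn(8, 5, 640)          # a dummy batch of 8 samples
logits = model(x)
print(logits.shape)                  # torch.Size([8, 5])
```

Training would then apply `nn.CrossEntropyLoss` to these 5-way logits against the 0-4 target labels; for the time-series feature variant, the per-device matrix would first be pooled or encoded down to a single vector before the same slot-scoring step.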
Deliverables:
Upon completion, the following deliverables are expected:
1. Trained Model: The trained PyTorch model script and weights, capable of audio device selection.
2. Decoding Scripts: Scripts to decode the model's predictions and map them to the corresponding
audio devices.
3. Additional Scripts and Insights: Scripts that generate insights, such as plots, histograms,
correlations, and any other relevant analysis, to support your report and provide a deeper understanding of the
model's performance.
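For the decoding scripts, the core step is an argmax over the per-slot scores followed by a lookup into a class-to-device table. A minimal sketch (the device-name mapping here is made up for illustration; the real mapping would accompany the dataset):

```python
import numpy as np

# Hypothetical mapping from class index (0-4) to a device label.
DEVICE_NAMES = {0: "device_0", 1: "device_1", 2: "device_2",
                3: "device_3", 4: "device_4"}

def decode(logits):
    """Map a (batch, 5) array of model scores to device labels."""
    idx = np.argmax(logits, axis=-1)
    return [DEVICE_NAMES[int(i)] for i in idx]

preds = decode(np.array([[0.1, 2.3, -1.0, 0.0, 0.5],
                         [1.5, 0.2,  0.3, 0.1, 0.0]]))
print(preds)  # ['device_1', 'device_0']
```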