Analysis and visualization of environmental sound data from 'The Sounds of Home' project. This repository includes metadata, ontology, data processing scripts, and a GUI for interactive exploration of home-related sound categories and predictions. Structured for reproducible research and community collaboration.
# Sounds of Home Analysis

This repository provides tools for analyzing and visualizing sound events detected by recorders from the Sounds of Home Dataset. The analysis framework leverages the hierarchical structure of the AudioSet ontology, enabling systematic exploration and categorization of domestic soundscapes.

## Interactive Analysis Interface

The main analysis interface allows comprehensive exploration of the dataset:

![Application Interface](assets/images/interface.png)

Generated visualizations display sound event distributions:

![Example Plot](assets/images/plot.png)

## Installation and Setup

1. **Environment Requirements**:
   - Python 3.6 or higher
   - Git (for cloning the repository)
2. **Install Project**:
   ```bash
   git clone https://github.com/gbibbo/sounds_of_home_analysis.git
   cd sounds_of_home_analysis
   pip install -e .
   ```
3. **Download Dataset**:
   - Visit the Sounds of Home Dataset page
   - Download the prediction JSON files
   - Create a `data` directory in the repository root:
     ```bash
     mkdir data
     ```
   - Place the downloaded JSON files in the `data` directory
4. **Configure Data Path**:
   - Open `src/config.py`
   - Set `PREDICTIONS_ROOT_DIR = 'data'`
5. **Launch Interface**:
   ```bash
   python scripts/main.py --gui
   ```

## Using the Interface

1. **Select Parameters**:
   - Confidence Threshold: filter events by prediction confidence
   - Recorders: select the recorders to include in the analysis (note how the recorders were originally installed)
   - Sound Classes: select the event types to analyze
   - Days: specify the analysis timeframe
2. **Generate Analysis**:
   - Click **Plot** for a time series analysis, or **Plot and Analysis** to additionally run basic statistics, correlation analysis, PCA, heatmaps and clustering, and peak activity analysis
   - View the graph showing the event distribution
   - Results are saved automatically to the `assets/images` directory
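The confidence-threshold filtering above can be illustrated with a short sketch. Note that the event structure and field names (`class`, `confidence`) here are hypothetical placeholders for illustration; the actual schema of the prediction JSON files may differ.

```python
def filter_events(events, threshold):
    """Keep only events whose prediction confidence meets the threshold."""
    return [e for e in events if e["confidence"] >= threshold]

# Hypothetical prediction events for illustration only
events = [
    {"class": "Speech", "confidence": 0.82},
    {"class": "Dog", "confidence": 0.12},
    {"class": "Water tap, faucet", "confidence": 0.41},
]

kept = filter_events(events, threshold=0.2)  # drops the low-confidence "Dog" event
```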

## Analysis Scripts

### Batch Analysis

The `batch_analysis.py` script performs analysis across multiple confidence thresholds:

1. **Threshold Options**:
   - Fixed thresholds: [0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]
   - Variable threshold: adapts based on AudioSet label quality
     - Uses linear interpolation:
       `threshold = 0.2 + (0.5 - 0.2) * (label_quality / 100)`
     - Examples:
       - 100% quality label: threshold = 0.5
       - 50% quality label: threshold = 0.35
       - 0% quality label: threshold = 0.2
2. **Usage**:
   ```bash
   python scripts/batch_analysis.py
   ```
3. **Output Directory Structure**:
   ```
   analysis_results/
   └── batch_analysis_results/
       ├── analysis_results_threshold_0.0.json
       ├── analysis_results_threshold_0.05.json
       ...
       ├── analysis_results_threshold_0.5.json
       └── analysis_results_threshold_variable.json
   ```
4. **Customizing Analysis**:
   - Configure data selection in `src/config.py`:
     ```python
     SELECTED_RECORDERS = []  # Empty list means all recorders
     SELECTED_DAYS = []       # Empty list means all available days
     SELECTED_HOURS = []      # Empty list means all hours
     ```
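The variable-threshold interpolation can be expressed as a small helper. This is a sketch matching the worked examples above (0.2 at 0% quality, 0.5 at 100%), not the actual code in `scripts/batch_analysis.py`:

```python
def variable_threshold(label_quality, lo=0.2, hi=0.5):
    """Linearly interpolate a confidence threshold from AudioSet label
    quality (0-100): low-quality labels get a permissive threshold,
    high-quality labels a strict one."""
    return lo + (hi - lo) * (label_quality / 100)
```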

### Granger Causality Analysis

The `granger.py` script analyzes temporal relationships between sound events:

1. **Analysis Features**:
- Time Series Analysis using ARIMA models
- Cross-Correlation Functions with Lag Analysis
- Granger Causality Tests
- Principal Component Analysis (PCA)
- UMAP and t-SNE visualizations
- Animated temporal evolution visualization

2. **Usage**:
```bash
python scripts/granger.py
```

3. **Output Directory Structure**:
```
granger/
├── figures/
│   ├── time_series.png
│   ├── correlation_matrix.png
│   ├── top_correlations.png
│   ├── pca_results.png
│   ├── umap_results.png
│   ├── tsne_results.png
│   ├── umap_frames/
│   └── umap_animation_custom.gif
├── results/
│   ├── significant_correlations.json
│   ├── granger_causality_results.json
│   └── umap_intermediate_data.pkl
└── logs/
    └── analysis_log_[timestamp].txt
```
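Among the listed features, cross-correlation with lag analysis can be sketched in plain NumPy. This is a simplified illustration of the idea, not the script's actual implementation:

```python
import numpy as np

def cross_corr(a, b, lag):
    """Pearson correlation between a[t] and b[t - lag] over the
    overlapping samples; a positive lag means b leads a."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    if lag > 0:
        a, b = a[lag:], b[:-lag]
    elif lag < 0:
        a, b = a[:lag], b[-lag:]
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic example: `follower` echoes `leader` five steps later,
# so the correlation peaks at lag = 5.
rng = np.random.default_rng(0)
leader = rng.normal(size=300)
follower = np.roll(leader, 5)
best_lag = max(range(-10, 11), key=lambda k: cross_corr(follower, leader, k))
```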

### Minute-Level Analysis

The `generate_minute_data.py` script processes audio events with minute-level resolution:

1. **Features**:
   - Aggregates detection counts across all recorders
   - Applies quality-based confidence thresholds
   - Processes individual AudioSet classes without ontology aggregation
2. **Usage**:
   ```bash
   python scripts/generate_minute_data.py
   ```
3. **Output**:
   ```
   analysis_results/
   └── minute_analysis_results/
       └── minute_counts.json  # Minute-by-minute event counts
   ```
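The minute-level aggregation step can be sketched as follows. The event format and field names (`time` in seconds, `class`) are hypothetical stand-ins; the real prediction schema may differ:

```python
from collections import Counter

def minute_counts(events):
    """Aggregate detections into per-minute counts keyed by
    (minute, class), where timestamps are seconds from the start."""
    counts = Counter()
    for e in events:
        minute = int(e["time"] // 60)
        counts[(minute, e["class"])] += 1
    return counts

# Hypothetical events for illustration only
events = [
    {"time": 12.4, "class": "Speech"},
    {"time": 47.0, "class": "Speech"},
    {"time": 75.2, "class": "Dog"},
]
counts = minute_counts(events)
```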

### Events Statistics

The `events_statistics.py` script generates comprehensive statistical information about sound event occurrences:

1. **Analysis Features**:
   - Processes multiple JSON prediction files
   - Handles hierarchical AudioSet relationships
   - Applies confidence thresholds
   - Generates category and subcategory statistics
   - Creates visualization plots
2. **Usage**:
   ```bash
   python scripts/events_statistics.py
   ```
3. **Output Directory Structure**:
   ```
   analysis_results/
   └── events_statistics_results/
       ├── events_statistics_results.json
       ├── main_categories.png   # Overall category distribution
       └── subcategories_*.png   # Detailed subcategory analysis
   ```
4. **Configuration Options**:
   - Adjust in `src/config.py`:
     - `PREDICTIONS_ROOT_DIR`: data location
     - `DEFAULT_CONFIDENCE_THRESHOLD`: base threshold
     - `USE_LABEL_QUALITY_THRESHOLDS`: enable/disable quality-based thresholds
     - `GENERATE_GRAPHS`: control visualization output
     - `CUSTOM_CATEGORIES`: define category structure
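The hierarchical handling builds on `metadata/ontology.json`, which follows the standard AudioSet ontology format (a list of nodes with `id`, `name`, and `child_ids`). A minimal sketch of propagating a detected class up to its ancestor categories (the inline example data is illustrative, not taken from the real ontology):

```python
def build_parent_map(ontology):
    """Map each class id to its parent ids (a class can have
    several parents in the AudioSet ontology)."""
    parents = {}
    for node in ontology:
        for child in node.get("child_ids", []):
            parents.setdefault(child, []).append(node["id"])
    return parents

def ancestors(class_id, parents):
    """Collect all ancestor ids of a class, walking up the hierarchy."""
    seen = set()
    stack = [class_id]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Tiny inline example in the same shape as metadata/ontology.json
ontology = [
    {"id": "sounds_of_things", "name": "Sounds of things", "child_ids": ["water"]},
    {"id": "water", "name": "Water", "child_ids": ["tap"]},
    {"id": "tap", "name": "Water tap, faucet", "child_ids": []},
]
parents = build_parent_map(ontology)
```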

## Dynamic Visualization

Here's a dynamic preview of the application interface, which you can also find on the dataset website:

SOH Visualization

## Dataset

This project is designed to run with the dataset, which can be downloaded from:

Sounds of Home Dataset

Download the dataset and ensure the prediction JSON files are placed in the appropriate directory within the project, as specified in the configuration.

Project Structure

.
├── analysis_results
│   ├── batch_analysis_results
│   │   └── analysis_results_threshold_*.json
│   └── events_statistics_results
│       ├── events_statistics_results.json
│       ├── main_categories.png
│       └── subcategories_*.png
├── assets
│   └── images
│       ├── interface.png
│       └── plot.png
├── metadata
│   ├── class_labels_indices.csv
│   └── ontology.json
├── README.md
├── requirements.txt
├── scripts
│   ├── batch_analysis.py
│   ├── events_statistics.py
│   ├── main.py
│   └── plot_results.py
├── setup.py
├── src
│   ├── config.py
│   ├── data_processing
│   │   ├── load_data.py
│   │   ├── process_data.py
│   │   └── utils.py
│   ├── gui
│   │   └── tkinter_interface.py
│   └── visualization
│       └── plot_data.py
└── tests
    └── test_data_processing.py

Note: The directories and files excluded by .gitignore (such as sample data and analysis results) are not shown in the project structure.

## Contributing

Contributions are welcome. To contribute:

1. Fork the repository.
2. Create your feature branch:
   ```bash
   git checkout -b feature/new-feature
   ```
3. Commit your changes:
   ```bash
   git commit -m 'Add new feature'
   ```
4. Push to the branch:
   ```bash
   git push origin feature/new-feature
   ```
5. Open a Pull Request on GitHub.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contact

For questions or support, please contact the repository maintainer.
