EEG signal classification system using machine learning to analyze and categorize brain activity patterns from electroencephalogram data.

Metanome/eeg-classify

EEG Classification Project

A complete machine learning pipeline for classifying EEG data to distinguish between healthy subjects and those with schizophrenia. This project implements a 5-step workflow that converts raw EEG data to a deployed classification model.

Project Overview

This project provides an end-to-end solution for EEG-based mental health classification:

  1. Data Conversion: Convert .edf (European Data Format) files to efficient .npy format
  2. Dataset Curation: Segment and label data for machine learning
  3. Hyperparameter Tuning: Automatically discover optimal model architecture (CNN and LSTM)
  4. Validation & Training: Robust evaluation and final model training
  5. Inference: Real-world prediction tool for new EEG recordings

Model Architectures

This project implements two different deep learning approaches:

  • CNN (Convolutional Neural Network): 1D convolutions that learn local patterns over time and across channels in EEG segments
  • LSTM (Long Short-Term Memory): Designed to capture temporal dependencies and sequential patterns

Both models are automatically tuned and cross-validated to determine the best architecture for your specific dataset.

Project Structure

eeg-classify/
├── 1_convert_edf_to_npy.py       # Step 1: EDF to NPY conversion
├── 2_curate_dataset.py           # Step 2: Dataset segmentation and labeling
├── 3a_tune_cnn_model.py          # Step 3a: CNN hyperparameter tuning
├── 3b_tune_lstm_model.py         # Step 3b: LSTM hyperparameter tuning
├── 4a_train_and_validate_cnn.py  # Step 4a: CNN cross-validation and training
├── 4b_train_and_validate_lstm.py # Step 4b: LSTM cross-validation and training
├── 5_predict.py                  # Step 5: Enhanced inference tool (batch support)
├── compare_models.py             # Model comparison utility
├── predict_drag_drop.bat         # Windows drag-and-drop prediction tool
├── requirements.txt              # Python dependencies
├── README.md                     # This file
├── data/                         # Data directory
│   ├── raw/                      # Original EDF files
│   │   ├── healthy/              # Healthy subjects
│   │   └── schizophrenia/        # Schizophrenia subjects
│   ├── npy/                      # Converted NPY files
│   ├── X_dataset.npy             # Final feature dataset
│   ├── y_dataset.npy             # Final labels
│   ├── best_hyperparameters_cnn.txt  # CNN optimal hyperparameters
│   └── best_hyperparameters_lstm.txt # LSTM optimal hyperparameters
├── models/                       # Trained models
│   ├── eeg_classify_cnn_v1.h5    # Final trained CNN model
│   ├── eeg_classify_lstm_v1.h5   # Final trained LSTM model
│   ├── scaler_cnn.pkl            # CNN data scaler
│   └── scaler_lstm.pkl           # LSTM data scaler
└── logs/                         # Training logs and tuning results
    ├── tuner_cnn/                # CNN hyperparameter tuning logs
    └── tuner_lstm/               # LSTM hyperparameter tuning logs

Quick Start

Prerequisites

  • Python 3.7 or higher
  • At least 8GB RAM (16GB recommended for large datasets)
  • GPU support recommended for model training

Installation

  1. Clone or download this project
  2. Install dependencies:
    pip install -r requirements.txt

Data Preparation

  1. Prepare your EDF data by organizing it in this structure:
    data/raw/
    ├── healthy/
    │   ├── subject1.edf
    │   ├── subject2.edf
    │   └── ...
    └── schizophrenia/
        ├── subject1.edf
        ├── subject2.edf
        └── ...
    

Usage Guide

Step 1: Convert EDF to NPY

Convert your EDF files to efficient NumPy format:

python 1_convert_edf_to_npy.py

What it does:

  • Recursively finds all .edf files in data/raw/
  • Loads each file with MNE and extracts numerical data
  • Transposes to (timesteps, channels) format
  • Saves as .npy files in data/npy/

Expected output:

Found 45 EDF files to convert
✓ Converted: subject1.edf -> Shape: (12000, 19)
✓ Converted: subject2.edf -> Shape: (15000, 19)
...
Conversion completed successfully!
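
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the contents of 1_convert_edf_to_npy.py: the helper names (npy_path_for, to_timesteps_by_channels, convert_all) are hypothetical, and mirroring the class subfolders under data/npy/ is an assumption inferred from the Step 2 output. The MNE calls used are mne.io.read_raw_edf and Raw.get_data, which returns an array of shape (channels, timesteps).

```python
from pathlib import Path

import numpy as np


def npy_path_for(edf_path: Path, raw_root: Path, npy_root: Path) -> Path:
    """Map data/raw/<class>/<name>.edf to data/npy/<class>/<name>.npy (assumed layout)."""
    return (npy_root / edf_path.relative_to(raw_root)).with_suffix(".npy")


def to_timesteps_by_channels(data: np.ndarray) -> np.ndarray:
    """MNE yields (channels, timesteps); store the transpose as a contiguous array."""
    return np.ascontiguousarray(data.T)


def convert_all(raw_root: Path = Path("data/raw"),
                npy_root: Path = Path("data/npy")) -> int:
    """Convert every .edf file under raw_root and return how many were converted."""
    import mne  # imported lazily so the helpers above work without MNE installed

    edf_files = sorted(raw_root.rglob("*.edf"))
    print(f"Found {len(edf_files)} EDF files to convert")
    for edf_path in edf_files:
        raw = mne.io.read_raw_edf(edf_path, preload=True, verbose=False)
        data = to_timesteps_by_channels(raw.get_data())
        out_path = npy_path_for(edf_path, raw_root, npy_root)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        np.save(out_path, data)
        print(f"Converted: {edf_path.name} -> Shape: {data.shape}")
    return len(edf_files)
```

Calling convert_all() from the project root would walk data/raw/ and write one .npy file per recording.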

Step 2: Curate Dataset

Create segmented, labeled dataset for training:

python 2_curate_dataset.py

What it does:

  • Loads all .npy files from both healthy and schizophrenia folders
  • Creates overlapping segments (default: 1000 timesteps with 500 overlap)
  • Assigns labels (0 = healthy, 1 = schizophrenia)
  • Saves final dataset as X_dataset.npy and y_dataset.npy

Expected output:

Processing 25 files from data/npy/healthy
Processing 20 files from data/npy/schizophrenia
Final Dataset Summary:
  X shape: (2847, 1000, 19) (samples, timesteps, channels)
  y shape: (2847,) (samples,)
  Total segments: 2847
    - Healthy (label=0): 1423
    - Schizophrenia (label=1): 1424
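
The overlapping segmentation can be illustrated with a short NumPy sketch. The function names are hypothetical and the real 2_curate_dataset.py may differ, for example in how it treats a trailing partial window; here any leftover samples shorter than one segment are simply dropped.

```python
import numpy as np

SEGMENT_LENGTH = 1000  # timesteps per segment (default from this README)
OVERLAP = 500          # timesteps shared by consecutive segments


def segment_recording(recording, segment_length=SEGMENT_LENGTH, overlap=OVERLAP):
    """Cut a (timesteps, channels) array into overlapping fixed-length windows."""
    if not 0 <= overlap < segment_length:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_length")
    step = segment_length - overlap
    starts = range(0, recording.shape[0] - segment_length + 1, step)
    segments = [recording[s:s + segment_length] for s in starts]
    if not segments:  # recording shorter than one segment
        return np.empty((0, segment_length, recording.shape[1]), dtype=recording.dtype)
    return np.stack(segments)


def build_dataset(recordings, labels):
    """Segment every recording and repeat its label once per segment."""
    X_parts = [segment_recording(r) for r in recordings]
    y_parts = [np.full(len(x), label) for x, label in zip(X_parts, labels)]
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

With the defaults, windows start every 500 timesteps, so a recording of 12000 timesteps yields (12000 - 1000) / 500 + 1 = 23 segments.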

Step 3: Hyperparameter Tuning

The project supports two different model architectures. You can train both and compare their performance:

Option A: CNN Model

python 3a_tune_cnn_model.py

Option B: LSTM Model

python 3b_tune_lstm_model.py

What they do:

  • CNN: Explores 1D convolutional architectures that learn local temporal patterns across channels
  • LSTM: Explores Long Short-Term Memory architectures for temporal sequence modeling
  • Both use KerasTuner to find optimal hyperparameters automatically
  • Save results to data/best_hyperparameters_cnn.txt (CNN) and data/best_hyperparameters_lstm.txt (LSTM)

This step may take several hours! The scripts will show progress:

Starting hyperparameter search...
Trial 1/50: Training model with [architecture details]...
...
Best Hyperparameters Found:
  [Model-specific parameters]
  Validation Accuracy: 0.8764

Compare Both Models

python compare_models.py

This utility compares the performance of both architectures and recommends which to use for final training.

Step 4: Final Training & Validation

Based on your model comparison results, train the better-performing architecture:

For CNN Model:

python 4a_train_and_validate_cnn.py

For LSTM Model:

python 4b_train_and_validate_lstm.py

What they do:

  • Load optimal hyperparameters from Step 3
  • Perform 5-fold stratified cross-validation for robust evaluation
  • Train final model on entire dataset
  • Save trained models:
    • CNN: models/eeg_classify_cnn_v1.h5
    • LSTM: models/eeg_classify_lstm_v1.h5

Expected output:

Fold 1/5 Accuracy: 0.8723
Fold 2/5 Accuracy: 0.8891
...
Cross-validation accuracy: 0.8756 ± 0.0134
Final model training accuracy: 0.9234
Model saved to: models/[model_name].h5
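
To make the stratified splitting and the "mean ± std" summary concrete, here is a small NumPy-only sketch. It is an illustration under assumptions, not the code in the Step 4 scripts, which more likely rely on a library splitter such as scikit-learn's StratifiedKFold; both function names are hypothetical.

```python
import numpy as np


def assign_stratified_folds(y, n_splits=5, seed=42):
    """Return a fold index (0..n_splits-1) per sample, balanced within each class."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        # Deal this class's samples round-robin so every fold gets an equal share.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return fold_of


def summarize_folds(fold_accuracies):
    """Mean and (population) standard deviation of the per-fold accuracies."""
    acc = np.asarray(fold_accuracies, dtype=float)
    return float(acc.mean()), float(acc.std())


if __name__ == "__main__":
    y = np.array([0] * 1423 + [1] * 1424)  # class counts from the Step 2 example
    folds = assign_stratified_folds(y)
    for k in range(5):
        val = folds == k
        print(f"Fold {k + 1}/5: {val.sum()} validation segments, "
              f"{y[val].mean():.3f} labelled schizophrenia")
```

Each fold then serves once as the validation set while the remaining four folds are used for training.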

Step 5: Make Predictions

The enhanced prediction tool supports multiple modes for processing EEG recordings:

Single File Prediction

python 5_predict.py --model_path models/eeg_classify_cnn_v1.h5 --files data/test.edf

Batch Processing (Multiple Files)

python 5_predict.py --model_path models/eeg_classify_cnn_v1.h5 --files file1.edf file2.edf file3.edf

Directory Processing (All EDF files in a folder)

python 5_predict.py --model_path models/eeg_classify_cnn_v1.h5 --batch_dir data/test_subjects/

Drag and Drop (Windows)

  1. Use the batch file: Simply drag EDF files onto predict_drag_drop.bat
  2. Command line: python 5_predict.py --model_path models/eeg_classify_cnn_v1.h5 file1.edf file2.edf

CSV Export for Batch Results

python 5_predict.py --model_path models/eeg_classify_cnn_v1.h5 --batch_dir data/ --csv_output results.csv

Example batch output:

Finding EDF files...
Found 15 EDF file(s) to process

[1/15] Processing: subject001.edf
subject001.edf: Healthy (Confidence: 89.2%)

[2/15] Processing: subject002.edf
subject002.edf: Schizophrenia (Confidence: 76.4%)

...

BATCH PROCESSING SUMMARY
========================================
Total files processed: 15
Successful predictions: 15
Failed predictions: 0

Prediction Distribution:
  Healthy: 8 (53.3%)
  Schizophrenia: 7 (46.7%)
  Average confidence: 82.1%
  Total processing time: 45.67 seconds
  Average time per file: 3.04 seconds

Using LSTM Model

python 5_predict.py --model_path models/eeg_classify_lstm_v1.h5 --files data/test.edf --scaler_path models/scaler_lstm.pkl

Detailed Analysis Mode

python 5_predict.py --model_path models/eeg_classify_cnn_v1.h5 --files data/test.edf --verbose

Note: The prediction script defaults to the CNN model and scaler. When using the LSTM model, specify the corresponding scaler path.
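
For orientation, the flags used in the commands above could be wired up with argparse along these lines. This is a hypothetical sketch, not the parser in 5_predict.py: the flag names and the CNN defaults come from this README, while the nargs setting, the help strings, and the collect_edf_files helper are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags shown in this README (behavior assumed)."""
    parser = argparse.ArgumentParser(description="Classify EEG recordings (sketch)")
    parser.add_argument("--model_path", default="models/eeg_classify_cnn_v1.h5",
                        help="trained .h5 model; defaults to the CNN")
    parser.add_argument("--scaler_path", default="models/scaler_cnn.pkl",
                        help="scaler matching the chosen model")
    parser.add_argument("--files", nargs="+", default=[],
                        help="one or more EDF files")
    parser.add_argument("--batch_dir", help="directory containing EDF files")
    parser.add_argument("--csv_output", help="optional CSV file for batch results")
    parser.add_argument("--verbose", action="store_true",
                        help="print detailed analysis")
    return parser


def collect_edf_files(args: argparse.Namespace) -> list:
    """Combine explicitly listed files with every .edf file found in --batch_dir."""
    files = [Path(f) for f in args.files]
    if args.batch_dir:
        files.extend(sorted(Path(args.batch_dir).glob("*.edf")))
    return files
```

With defaults like these, switching to the LSTM means overriding both --model_path and --scaler_path together, which matches the note above.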

Configuration

Segmentation Parameters

You can modify segmentation parameters in 2_curate_dataset.py:

SEGMENT_LENGTH = 1000    # Number of time steps per segment
OVERLAP = 500           # Number of overlapping time steps

Important: If you change these values, you must:

  1. Re-run Step 2 to recreate the dataset
  2. Re-run Steps 3-4 to retrain the model

Step 5 automatically uses the same values, so no change is needed there.

Model Architecture

For the CNN, the hyperparameter search in Step 3 explores:

  • Conv1D layers: 1-3 layers with varying filter counts (32-512)
  • Kernel sizes: 3, 5, 7, or 11 timesteps
  • Dropout rates: 0.1-0.6 for regularization
  • Dense layers: 1-2 fully connected layers (32-512 units)
  • Learning rates: 0.0001-0.01 (log scale)
  • Pooling: Global average or max pooling
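
To show what one point in this search space looks like, the sketch below draws a single random configuration from the ranges listed above. It is plain Python for illustration only; the actual scripts use KerasTuner, and the dictionary keys and the power-of-two choices for filters and units are assumptions. The learning rate is sampled uniformly in log10 space, which is what "log scale" means here.

```python
import math
import random

UNIT_CHOICES = [32, 64, 128, 256, 512]  # assumed steps within the 32-512 range


def sample_cnn_config(rng: random.Random) -> dict:
    """Draw one hypothetical CNN configuration from the ranges in this README."""
    n_conv = rng.randint(1, 3)   # 1-3 Conv1D layers
    n_dense = rng.randint(1, 2)  # 1-2 dense layers
    # Uniform in log10 space, so 1e-4..1e-3 is as likely as 1e-3..1e-2.
    log_lr = rng.uniform(math.log10(1e-4), math.log10(1e-2))
    return {
        "conv_filters": [rng.choice(UNIT_CHOICES) for _ in range(n_conv)],
        "kernel_size": rng.choice([3, 5, 7, 11]),
        "dropout": rng.uniform(0.1, 0.6),
        "dense_units": [rng.choice(UNIT_CHOICES) for _ in range(n_dense)],
        "learning_rate": 10 ** log_lr,
        "pooling": rng.choice(["avg", "max"]),
    }


if __name__ == "__main__":
    print(sample_cnn_config(random.Random(0)))
```

A tuner evaluates many such configurations and keeps the one with the best validation accuracy.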

Performance Metrics

The pipeline provides several evaluation metrics:

  • Cross-validation accuracy: Robust estimate of model performance
  • Per-fold results: Consistency across different data splits
  • Confidence scores: Uncertainty estimation for predictions
  • Segment-level analysis: How consistent predictions are across time
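
One plausible way to turn per-segment probabilities into a file-level label, confidence score, and consistency measure is sketched below. How 5_predict.py actually aggregates segments is not stated in this README, so the averaging rule, the 0.5 threshold, and the function name are all assumptions.

```python
import numpy as np


def summarize_file(segment_probs, threshold=0.5):
    """Aggregate sigmoid outputs (P(schizophrenia) per segment) for one recording.

    Returns (label, confidence, agreement), where agreement is the fraction of
    segments whose individual vote matches the file-level label.
    """
    probs = np.asarray(segment_probs, dtype=float)
    mean_prob = float(probs.mean())
    is_positive = mean_prob >= threshold
    label = "Schizophrenia" if is_positive else "Healthy"
    # Confidence is the mean probability of whichever class was chosen.
    confidence = mean_prob if is_positive else 1.0 - mean_prob
    agreement = float(np.mean((probs >= threshold) == is_positive))
    return label, confidence, agreement


if __name__ == "__main__":
    label, confidence, agreement = summarize_file([0.9, 0.8, 0.4, 0.7])
    print(f"{label} (Confidence: {confidence:.1%}, segment agreement: {agreement:.0%})")
```

A low agreement value with a borderline confidence suggests the recording contains segments the model classifies inconsistently.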

Troubleshooting

Common Issues

1. "No .edf files found"

  • Ensure your data is in data/raw/healthy/ and data/raw/schizophrenia/
  • Check file extensions are exactly .edf
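
A quick way to spot extension problems is to list files whose suffix matches .edf only when case is ignored. The helper below is a hypothetical diagnostic and not part of the project; it assumes the conversion script matches the lowercase .edf suffix exactly, as the check above implies.

```python
from pathlib import Path


def find_edf_like(raw_root):
    """Split files under raw_root into exact '.edf' matches and near misses (e.g. '.EDF')."""
    exact, suspicious = [], []
    for path in sorted(Path(raw_root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix == ".edf":
            exact.append(path)
        elif path.suffix.lower() == ".edf":
            suspicious.append(path)
    return exact, suspicious


if __name__ == "__main__":
    exact, suspicious = find_edf_like("data/raw")
    print(f"{len(exact)} file(s) with an exact .edf extension")
    for path in suspicious:
        print(f"Rename to lowercase .edf: {path}")
```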

2. "Dataset files do not exist"

  • Run previous steps in order (1 → 2 → 3 → 4 → 5)
  • Each step depends on outputs from previous steps

3. "Out of memory" during training

  • Reduce batch size in the model training scripts
  • Use a machine with more RAM
  • Consider reducing segment length
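
A rough estimate of how much memory the dataset array itself needs can help decide which of these options to try. The sketch below is simple arithmetic; the float32 default is an assumption about how X_dataset.npy is stored, and training needs additional memory for activations and gradients on top of this.

```python
import numpy as np


def dataset_bytes(shape, dtype=np.float32):
    """Bytes needed to hold an array of the given shape and dtype in memory."""
    n_values = 1
    for dim in shape:
        n_values *= int(dim)
    return n_values * np.dtype(dtype).itemsize


if __name__ == "__main__":
    shape = (2847, 1000, 19)  # X shape from the Step 2 example output
    for dtype in (np.float32, np.float64):
        mib = dataset_bytes(shape, dtype) / 2**20
        print(f"{np.dtype(dtype).name}: {mib:.1f} MiB")
```

For the example shape this gives about 206 MiB in float32 and twice that in float64. The size of a single batch scales with segment length in the same way, so halving SEGMENT_LENGTH halves the memory each batch needs.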

4. Poor model performance

  • Ensure you have enough training data (>20 subjects per class)
  • Check data quality and preprocessing
  • Try different segmentation parameters
  • Run hyperparameter tuning longer (increase MAX_TRIALS)

Data Quality Checks

Before running the pipeline, verify:

  • EDF files load correctly with MNE
  • All files have the same number of channels
  • Sampling rates are consistent
  • No corrupted or truncated recordings
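
The channel-count and sampling-rate checks can be automated with a small script such as the one below. It is a sketch with hypothetical function names: read_header uses mne.io.read_raw_edf with preload=False so only headers are read, and find_inconsistent flags any file that differs from the most common (channels, sampling rate) pair.

```python
from collections import Counter
from pathlib import Path


def read_header(path):
    """Return (name, n_channels, sampling_rate_hz) without loading the signal data."""
    import mne  # imported lazily so find_inconsistent works without MNE installed

    raw = mne.io.read_raw_edf(path, preload=False, verbose=False)
    return str(path), len(raw.ch_names), float(raw.info["sfreq"])


def find_inconsistent(records):
    """List the names whose (n_channels, sampling rate) differ from the majority."""
    if not records:
        return []
    majority = Counter((n, sf) for _, n, sf in records).most_common(1)[0][0]
    return [name for name, n, sf in records if (n, sf) != majority]


if __name__ == "__main__":
    records = [read_header(p) for p in sorted(Path("data/raw").rglob("*.edf"))]
    for name in find_inconsistent(records):
        print(f"Inconsistent channels or sampling rate: {name}")
```

A file that fails to load in read_header at all is a candidate for the corrupted or truncated category.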

Expected Results

With a good dataset, you should expect:

  • Cross-validation accuracy: 80-90%
  • Approximate run time per step:
    • Step 1: Minutes
    • Step 2: Minutes
    • Step 3: Hours (depending on MAX_TRIALS)
    • Step 4: 30-60 minutes
    • Step 5: Seconds per prediction

Technical Details

Model Architecture

The CNN variant is a 1D Convolutional Neural Network optimized for time-series EEG data:

  1. Input: EEG segments of shape (timesteps, channels)
  2. Conv1D layers: Extract temporal patterns
  3. Batch normalization: Stabilize training
  4. Max pooling: Reduce dimensionality
  5. Dropout: Prevent overfitting
  6. Global pooling: Aggregate features
  7. Dense layers: Final classification
  8. Sigmoid output: Binary probability
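
The layer sequence above maps onto Keras roughly as follows. This is a minimal single-block sketch with placeholder hyperparameters (64 filters, kernel size 7, dropout 0.3, 64 dense units, learning rate 1e-3), not the tuned architecture; the function name build_cnn is hypothetical, and the real values come from Step 3.

```python
def build_cnn(timesteps=1000, channels=19, filters=64, kernel_size=7,
              dropout=0.3, dense_units=64, learning_rate=1e-3):
    """Build a small 1D CNN for binary EEG segment classification (illustrative)."""
    from tensorflow import keras  # imported lazily; requires TensorFlow 2.x

    model = keras.Sequential([
        keras.Input(shape=(timesteps, channels)),           # 1. input segment
        keras.layers.Conv1D(filters, kernel_size,
                            padding="same", activation="relu"),  # 2. temporal patterns
        keras.layers.BatchNormalization(),                   # 3. stabilize training
        keras.layers.MaxPooling1D(pool_size=2),              # 4. reduce dimensionality
        keras.layers.Dropout(dropout),                       # 5. regularization
        keras.layers.GlobalAveragePooling1D(),               # 6. aggregate over time
        keras.layers.Dense(dense_units, activation="relu"),  # 7. classification head
        keras.layers.Dense(1, activation="sigmoid"),         # 8. binary probability
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The sigmoid output is the estimated probability of label 1 (schizophrenia) for one segment, which is why binary cross-entropy is the matching loss.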

Data Processing

  • Standardization: Z-score normalization per channel
  • Segmentation: Overlapping windows for data augmentation
  • Stratification: Train/validation splits that preserve the class proportions
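
Per-channel z-score normalization on a (samples, timesteps, channels) array can be sketched in NumPy as shown below. The project saves its scalers as .pkl files, so the real scripts probably use a fitted scaler object rather than these hypothetical functions; the sketch only shows the computation, and the guard against zero variance is an added assumption.

```python
import numpy as np


def fit_channel_stats(X):
    """Compute one mean and one standard deviation per channel."""
    mean = X.mean(axis=(0, 1))   # average over samples and timesteps
    std = X.std(axis=(0, 1))
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero on flat channels
    return mean, std


def standardize(X, mean, std):
    """Apply z = (x - mean) / std; stats broadcast over the channel axis."""
    return (X - mean) / std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(loc=5.0, scale=3.0, size=(10, 1000, 19))
    mean, std = fit_channel_stats(X_train)
    Z = standardize(X_train, mean, std)
    print(Z.mean(axis=(0, 1)).round(3)[:3], Z.std(axis=(0, 1)).round(3)[:3])
```

The statistics should be fitted on training data and then reused unchanged at prediction time, which is the role the saved scaler files play.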

References

This implementation builds on the following tools and methods:

  • EEG Processing: MNE-Python library
  • Deep Learning: TensorFlow/Keras
  • Hyperparameter Optimization: Keras Tuner with Hyperband algorithm
  • Evaluation: Stratified k-fold cross-validation

Contributing

Feel free to submit issues, suggestions, or improvements. This pipeline can be adapted for other EEG classification tasks.


Note: This is a research tool. Any medical applications should undergo proper validation and regulatory approval.
