- Python >=3.12.2
- Required packages:
pip install -r requirement.txt
- vicuna-7b-v1.3 from https://huggingface.co/lmsys/vicuna-7b-v1.3, placed in the folder 'LLMs' (or another LLM such as Meta-llama3.2-8b-Instruct)
- CLIP: git clone https://github.com/openai/CLIP
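To check that the environment is set up correctly, the snippet below loads a decoder LLM from the local 'LLMs' folder and a CLIP backbone. It is only an illustrative sketch: the CLIP variant (ViT-L/14) and the fp16 loading are assumptions, not requirements of this repository.

```python
# Quick environment check (illustrative only; not part of the repository's scripts).
import clip  # after cloning, install it, e.g. with: pip install ./CLIP
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

llm_path = "LLMs/vicuna-7b-v1.3"  # or "LLMs/Meta-llama3.2-8b-Instruct"
tokenizer = AutoTokenizer.from_pretrained(llm_path)
llm = AutoModelForCausalLM.from_pretrained(llm_path, torch_dtype=torch.float16)  # fp16 is an assumption

# CLIP backbone: the ViT-L/14 variant is an assumption, not a repository requirement.
clip_model, preprocess = clip.load("ViT-L/14", device="cpu")

print(type(llm).__name__, type(clip_model).__name__)
```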
This repository contains four experiments, each associated with a different task and dataset:
- Spoken text decoding (convers): multimodal spoken text decoding during conversations (the main task of this work).
- Perceived speech decoding (perceived): decoding the textual content of listened stories.
- Brain captioning (NSD): decoding the captions of viewed images using the NSD dataset.
- Reading text decoding (zuco): decoding the text read by participants from EEG signals.
In the following, we detail the steps to reproduce the results of each experiment presented in the paper.
- Update the configuration file by specifying the following paths: DATA_PATH (e.g., data/convers), RAW_FMRI_DATA_PATH (e.g., data/fmri_convers), MODELS_TRAIN_PATH (e.g., trained_models/convers), and LLM_PATH (e.g., LLMs/Meta-llama3.2-8b-Instruct). A hypothetical sketch of such a configuration is given after this list.
- Download the Convers dataset version 2.2.0 from the OpenNeuro platform (ds001740) into the RAW_FMRI_DATA_PATH specified in the config file.
- Create a folder named "raw_data/transcriptions" inside DATA_PATH and place the raw transcriptions from the Ortolang platform (convers/v2) into it.
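The exact layout of the configuration file depends on the repository; the following is only a hypothetical sketch, assuming a Python module of path constants whose names mirror the paths listed above.

```python
# Hypothetical configuration sketch for the convers experiment
# (the real config file in the repository may be structured differently).
DATA_PATH = "data/convers"
RAW_FMRI_DATA_PATH = "data/fmri_convers"
MODELS_TRAIN_PATH = "trained_models/convers"
LLM_PATH = "LLMs/Meta-llama3.2-8b-Instruct"
```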
With DATA_PATH set to "data/convers", you should obtain a structure similar to this after data preprocessing:
data
└── convers
├── preprocessed_fmri_data
│ └── fMRI_data_200
├── processed_data
│ ├── fMRI_data_split
│ ├── interlocutor_text_data
│ └── participant_text_data
├── raw_data
│ ├── transcriptions
│ └── fmri
├── test.json
└── train.json
# Preprocessing raw data
python exps/convers/process_raw_bold_signal.py --n_rois 200 # Parcellation using 200 ROIs
python exps/convers/data_builder_tools/split_bold_files.py # Processing raw 4D voxel BOLD signals and segmenting them into fixed-duration chunks
python exps/convers/data_builder_tools/textgrid_to_text.py # Processing transcription files (conversations) and segmenting them into fixed-duration text sequences
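For reference, the kind of 200-ROI parcellation performed by process_raw_bold_signal.py can be sketched with nilearn as below. This is only an illustration under assumptions: the atlas (Schaefer 200) and the masker options are not necessarily those used by the repository's script.

```python
# Illustrative 200-ROI parcellation of a 4D BOLD image with nilearn
# (atlas choice and masker options are assumptions, not the repository's exact settings).
from nilearn import datasets
from nilearn.maskers import NiftiLabelsMasker

atlas = datasets.fetch_atlas_schaefer_2018(n_rois=200)
masker = NiftiLabelsMasker(labels_img=atlas.maps, standardize=True, detrend=True)

# Yields an array of shape (n_timepoints, 200): one averaged time series per ROI.
roi_time_series = masker.fit_transform("path/to/sub-XX_task-convers_bold.nii.gz")
print(roi_time_series.shape)
```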
# Building training and test data
python exps/convers/data_builder_tools/build_data.py # Using json files to save paths of bold chunks and the [input, output] text for instruction tuning
python exps/convers/data_builder_tools/build_tokenizer.py # Building the tokenizer for the first stage of training
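As described above, build_data.py writes JSON files pairing the paths of BOLD chunks with the [input, output] text used for instruction tuning. The record below is purely hypothetical (the field names and values are illustrative, not the script's actual schema); it only suggests the kind of information stored in train.json and test.json.

```python
# Hypothetical example of one training record (field names and values are illustrative only).
example_record = {
    "bold_chunk": "data/convers/processed_data/fMRI_data_split/sub-XX_run-XX_chunk-03.npy",
    "input": "Instruction: decode the participant's utterance for this fMRI segment.",
    "output": "an example target transcription for the corresponding time window",
}
```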
# Training and testing after each save_epoch
python exps/convers/train_stage1.py -m DeconvBipartiteTransformerConv --batch_size 128 --epochs 200 # Stage-1: training the DeconvBipartite Transformer
python exps/convers/train_stage2.py --batch_size 32 --epochs 100 -m BrainDEC_V0 --save_epochs 50 # Stage-2. Note: BrainDEC_V1 and BrainDEC_V2 converge more quickly than V0; only 20 epochs are needed.
# Evaluate the results of the test set and save the scores
python exps/convers/evaluation.py
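The evaluation script computes its own set of scores; purely as an illustration of this kind of text-similarity evaluation, the sketch below compares generated and reference sentences with the Hugging Face evaluate library. The metric choice and data layout here are assumptions, not the script's actual implementation.

```python
# Illustrative text-similarity scoring with the `evaluate` library
# (not the repository's evaluation script; metrics and inputs are assumptions).
import evaluate

predictions = ["the generated sentence for one fMRI chunk"]
references = ["the ground-truth transcription for that chunk"]

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))
```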
- Update the configuration file by specifying the following paths: RAW_FMRI_DATA_PATH (e.g., data/perceived), MODELS_TRAIN_PATH (e.g., trained_models/perceived), and LLM_PATH (e.g., LLMs/Meta-llama3.2-8b-Instruct).
- Download the training and test datasets into the folders "DATA_TRAIN_DIR" and "DATA_TEST_DIR" (see the config file), as outlined in the semantic-decoding project.
With DATA_PATH set to "data/perceived" for example, you should obtain a structure similar to this after data preprocessing:
data
└── perceived
├── data_test
├── data_train
└── processed
├── S1
├── S2
├── S3
├── fMRI_data_test_split
└── fMRI_data_train_split
# Data preparation
for subject in S1 S2 S3; do python exps/perceived/prepare_datasets.py -s $subject; done
# Build tokenizer for stage 1
python exps/perceived/build_tokenizer.py
# Stage-1 training (in a cross-subject manner)
python exps/perceived/train_stage1.py --batch_size 128
# Stage-2 training (for each subject separately)
for subject in S1 S2 S3; do python exps/perceived/train_stage2.py --batch_size 32 -s $subject; done
# Evaluate the results of the test set and save the scores
for subject in S1 S2 S3; do python exps/perceived/evaluation.py $subject; done
This experiment is a comparison with the brain understanding benchmark (BrainHub), based on the Natural Scenes Dataset (NSD) and COCO.
- The processed datasets are available here.
- Download the datasets using this script.
- Download the COCO annotations from this link into the folder 'tools' (a minimal loading sketch follows this list).
- Update the configuration file to specify the paths and, if necessary, modify the hyperparameters.
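As a purely illustrative aside, if the downloaded annotations are in the standard COCO JSON format, they can be inspected with pycocotools as below. The annotation file name is an assumption; use whichever file the link provides (the curated annotations mentioned later are distributed as a .npy file instead).

```python
# Illustrative inspection of COCO captions with pycocotools
# (the annotation file name is an assumption; use the file downloaded into 'tools').
from pycocotools.coco import COCO

coco = COCO("tools/captions_train2017.json")
img_id = coco.getImgIds()[0]
ann_ids = coco.getAnnIds(imgIds=[img_id])
for ann in coco.loadAnns(ann_ids):
    print(ann["caption"])
```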
With DATA_PATH set to "data/nsd", you should obtain the following structure:
data
└── nsd
├── webdataset_avg_split
│ ├── test
│ ├── train
│ └── val
- To train and evaluate the model:
# Build tokenizer for stage 1
python exps/zuco/build_tokenizer.py
# Stage-1 training (in a cross-subject manner)
python exps/nsd/train_stage1.py --batch_size 128
# Stage-2 training (for each subject separately)
for subject in 1 2 5 7; do python exps/nsd/train_stage2.py --epochs 6 --save_epochs 1 --batch_size 32 -s $subject; done
To get the evaluation scores for each subject based on the generated files of the test set, refer to the BrainHub benchmark project.
With DATA_PATH set to "data/nsd", you should obtain the following structure:
The results presented here improve upon those reported in the paper by (1) training the first stage in a cross-subject manner, (2) using curated COCO annotations (COCO_73k_annots_curated.npy) and (3) adjusting the decoder LLM’s inference hyperparameters (see the configuration file). Results may vary slightly due to initialization and non-deterministic algorithms, but the variation remains low. Reported BrainDEC values are averaged over three runs. We compare our model with existing methods from the BrainHub benchmark:
| Method | Eval | BLEU1 | BLEU4 | METEOR | ROUGE | CIDEr | SPICE | CLIPS | RefCLIPS |
|---|---|---|---|---|---|---|---|---|---|
| UMBRAE | S1 | 59.44 | 19.03 | 19.45 | 43.71 | 61.06 | 12.79 | 67.78 | 73.54 |
| UMBRAE-S1 | S1 | 57.63 | 16.76 | 18.41 | 42.15 | 51.93 | 11.83 | 66.44 | 72.12 |
| BrainDEC | S1 | 61.29 | 19.68 | 17.99 | 44.47 | 53.82 | 10.67 | 63.09 | 69.60 |
| BrainCap | S1 | 55.96 | 14.51 | 16.68 | 40.69 | 41.30 | 9.06 | 64.31 | 69.90 |
| OneLLM | S1 | 47.04 | 9.51 | 13.55 | 35.05 | 22.99 | 6.26 | 54.80 | 61.28 |
| SDRecon | S1 | 36.21 | 3.43 | 10.03 | 25.13 | 13.83 | 5.02 | 61.07 | 66.36 |
| Method | Eval | BLEU1 | BLEU4 | METEOR | ROUGE | CIDEr | SPICE | CLIPS | RefCLIPS |
|---|---|---|---|---|---|---|---|---|---|
| UMBRAE | S2 | 59.37 | 18.41 | 19.17 | 43.86 | 55.93 | 12.08 | 66.46 | 72.36 |
| UMBRAE-S2 | S2 | 57.18 | 17.18 | 18.11 | 41.85 | 50.62 | 11.50 | 64.87 | 71.06 |
| BrainDEC | S2 | 59.28 | 17.99 | 17.75 | 43.60 | 51.53 | 9.88 | 62.86 | 69.27 |
| BrainCap | S2 | 53.80 | 13.03 | 15.90 | 39.96 | 35.60 | 8.47 | 62.48 | 68.19 |
| SDRecon | S2 | 34.71 | 3.02 | 9.60 | 24.22 | 13.38 | 4.58 | 59.52 | 65.30 |
| Method | Eval | BLEU1 | BLEU4 | METEOR | ROUGE | CIDEr | SPICE | CLIPS | RefCLIPS |
|---|---|---|---|---|---|---|---|---|---|
| UMBRAE | S5 | 60.36 | 19.03 | 20.04 | 44.81 | 61.32 | 13.19 | 68.39 | 74.11 |
| UMBRAE-S5 | S5 | 58.99 | 18.73 | 19.04 | 43.30 | 57.09 | 12.70 | 66.48 | 72.69 |
| BrainDEC | S5 | 61.82 | 19.57 | 18.70 | 44.63 | 57.65 | 11.32 | 64.03 | 70.26 |
| BrainCap | S5 | 55.28 | 14.62 | 16.45 | 40.87 | 41.05 | 9.24 | 63.89 | 69.64 |
| SDRecon | S5 | 34.96 | 3.49 | 9.93 | 24.77 | 13.85 | 5.19 | 60.83 | 66.30 |
| Method | Eval | BLEU1 | BLEU4 | METEOR | ROUGE | CIDEr | SPICE | CLIPS | RefCLIPS |
|---|---|---|---|---|---|---|---|---|---|
| UMBRAE | S7 | 57.20 | 17.13 | 18.29 | 42.16 | 52.73 | 11.63 | 65.90 | 71.83 |
| UMBRAE-S7 | S7 | 55.71 | 15.75 | 17.51 | 40.64 | 47.07 | 11.26 | 63.66 | 70.09 |
| BrainDEC | S7 | 59.07 | 17.97 | 17.48 | 43.07 | 49.22 | 9.90 | 61.52 | 68.06 |
| BrainCap | S7 | 54.25 | 14.00 | 15.94 | 40.02 | 37.49 | 8.57 | 62.52 | 68.48 |
| SDRecon | S7 | 34.99 | 3.26 | 9.54 | 24.33 | 13.01 | 4.74 | 58.68 | 64.59 |
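Regarding point (3) in the paragraph above the tables, the decoder LLM's inference hyperparameters are standard text-generation settings defined in the configuration file. The sketch below only illustrates the kind of parameters involved, using the Hugging Face transformers generate API with placeholder values; these are not the settings used to produce the reported results.

```python
# Illustrative decoder-LLM generation settings (placeholder values; the actual
# hyperparameters used for the reported results are in the repository's config file).
from transformers import AutoModelForCausalLM, AutoTokenizer

llm_path = "LLMs/Meta-llama3.2-8b-Instruct"  # or another decoder LLM
tokenizer = AutoTokenizer.from_pretrained(llm_path)
llm = AutoModelForCausalLM.from_pretrained(llm_path)

inputs = tokenizer("Describe the viewed image:", return_tensors="pt")
outputs = llm.generate(
    **inputs,
    max_new_tokens=64,      # maximum caption length
    num_beams=4,            # beam search width
    repetition_penalty=1.1,
    do_sample=False,        # deterministic decoding
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```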
The same raw data and preprocessing as presented in EEG-To-Text are used here.
- Update the configuration file "configs/configs_zuco.py" by specifying the paths similarly to the previous experiments.
- Download the task1-SR, task2-NR, and task3-TSR Matlab_files folders from ZuCo v1.0 and place them in the DATA_PATH specified in the config file (e.g., data/zuco/task1-SR/Matlab_files, etc.).
- Download task1-NR/Matlab_files from ZuCo v2.0 and place it as task2-NR-2.0/Matlab_files inside DATA_PATH.
- Generate the preprocessed data using the instructions below.
With DATA_PATH set to data/zuco, for example, you should obtain the following structure after data preprocessing:
data
└── zuco
├── processed
│ ├── task1-SR
│ ├── task2-NR
│ ├── task2-NR-2.0
│ └── task3-TSR
├── task1-SR
│ └── Matlab_files
├── task2-NR
│ └── Matlab_files
├── task2-NR-2.0
│ └── Matlab_files
└── task3-TSR
└── Matlab_files
# Data preparation
python exps/zuco/preprocess_data.py -t task1-SR
python exps/zuco/preprocess_data.py -t task2-NR
python exps/zuco/preprocess_data.py -t task3-TSR
python exps/zuco/preprocess_data_v2.py
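If you want to inspect the raw ZuCo recordings before running the preprocessing scripts, note that the Matlab_files are stored in MATLAB v7.3 (HDF5) format. The sketch below only lists the top-level keys with h5py; the file name is an example, and this makes no assumption about the repository's own loading code.

```python
# Illustrative inspection of a ZuCo MATLAB v7.3 file with h5py
# (file name is an example; this is not part of the repository's preprocessing scripts).
import h5py

with h5py.File("data/zuco/task1-SR/Matlab_files/resultsZAB_SR.mat", "r") as f:
    print(list(f.keys()))  # top-level structures consumed by preprocess_data.py
```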
# Build tokenizer for stage-1
python exps/zuco/build_tokenizer.py
# Training and evaluation
python exps/zuco/train_stage1.py --batch_size 128 --epochs 20
python exps/zuco/train_stage2.py --batch_size 16 --epochs 4
python exps/zuco/evaluation.py
- Apply the proposed methodology to the NSD datasets.
- Test other LLM decoders.
- Add experiments for decoding text from EEG signals.
- Cross-subject training for the NSD dataset.
- The structure of this repository is a work in progress.
- Some parts of the code of this project are adapted from InstructBlip; we thank the authors for their great work.
- For the comparison on perceived speech decoding, we used the same datasets and configuration setup as in this article. Data preprocessing and preparation scripts are taken from this link. We thank the authors for their great work.
@article{hmamouche2026braindec103589,
title = {BrainDEC: A Multimodal LLM for the Non-Invasive Decoding of Text from Brain Recordings},
journal = {Information Fusion},
volume = {127},
pages = {103589},
year = {2026},
issn = {1566-2535},
doi = {10.1016/j.inffus.2025.103589},
url = {https://www.sciencedirect.com/science/article/pii/S156625352500661X},
author = {Youssef Hmamouche and Ismail Chihab and Lahoucine Kdouri and Amal El Fallah Seghrouchni}
}