An open-access EEG dataset for speech decoding: Exploring the role of articulation and coarticulation

João Pedro Carvalho Moreira¹,#, Vinícius Rezende Carvalho¹, Eduardo Mazoni Andrade Marçal Mendes¹, Ariah Fallah², Terrence J. Sejnowski³,⁴,⁵, Claudia Lainscsek³,⁴, Lindy Comstock²,⁶,*,#

¹ Postgraduate Program in Electrical Engineering, Federal University of Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
² Department of Neurosurgery, University of California, Los Angeles, Los Angeles, CA 90095, USA
³ Computational Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
⁴ Institute for Neural Computation, University of California San Diego, La Jolla, CA 92093, USA
⁵ Division of Biological Sciences, University of California San Diego, La Jolla, CA 92093, USA
⁶ Department of Linguistics, National Research University Higher School of Economics, Moscow 101000, RF
* Corresponding author: Lindy Comstock ([email protected])
# These authors contributed equally to this work

ABSTRACT

Electroencephalography (EEG) holds promise for brain-computer interface (BCI) devices as a non-invasive measure of neural activity. With increased attention to EEG-based BCI systems, publicly available datasets that can represent the complex tasks required for naturalistic speech decoding are necessary to establish a common standard of performance within the BCI community. Effective solutions must overcome various kinds of noise in the EEG signal and remain reliable across sessions and subjects without overfitting to a specific dataset or task. We present two validated datasets (N=8 and N=16) for classification at the phoneme and word level and by the articulatory properties of phonemes. EEG signals were recorded from 64 channels while subjects listened to and repeated six consonants and five vowels. Individual phonemes were combined in different phonetic environments to produce coarticulated variation in forty consonant-vowel pairs, twenty real words, and twenty pseudowords. Phoneme pairs and words were presented during a control condition and during transcranial magnetic stimulation targeted to inhibit or augment the EEG signal associated with specific articulatory processes.

CODE AVAILABILITY

The data and code used in this work are available at OSF, to allow reproducibility and sharing of information, under the CC BY 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). The routines can be found in the Study/EEG_Data_Processing/Code folder; they produce the analyses presented in the Technical Validation section. The results obtained with both signal processing techniques, as discussed in the Data Processing section, are placed in the same folder. The same code is available on GitHub to allow for version control and discussion of the implementation and analysis carried out in this work. The routines were built to obtain the ERP using only ICA, and signal cleaning was performed using the pipeline described in Figure 1, based on the EEGLAB toolbox for MATLAB (versions 2022.0 and 2022.1).

Figure 1 - Code structure for data processing.
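
For readers unfamiliar with the term, an ERP (event-related potential) is conventionally obtained by averaging stimulus-locked EEG epochs after baseline correction. The snippet below is only a minimal sketch of that averaging step for a single channel, written in plain Python for illustration. It is not the authors' MATLAB/EEGLAB routine and it omits ICA and signal cleaning; the function names `baseline_correct` and `compute_erp`, the `n_baseline` parameter, and the toy data are all assumptions introduced here.

```python
from statistics import fmean


def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline samples from one epoch."""
    offset = fmean(epoch[:n_baseline])
    return [sample - offset for sample in epoch]


def compute_erp(epochs, n_baseline):
    """Average baseline-corrected epochs sample by sample (single channel)."""
    corrected = [baseline_correct(epoch, n_baseline) for epoch in epochs]
    # zip(*corrected) groups the i-th sample of every trial together.
    return [fmean(samples) for samples in zip(*corrected)]


# Hypothetical toy data: three trials of one channel,
# two pre-stimulus (baseline) samples followed by four post-stimulus samples.
toy_epochs = [
    [1.0, 1.0, 2.0, 4.0, 3.0, 1.0],
    [0.0, 2.0, 3.0, 5.0, 2.0, 1.0],
    [2.0, 2.0, 2.0, 6.0, 4.0, 2.0],
]
erp = compute_erp(toy_epochs, n_baseline=2)
```

In a real recording the same averaging would be applied per channel and per condition, on epochs that have already been cleaned.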
