Enhancing Listened Speech Decoding from EEG via Parallel Phoneme Sequence Prediction

Lee, Jihwan; Feng, Tiantian; Kommineni, Aditya; Kadiri, Sudarsana Reddy; Narayanan, Shrikanth

arXiv:2501.04844 (eess)

[Submitted on 8 Jan 2025]

Abstract: Brain-computer interfaces (BCIs) offer numerous human-centered applications, particularly for people with neurological disorders. Decoding text or speech from brain activity is a relevant domain that could improve the quality of life for people with impaired speech perception. We propose a novel approach to enhance listened speech decoding from electroencephalography (EEG) signals by using an auxiliary phoneme predictor that simultaneously decodes textual phoneme sequences. The proposed model architecture consists of three main parts: an EEG module, a speech module, and a phoneme predictor. The EEG module learns to encode EEG signals as EEG embeddings. The speech module generates speech waveforms from the EEG embeddings. The phoneme predictor outputs the decoded phoneme sequences as text. The proposed approach lets users obtain decoded listened speech from EEG signals in both modalities (speech waveforms and textual phoneme sequences) simultaneously, eliminating the need for a concatenated sequential pipeline for each modality. The proposed approach also outperforms previous methods in both modalities. The source code and speech samples are publicly available.
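
The three-part layout described in the abstract can be illustrated with a toy forward pass. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `ParallelDecoder`, the dimensions, the six-symbol phoneme inventory, and the use of random linear layers in place of trained networks are all hypothetical. It also assumes that the phoneme predictor reads the same EEG embeddings as the speech module and emits one label per frame, which the abstract does not specify.

```python
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix (n_out rows, n_in columns) standing in for a trained layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, frame):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


class ParallelDecoder:
    # Hypothetical toy phoneme inventory, for illustration only.
    PHONEMES = ["sil", "AA", "B", "K", "S", "T"]

    def __init__(self, n_channels=8, embed_dim=16, hop=4, seed=0):
        rng = random.Random(seed)
        # EEG module: maps one EEG frame to an EEG embedding.
        self.eeg_module = make_linear(n_channels, embed_dim, rng)
        # Speech module: maps an embedding to `hop` waveform samples.
        self.speech_module = make_linear(embed_dim, hop, rng)
        # Phoneme predictor: maps an embedding to phoneme logits.
        self.phoneme_predictor = make_linear(embed_dim, len(self.PHONEMES), rng)

    def decode(self, eeg_frames):
        """Return (waveform, phonemes) computed from one shared set of embeddings."""
        embeddings = [apply_linear(self.eeg_module, f) for f in eeg_frames]
        waveform = []
        phonemes = []
        for emb in embeddings:
            # Both heads consume the same embedding; neither feeds the other.
            waveform.extend(apply_linear(self.speech_module, emb))
            logits = apply_linear(self.phoneme_predictor, emb)
            phonemes.append(self.PHONEMES[logits.index(max(logits))])
        return waveform, phonemes


# Usage: 5 frames of 8-channel synthetic "EEG" noise.
rng = random.Random(1)
eeg = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
model = ParallelDecoder()
waveform, phonemes = model.decode(eeg)
print(len(waveform), len(phonemes))  # 20 waveform samples, 5 phoneme labels
```

The point of the sketch is the data flow: both outputs are computed from the same embeddings in one pass, rather than by first synthesizing speech and then running a recognizer on it.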

Comments:	ICASSP 2025
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Signal Processing (eess.SP)
Cite as:	arXiv:2501.04844 [eess.AS]
	(or arXiv:2501.04844v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2501.04844

Submission history

From: Jihwan Lee
[v1] Wed, 8 Jan 2025 21:11:35 UTC (1,554 KB)
