Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios

Fiorio, Luan Vinícius; Defraene, Bruno; David, Johan; Young, Alex; Widdershoven, Frans; van Houtum, Wim; Aarts, Ronald M.

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2503.18590 (eess)

[Submitted on 24 Mar 2025]

Abstract: We propose a speaker selection mechanism (SSM) for training an end-to-end beamforming neural network, based on recent findings that a listener usually looks toward the target speaker with a certain undershot angle. During training, the mechanism lets the neural network model learn which speaker to focus on in a multi-speaker scenario, based on the positions of the listener and the speakers. At inference, however, only audio information is needed. We perform acoustic simulations that demonstrate the feasibility and performance of training with the SSM. The results show significant improvements in speech intelligibility, quality, and distortion metrics compared with the minimum variance distortionless filter and with the same neural network model trained without the SSM. The success of the proposed method is a significant step toward solving the cocktail party problem.
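The abstract does not say how the SSM picks the target from the listener and speaker positions. As an illustration only, here is a minimal sketch of one plausible rule, not the authors' method. It assumes the listener's head turns by a fixed fraction of the target's true azimuth, and it picks the speaker whose expected head angle best matches the observed one. The function name, the `undershoot_gain` parameter and its default value of 0.6, and the 2-D geometry are all hypothetical.

```python
import math


def select_target_speaker(listener_xy, head_azimuth_deg, speakers_xy,
                          undershoot_gain=0.6):
    """Return the index of the speaker the listener is most likely attending to.

    Hypothetical rule (an assumption, not taken from the paper): the head is
    assumed to turn only `undershoot_gain` times the true azimuth of the
    attended speaker. Azimuths are measured in degrees from the +x axis,
    which is taken as the listener's frontal direction. Angle wrap-around
    behind the listener is not handled in this sketch.
    """
    best_index, best_error = None, math.inf
    for index, (sx, sy) in enumerate(speakers_xy):
        # True azimuth of this speaker as seen from the listener.
        azimuth = math.degrees(
            math.atan2(sy - listener_xy[1], sx - listener_xy[0])
        )
        # Head angle expected if this speaker were the target (undershoot).
        expected_head = undershoot_gain * azimuth
        error = abs(head_azimuth_deg - expected_head)
        if error < best_error:
            best_index, best_error = index, error
    return best_index


if __name__ == "__main__":
    speakers = [(1.0, 1.0), (1.0, -1.0), (1.0, 0.0)]  # azimuths 45, -45, 0 deg
    print(select_target_speaker((0.0, 0.0), 25.0, speakers))
```

In a training pipeline of the kind the abstract describes, an index like this would only decide which speaker's clean signal serves as the training target. At inference the network would receive audio alone.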

Subjects:	Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Cite as:	arXiv:2503.18590 [eess.AS]
	(or arXiv:2503.18590v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2503.18590

Submission history

From: Luan Vinícius Fiorio
[v1] Mon, 24 Mar 2025 11:47:32 UTC (814 KB)
