Toward Universal Speech Enhancement for Diverse Input Conditions

Zhang, Wangyou; Saijo, Kohei; Wang, Zhong-Qiu; Watanabe, Shinji; Qian, Yanmin

arXiv:2309.17384 (eess)

[Submitted on 29 Sep 2023 (v1), last revised 16 Feb 2024 (this version, v2)]

Abstract: The past decade has witnessed substantial growth of data-driven speech enhancement (SE) techniques thanks to deep learning. While existing approaches have shown impressive performance in some common datasets, most of them are designed only for a single condition (e.g., single-channel, multi-channel, or a fixed sampling frequency) or only consider a single task (e.g., denoising or dereverberation). Currently, there is no universal SE approach that can effectively handle diverse input conditions with a single model. In this paper, we make the first attempt to investigate this line of research. First, we devise a single SE model that is independent of microphone channels, signal lengths, and sampling frequencies. Second, we design a universal SE benchmark by combining existing public corpora with multiple conditions. Our experiments on a wide range of datasets show that the proposed single model can successfully handle diverse conditions with strong performance.
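
The abstract does not say how independence from the sampling frequency is obtained. One common way to approach it, assumed here purely for illustration and not taken from the abstract, is to fix the STFT window and hop in milliseconds rather than in samples, so that every sampling rate yields the same frame rate and the same spacing between frequency bins. The sketch below uses NumPy only; the 32 ms / 16 ms values and the names `stft_params` and `magnitude_stft` are hypothetical.

```python
import numpy as np


def stft_params(sample_rate, win_ms=32.0, hop_ms=16.0):
    """Return (win_length, hop_length, n_bins) for a fixed-duration window.

    The 32 ms / 16 ms defaults are illustrative assumptions.
    """
    win_length = int(round(sample_rate * win_ms / 1000))
    hop_length = int(round(sample_rate * hop_ms / 1000))
    n_bins = win_length // 2 + 1  # one-sided spectrum size
    return win_length, hop_length, n_bins


def magnitude_stft(signal, sample_rate, win_ms=32.0, hop_ms=16.0):
    """Magnitude STFT of a 1-D signal, shape (n_frames, n_bins), no padding."""
    win, hop, _ = stft_params(sample_rate, win_ms, hop_ms)
    n_frames = 1 + (len(signal) - win) // hop
    window = np.hanning(win)
    frames = np.stack(
        [signal[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for sr in (8000, 16000, 48000):
        win, hop, n_bins = stft_params(sr)
        spec = magnitude_stft(rng.standard_normal(sr), sr)  # one second of noise
        # Frame count and Hz-per-bin match across rates; only n_bins grows.
        print(sr, spec.shape, sr / win)
```

With these settings, one second of audio gives the same number of frames at each sampling rate and a bin spacing of 31.25 Hz throughout; a higher sampling rate only adds bins at higher frequencies. A model that processes each frequency bin with shared weights could then, in principle, accept any of these inputs without resampling.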

Comments:	6 pages, 3 figures, 5 tables, published in ASRU 2023 (corrected the results of noisy speech on CHiME-4 (Simu) in Table 4)
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
Cite as:	arXiv:2309.17384 [eess.AS]
	(or arXiv:2309.17384v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2309.17384

Submission history

From: Wangyou Zhang
[v1] Fri, 29 Sep 2023 16:41:49 UTC (159 KB)
[v2] Fri, 16 Feb 2024 04:44:25 UTC (159 KB)
