Speech Enhancement Guided by Contextual Articulatory Information

Lu, Yen-Ju; Chang, Chia-Yu; Yu, Cheng; Liu, Ching-Feng; Hung, Jeih-weih; Watanabe, Shinji; Tsao, Yu

arXiv:2011.07442v3 (cs)

[Submitted on 15 Nov 2020 (v1), revised 19 Jul 2022 (this version, v3), latest version 18 Jun 2023 (v5)]

Abstract: Previous studies have confirmed that augmenting acoustic features with place/manner-of-articulation features guides the speech enhancement (SE) process to consider the articulatory properties of the input speech, which improves performance. The contextual information of articulatory attributes should therefore carry additional information that can further benefit SE. This study proposes an SE system that improves performance by optimizing contextual articulatory information in the enhanced speech: the SE model is trained jointly with an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phone classes (BPCs) instead of the phoneme/word sequence. We developed two strategies to train the SE system based on BPC-based ASR: multi-task learning and deep-feature training. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual articulatory information helps the SE system improve its enhancement results. Moreover, compared with an SE system trained with monophonic ASR, the SE system trained with BPC-based ASR (which provides contextual articulatory information) achieves superior SE performance at different signal-to-noise ratio (SNR) levels.
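
As a rough illustration of the two ideas named in the abstract, the sketch below collapses a phoneme sequence into broad phone classes and combines an SE loss with an ASR loss in a multi-task objective. Everything in it is an assumption made for illustration: the manner-based grouping of ARPAbet symbols (with affricates folded into fricatives), the names BPC_OF, phones_to_bpcs and multitask_loss, and the weight alpha are hypothetical and are not taken from the paper, whose class inventory and loss formulation may differ.

```python
# Hypothetical manner-of-articulation grouping of ARPAbet phonemes into
# broad phone classes (BPCs). The paper's actual inventory may differ.
_GROUPS = {
    "vowel": "aa ae ah ao aw ay eh er ey ih iy ow oy uh uw",
    "stop": "b d g k p t",
    "fricative": "ch dh f hh jh s sh th v z zh",  # affricates folded in (assumption)
    "nasal": "m n ng",
    "semivowel": "l r w y",
    "silence": "sil",
}

# Invert the grouping into a phoneme -> BPC lookup table.
BPC_OF = {p: label for label, phones in _GROUPS.items() for p in phones.split()}


def phones_to_bpcs(phones):
    """Map a phoneme sequence to its BPC sequence (the ASR targets)."""
    return [BPC_OF[p.lower()] for p in phones]


def multitask_loss(se_loss, asr_loss, alpha=0.5):
    """Weighted sum of the SE loss and the BPC-ASR loss.

    alpha is an assumed interpolation weight, not a value from the paper.
    """
    return (1.0 - alpha) * se_loss + alpha * asr_loss


# "speech" as phonemes: s p iy ch
print(phones_to_bpcs(["s", "p", "iy", "ch"]))
# ['fricative', 'stop', 'vowel', 'fricative']
```

Because many phonemes share one class, the BPC target sequence uses a much smaller label set than a phoneme sequence does, while still preserving the order of articulatory events in the utterance.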

Comments:	submitted to TASLP
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2011.07442 [cs.SD]
	(or arXiv:2011.07442v3 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2011.07442

Submission history

From: Yen-Ju Lu
[v1] Sun, 15 Nov 2020 03:56:37 UTC (1,422 KB)
[v2] Wed, 1 Dec 2021 20:47:04 UTC (14,178 KB)
[v3] Tue, 19 Jul 2022 19:09:05 UTC (17,297 KB)
[v4] Thu, 15 Jun 2023 11:39:47 UTC (19,936 KB)
[v5] Sun, 18 Jun 2023 11:52:45 UTC (17,814 KB)
