Cross-domain Single-channel Speech Enhancement Model with Bi-projection Fusion Module for Noise-robust ASR

Chao, Fu-An; Hung, Jeih-weih; Chen, Berlin

arXiv:2108.11598 (eess)

[Submitted on 26 Aug 2021]

Abstract: In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel SE techniques have shown promise in noise suppression and robust automatic speech recognition (ASR). This paper continues these lines of research and explores two effective SE methods that consider phase information in the time domain and the frequency domain of speech signals, respectively. Going one step further, we put forward a novel cross-domain speech enhancement model and a bi-projection fusion (BPF) mechanism for noise-robust ASR. To evaluate the effectiveness of our proposed method, we conduct an extensive set of experiments on the publicly available Aishell-1 Mandarin benchmark speech corpus. The evaluation results confirm the superiority of our proposed method over a few current top-of-the-line time-domain and frequency-domain SE methods, on both enhancement and ASR evaluation metrics, in test scenarios contaminated with seen and unseen noise, respectively.

Comments:	6 pages, 3 figures, Accepted by ICME 2021
Subjects:	Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD); Signal Processing (eess.SP)
Cite as:	arXiv:2108.11598 [eess.AS]
	(or arXiv:2108.11598v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2108.11598

Submission history

From: Fu-An Chao
[v1] Thu, 26 Aug 2021 06:29:17 UTC (769 KB)
