Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition

Wang, Ke; Zhang, Junbo; Sun, Sining; Wang, Yujun; Xiang, Fei; Xie, Lei

doi:10.21437/Interspeech.2018-1780

arXiv:1803.10132 (cs)

[Submitted on 27 Mar 2018 (v1), last revised 25 Oct 2018 (this version, v3)]

Abstract: We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but no work has yet examined their ability to perform speech dereverberation, and the advantages of using GANs have not been fully established. In this paper, we provide an in-depth investigation of a GAN-based dereverberation front-end for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement compared with a feed-forward DNN and a CNN on our dataset. Second, adding residual connections to the deep LSTMs further boosts performance. Finally, we find that, for the GAN to succeed, it is important to update the generator and the discriminator using the same mini-batch of data during training. Moreover, using the reverberant spectrogram as a condition to the discriminator, as suggested in previous studies, may degrade performance. In summary, our GAN-based dereverberation front-end achieves a 14%-19% relative character error rate (CER) reduction compared with the baseline DNN dereverberation network when tested on a strong multi-condition-trained acoustic model.
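The two training findings above (update both networks on the same mini-batch; do not necessarily condition the discriminator on the reverberant input) can be illustrated with a minimal, stdlib-only Python sketch. This is not the authors' code: the networks are abstracted away as the callables `d_step` and `g_step`, spectrograms are modeled as lists of frames, and every name (`train_epoch`, `discriminator_input`, `same_batch`, `condition_on_reverberant`) is hypothetical. The `same_batch=False` mode, where the generator sees the next batch, is just one assumed way of updating on a different mini-batch.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a spectrogram is a list of frames, each frame
# a list of magnitude values over frequency bins.
Spectrogram = List[List[float]]
# A mini-batch pairs a reverberant input with its clean target.
Batch = Tuple[Spectrogram, Spectrogram]


def discriminator_input(
    candidate: Spectrogram,
    reverberant: Spectrogram,
    condition_on_reverberant: bool,
) -> Spectrogram:
    """Build what the discriminator sees for one example.

    candidate is either a clean spectrogram or the generator's output.
    Unconditioned: the discriminator judges the candidate alone.
    Conditioned (suggested in earlier work, reported in the abstract as
    possibly degrading performance): each reverberant frame is appended
    to the matching candidate frame along the frequency axis.
    """
    if condition_on_reverberant:
        return [c + r for c, r in zip(candidate, reverberant)]
    return [list(frame) for frame in candidate]


def train_epoch(
    batches: List[Batch],
    d_step: Callable[[Batch], None],
    g_step: Callable[[Batch], None],
    same_batch: bool = True,
) -> None:
    """Run one discriminator update, then one generator update, per batch.

    same_batch=True: both updates use batch i (the schedule the abstract
    reports as important for the GAN to succeed).
    same_batch=False: the generator update uses batch i + 1, wrapping
    around; a hypothetical example of updating G on a different batch.
    """
    n = len(batches)
    for i, batch in enumerate(batches):
        d_step(batch)  # e.g. real = clean target, fake = G(reverberant input)
        g_step(batch if same_batch else batches[(i + 1) % n])


# Usage: record which batch each update sees; real code would run optimizers.
batches = [([[float(k)]], [[0.0]]) for k in range(3)]
log = []
train_epoch(
    batches,
    d_step=lambda b: log.append(("D", b[0][0][0])),
    g_step=lambda b: log.append(("G", b[0][0][0])),
)
print(log)  # [('D', 0.0), ('G', 0.0), ('D', 1.0), ('G', 1.0), ('D', 2.0), ('G', 2.0)]
```

Keeping the schedule separate from the model code makes the comparison a single flag flip; in a real system `d_step` and `g_step` would wrap the loss computation and optimizer updates of whichever generator (DNN, CNN, or residual LSTM) is under test.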

Comments:	Interspeech 2018
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1803.10132 [cs.SD]
	(or arXiv:1803.10132v3 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1803.10132
Journal reference:	Proceedings of Interspeech, 2018, pp. 1581-1585
Related DOI:	https://doi.org/10.21437/Interspeech.2018-1780

Submission history

From: Ke Wang
[v1] Tue, 27 Mar 2018 15:23:12 UTC (104 KB)
[v2] Sun, 17 Jun 2018 08:15:04 UTC (105 KB)
[v3] Thu, 25 Oct 2018 07:01:25 UTC (105 KB)