iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

Li, Haoyu; Fu, Szu-Wei; Tsao, Yu; Yamagishi, Junichi

arXiv:2004.00932 (eess)

[Submitted on 2 Apr 2020 (v1), last revised 7 Apr 2020 (this version, v2)]

Abstract: The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments. In this work, we propose a deep learning-based speech modification method to compensate for the intelligibility loss, with the constraint that the root mean square (RMS) level and duration of the speech signal are maintained before and after modifications. Specifically, we utilize an iMetricGAN approach to optimize the speech intelligibility metrics with generative adversarial networks (GANs). Experimental results show that the proposed iMetricGAN outperforms conventional state-of-the-art algorithms in terms of objective measures, i.e., speech intelligibility in bits (SIIB) and extended short-time objective intelligibility (ESTOI), under a Cafeteria noise condition. In addition, formal listening tests reveal significant intelligibility gains when both noise and reverberation exist.
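
The RMS and duration constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `rms` and `match_rms` are hypothetical, the "modification" is an arbitrary stand-in for a learned model, and signals are plain Python lists of samples. The sketch only shows what it means for a modified signal to keep the same sample count and the same RMS level as the input.

```python
import math


def rms(signal):
    """Root mean square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def match_rms(modified, reference):
    """Rescale `modified` so that its RMS equals the RMS of `reference`.

    Hypothetical helper for illustration. It rejects inputs whose lengths
    differ, since an unchanged duration means an unchanged sample count.
    """
    if len(modified) != len(reference):
        raise ValueError("duration changed: sample counts differ")
    level = rms(modified)
    if level == 0.0:
        raise ValueError("modified signal is silent; cannot rescale")
    gain = rms(reference) / level
    return [gain * x for x in modified]


# Toy input: 100 samples of a low-level sine wave.
reference = [0.1 * math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]

# Stand-in for any sample-wise speech modification (an assumption, not the
# method in the paper); it changes the signal's shape and its level.
modified = [3.0 * x ** 3 + 0.5 * x for x in reference]

# Enforce the equal-RMS constraint; the duration is already unchanged.
constrained = match_rms(modified, reference)
```

After the call, `constrained` has the same number of samples as `reference` and, up to floating-point rounding, the same RMS level, so any intelligibility gain would have to come from redistributing energy rather than from simply raising the volume.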

Comments: 5 pages, Submitted to INTERSPEECH 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as: arXiv:2004.00932 [eess.AS] (or arXiv:2004.00932v2 [eess.AS] for this version)
DOI: https://doi.org/10.48550/arXiv.2004.00932

Submission history

From: Haoyu Li
[v1] Thu, 2 Apr 2020 11:01:16 UTC (5,002 KB)
[v2] Tue, 7 Apr 2020 11:02:09 UTC (5,480 KB)
