Forward-Backward Decoding for Regularizing End-to-End TTS

Zheng, Yibin; Wang, Xi; He, Lei; Pan, Shifeng; Soong, Frank K.; Wen, Zhengqi; Tao, Jianhua

arXiv:1907.09006 (eess)

[Submitted on 18 Jul 2019]

Abstract: Neural end-to-end TTS can generate very high-quality synthesized speech, even close to human recordings for text within a similar domain. However, it performs unsatisfactorily when scaled to challenging test sets. One concern is that the attention-based encoder-decoder network adopts an autoregressive generative sequence model, which suffers from "exposure bias". To address this issue, we propose two novel methods, which learn to predict the future by improving agreement between the forward and backward decoding sequences. The first is achieved by introducing divergence regularization terms into the model training objective to reduce the mismatch between two directional models, namely L2R and R2L (which generate targets from left to right and right to left, respectively). The second operates at the decoder level and exploits future information during decoding. In addition, we employ a joint training strategy to allow forward and backward decoding to improve each other in an interactive process. Experimental results show that our proposed methods, especially the second one (bidirectional decoder regularization), lead to a significant improvement in both robustness and overall naturalness, outperforming the baseline (the revised version of Tacotron2) by a MOS gap of 0.14 on a challenging test set and achieving close to human quality (4.42 vs. 4.49 in MOS) on a general test set.
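The idea of regularizing by agreement between an L2R and an R2L model can be illustrated with a small sketch. The abstract does not give the exact form of the divergence term, so the version below is an assumption: it uses a mean squared error between the forward prediction and the time-reversed backward prediction, added to each model's own reconstruction loss. The names `l2_loss`, `agreement_loss`, and `weight` are hypothetical and chosen only for this illustration.

```python
from typing import Sequence

Frames = Sequence[Sequence[float]]  # one inner sequence per acoustic frame


def l2_loss(pred: Frames, target: Frames) -> float:
    """Mean squared error over all frames and feature dimensions."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


def agreement_loss(l2r_pred: Frames, r2l_pred: Frames,
                   target: Frames, weight: float = 1.0) -> float:
    """Forward loss + backward loss + weighted mismatch between the two.

    Assumption: the R2L model emits frames from last to first, so its
    output is reversed to align with the L2R output before comparison.
    """
    r2l_aligned = list(reversed(r2l_pred))
    forward = l2_loss(l2r_pred, target)
    backward = l2_loss(r2l_aligned, target)
    mismatch = l2_loss(l2r_pred, r2l_aligned)  # assumed divergence term
    return forward + backward + weight * mismatch


if __name__ == "__main__":
    target = [[0.0], [1.0], [2.0]]
    l2r = [[0.0], [1.0], [3.0]]   # forward model errs on the last frame
    r2l = [[2.0], [1.0], [0.0]]   # backward model, emitted right to left
    print(agreement_loss(l2r, r2l, target))
```

In this toy case the backward model is exact, so the mismatch term penalizes the forward model for drifting at the end of the sequence, which is where accumulated autoregressive errors would typically appear.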

Comments:	Accepted by INTERSPEECH2019. arXiv admin note: text overlap with arXiv:1808.04064, arXiv:1804.05374 by other authors
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:1907.09006 [eess.AS]
	(or arXiv:1907.09006v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1907.09006

Submission history

From: Yibin Zheng
[v1] Thu, 18 Jul 2019 12:24:30 UTC (3,418 KB)
