Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network

Hwang, Min-Jae; Song, Eunwoo; Yamamoto, Ryuichi; Soong, Frank; Kang, Hong-Goo

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2001.11686 (eess)

[Submitted on 31 Jan 2020]

Abstract: In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder has enabled high-quality, lightweight speech synthesis systems by combining a vocal tract LP filter with a WaveRNN-based vocal source (i.e., excitation) generator. However, the quality of the synthesized speech is often unstable because the vocal source component is insufficiently represented by the mu-law quantization method, and because the model is trained without considering the entire speech production mechanism. To address this problem, we first introduce LP-MDN, which enables the autoregressive neural vocoder to structurally represent the interactions between the vocal tract and vocal source components. Then, we propose incorporating LP-MDN into the LPCNet vocoder by replacing the conventional discretized output with a continuous density distribution. The experimental results verify that the proposed system provides high-quality synthetic speech, achieving a mean opinion score of 4.41 within a text-to-speech framework.
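The abstract gives no equations, so the following is only an illustrative sketch of the two ideas it contrasts, not the authors' implementation. It first shows standard 8-bit mu-law companding (the discretized output the abstract says represents the excitation insufficiently), then one plausible reading of an "LP-structured" continuous output: the network predicts mixture parameters for the excitation, and the speech-domain mixture is obtained by shifting each component mean by the LP prediction. That mean-shift form, and every function name and numeric value below, are assumptions made for illustration.

```python
import math


def mu_law_encode(x, mu=255):
    """Compand x in [-1, 1] and quantize it to an integer level in [0, mu]."""
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1) / 2 * mu))


def mu_law_decode(q, mu=255):
    """Map an integer level in [0, mu] back to an amplitude in [-1, 1]."""
    y = 2 * q / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def lp_prediction(lp_coeffs, past_samples):
    """Linear prediction p_t = sum_k a_k * x_{t-k}; past_samples[0] is x_{t-1}."""
    return sum(a * x for a, x in zip(lp_coeffs, past_samples))


def lp_structured_mixture(weights, exc_means, scales, prediction):
    """Assumed LP structure: shift every excitation mean by the LP prediction.

    Mixture weights and scales are passed through unchanged, so the shape of
    the density is that of the excitation, centred around the LP prediction.
    """
    speech_means = [m + prediction for m in exc_means]
    return weights, speech_means, scales


def mixture_mean(weights, means):
    """Expected value of a mixture whose weights sum to one."""
    return sum(w * m for w, m in zip(weights, means))


# Hypothetical values: a 2nd-order LP filter and a 2-component excitation mixture.
p_t = lp_prediction([1.8, -0.9], [0.5, 0.4])
w, mu_s, s = lp_structured_mixture([0.7, 0.3], [0.01, -0.02], [0.05, 0.1], p_t)

# Round-trip error introduced by quantizing a sample to 256 mu-law levels.
quantization_error = abs(mu_law_decode(mu_law_encode(0.5)) - 0.5)
```

In this sketch the quantized path can only return one of 256 fixed amplitudes, whereas the mixture path describes a continuous density over the speech sample whose location follows the vocal tract filter.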

Comments:	Accepted to ICASSP 2020
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2001.11686 [eess.AS]
	(or arXiv:2001.11686v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2001.11686
Journal reference:	IEEE ICASSP 2020

Submission history

From: Min-Jae Hwang
[v1] Fri, 31 Jan 2020 07:43:01 UTC (79 KB)
