An ASR Guided Speech Intelligibility Measure for TTS Model Selection

Baby, Arun; Vinnaitherthan, Saranya; Adiga, Nagaraj; Jawale, Pranav; Badam, Sumukh; Adavanne, Sharath; Konjeti, Srikanth

arXiv:2006.01463 (cs)

[Submitted on 2 Jun 2020]

Abstract: The perceptual quality of neural text-to-speech (TTS) depends heavily on which model is selected during training. Selecting the model with a training-objective metric, such as the least mean squared error, does not always agree with human perception. In this paper, we propose an objective metric based on the phone error rate (PER) to select the TTS model with the best speech intelligibility. The PER is computed between the input text to the TTS model and the text decoded from the synthesized speech by an automatic speech recognition (ASR) model that is trained on the same data as the TTS model. With the help of subjective studies, we show that the TTS model chosen with the least PER on the validation split has significantly higher speech intelligibility than the model with the least training-objective loss. Finally, using the proposed PER and subjective evaluation, we show that the choice of the best TTS model depends on the genre of the target-domain text. All our experiments are conducted on a Hindi language dataset; however, the proposed model selection method is language independent.
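The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that PER is the usual edit-distance rate (substitutions, deletions, and insertions divided by the number of reference phones), pooled over the whole validation split, and that the input text and the ASR output are already available as phone sequences. The function names `edit_distance`, `phone_error_rate`, and `select_model` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between a reference and a decoded phone sequence."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference phones into nothing
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Assumed PER: total edit errors divided by total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference phone sequences are empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_phones


def select_model(refs: List[Sequence[str]],
                 decoded: Dict[str, List[Sequence[str]]]) -> Tuple[str, float]:
    """Return the candidate model name with the lowest PER, and that PER.

    `decoded` maps a (hypothetical) model name to the ASR phone output for
    the speech that model synthesized from each validation utterance.
    """
    scores = {name: phone_error_rate(refs, hyps) for name, hyps in decoded.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Calling `select_model(refs, decoded)` with the validation phone sequences and the per-model ASR outputs returns the name of the candidate with the least PER. Pooling errors over the split, rather than averaging per-utterance rates, is a design choice of this sketch that keeps short utterances from dominating the score.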

Comments:	Submitted to INTERSPEECH 2020
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2006.01463 [cs.SD]
	(or arXiv:2006.01463v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2006.01463

Submission history

From: Arun Baby
[v1] Tue, 2 Jun 2020 09:06:41 UTC (1,518 KB)
