Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention

Bollepalli, Bajibabu; Juvela, Lauri; Alku, Paavo

arXiv:1810.12051 (cs)

[Submitted on 29 Oct 2018]

Abstract: There is growing interest in using sequence-to-sequence models with attention for text-to-speech (TTS) synthesis. These models are end-to-end, meaning that they learn both co-articulation and duration properties directly from text and speech. Because these models are entirely data-driven, they need large amounts of data to generate good-quality synthetic speech. However, for challenging speaking styles such as Lombard speech (the speaking style adopted in noisy environments), it is difficult to record sufficiently large speech corpora. Therefore, in this study we propose a transfer learning method to adapt a sequence-to-sequence-based TTS system trained on normal-style speech to the Lombard style. We also experiment with a WaveNet vocoder in the synthesis of Lombard speech. We conducted subjective evaluations to assess the performance of the adapted TTS systems. The results indicated that the adapted system with the WaveNet vocoder clearly outperformed a conventional deep neural network (DNN)-based TTS system in the synthesis of Lombard speech.
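The adaptation idea in the abstract follows the general warm-start transfer learning pattern: pre-train on a large corpus of one style, then continue training on a small corpus of the target style. The sketch below illustrates only that general pattern. It is not the authors' implementation: a two-parameter linear regression stands in for the sequence-to-sequence TTS network, and the data sets, learning rate, step counts and the helper names `train` and `mse` are all hypothetical.

```python
"""Toy illustration of warm-start transfer learning (not the paper's model)."""


def mse(w, b, data):
    # Mean squared error of the linear model y = w * x + b on (x, y) pairs.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, steps, lr=0.5):
    # Plain full-batch gradient descent on the mean squared error.
    # Both gradients are computed from the current (w, b) before updating.
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical stand-ins: a large "normal style" corpus and a small,
# related but shifted "Lombard style" corpus.
normal = [(i / 100, 2.0 * (i / 100) + 1.0) for i in range(100)]
lombard = [(x, 2.5 * x + 1.5) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]

# 1) Pre-train from zero on the large normal-style data.
w_pre, b_pre = train(0.0, 0.0, normal, steps=2000)

# 2) Adapt: continue training from the pre-trained weights on the small set.
w_ft, b_ft = train(w_pre, b_pre, lombard, steps=20)

# Baseline: the same small training budget, but starting from scratch.
w_sc, b_sc = train(0.0, 0.0, lombard, steps=20)

finetune_loss = mse(w_ft, b_ft, lombard)
scratch_loss = mse(w_sc, b_sc, lombard)
print(f"fine-tuned: {finetune_loss:.4f}  from scratch: {scratch_loss:.4f}")
```

In this toy setting, the warm-started model begins much closer to the target-style solution, so with the same small number of updates it reaches a lower error on the small target-style set than the model trained from scratch. Whether and how much this carries over to a real TTS network depends on the model and data, which this sketch does not capture.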

Comments:	5 pages, 5 figures. Submitted to ICASSP 2019
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1810.12051 [cs.SD]
	(or arXiv:1810.12051v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1810.12051

Submission history

From: Bajibabu Bollepalli
[v1] Mon, 29 Oct 2018 10:53:31 UTC (214 KB)
