VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech

Gudmalwar, Ashishkumar; Shah, Nirmesh; Akarsh, Sai; Wasnik, Pankaj; Shah, Rajiv Ratn

arXiv:2406.08076 (eess)

[Submitted on 12 Jun 2024]

Abstract: Despite significant advances in Text-to-Speech (TTS) systems, their use in automatic dubbing remains limited. Dubbing requires extracting voice identity and emotional style from reference speech in a source language and transferring both to a target language using cross-lingual TTS techniques. Previous approaches have mainly concentrated on controlling voice identity within the cross-lingual TTS framework, and there has been limited work on incorporating emotion and voice identity together. To this end, we introduce an end-to-end Voice Identity and Emotional Style Controllable Cross-Lingual (VECL) TTS system using multilingual speakers and an emotion embedding network. Moreover, we introduce content and style consistency losses to further enhance the quality of the synthesized speech. The proposed system achieved an average relative improvement of 8.83% over state-of-the-art (SOTA) methods on a database comprising English and three Indian languages (Hindi, Telugu, and Marathi).
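
The abstract does not state the form of the content and style consistency losses. As a purely illustrative sketch, a consistency term of this kind is often written as the cosine distance between an embedding extracted from the reference speech and the corresponding embedding extracted from the synthesized speech. The function names, the plain-list embeddings, and the choice of cosine distance are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def consistency_loss(reference_embedding, synthesized_embedding):
    """Hypothetical consistency term: cosine distance in [0, 2].

    ASSUMPTION: the paper's actual loss may differ. This only shows the
    general idea of penalizing a mismatch between the embedding of the
    reference speech and the embedding of the synthesized speech.
    """
    return 1.0 - cosine_similarity(reference_embedding, synthesized_embedding)


if __name__ == "__main__":
    # Toy 3-dimensional "emotion" embeddings (made-up numbers).
    reference = [0.2, 0.9, 0.1]
    synthesized = [0.25, 0.8, 0.05]
    print(consistency_loss(reference, synthesized))
```

Under this assumed formulation, the loss is 0 when the two embeddings point in the same direction and grows as the synthesized speech drifts from the reference style or content representation.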

Comments:	Accepted at INTERSPEECH 2024
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2406.08076 [eess.AS]
	(or arXiv:2406.08076v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2406.08076

Submission history

From: Nirmesh Shah Dr
[v1] Wed, 12 Jun 2024 10:51:29 UTC (6,683 KB)
