Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Huang, Sung-Feng; Chen, Chia-ping; Chen, Zhi-Sheng; Tsai, Yu-Pao; Lee, Hung-yi

arXiv:2303.11816 (cs)

[Submitted on 21 Mar 2023]

Abstract: Personalized TTS is an exciting and highly desired application that allows users to train their TTS voice using only a few recordings. However, TTS training typically requires many hours of recording and a large model, making it unsuitable for deployment on mobile devices. To overcome this limitation, related works typically fine-tune a pre-trained TTS model to preserve its ability to generate high-quality audio samples while adapting to the target speaker's voice. This process is commonly referred to as "voice cloning." Although related works have achieved significant success in changing the TTS model's voice, they still have to fine-tune from a large pre-trained model, so the voice-cloned model remains large. In this paper, we propose applying trainable structured pruning to voice cloning. By training the structured pruning masks with voice-cloning data, we can produce a unique pruned model for each target speaker. Our experiments demonstrate that, using learnable structured pruning, we can compress the model to be 7 times smaller while achieving comparable voice-cloning performance.
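
The abstract does not say how the pruning masks are parameterized, so the following is only a minimal sketch of the general idea of structured pruning with per-channel mask scores, not the authors' method. It assumes each output channel of a layer has one scalar score (imagined here as already learned from a speaker's voice-cloning data), a sigmoid gate, and a fixed threshold; the names `gate`, `prune_linear`, and `param_count`, the toy weights, and the scores are all hypothetical.

```python
import math


def gate(score):
    """Sigmoid gate in (0, 1) for one mask score (assumed parameterization)."""
    return 1.0 / (1.0 + math.exp(-score))


def prune_linear(weight, bias, scores, threshold=0.5):
    """Drop whole output rows of a linear layer whose gate is below threshold.

    Removing entire rows (channels) is what makes the pruning "structured":
    the result is a smaller dense layer rather than a sparse one.
    """
    keep = [i for i, s in enumerate(scores) if gate(s) >= threshold]
    pruned_weight = [weight[i] for i in keep]
    pruned_bias = [bias[i] for i in keep]
    return pruned_weight, pruned_bias, keep


def param_count(weight, bias):
    """Total number of scalar parameters in a linear layer."""
    return sum(len(row) for row in weight) + len(bias)


# Toy layer with 4 output channels and 3 inputs (illustrative values only).
weight = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]
bias = [0.0, 0.1, 0.2, 0.3]

# Hypothetical per-channel scores, standing in for masks trained per speaker.
scores = [2.0, -3.0, 0.5, -1.0]

pruned_weight, pruned_bias, keep = prune_linear(weight, bias, scores)
ratio = param_count(weight, bias) / param_count(pruned_weight, pruned_bias)
print("kept channels:", keep)
print("compression ratio:", ratio)
```

Because the scores would be trained separately for each target speaker, different speakers could end up keeping different channels, which is one way to read the abstract's "unique pruned model for each target speaker."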

Comments:	ICASSP 2023
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2303.11816 [cs.SD]
	(or arXiv:2303.11816v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2303.11816

Submission history

From: Sung-Feng Huang
[v1] Tue, 21 Mar 2023 12:59:46 UTC (153 KB)
