Electrical Engineering and Systems Science > Audio and Speech Processing
[Submitted on 23 Sep 2022 (this version), latest version 11 Jan 2024 (v5)]
Title:ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm
View PDFAbstract:Recent developments in neural speech synthesis and vocoding have sparked a renewed interest in voice conversion (VC). Beyond timbre transfer, achieving controllability on para-linguistic parameters such as pitch and rhythm is critical in deploying VC systems in many application scenarios. Existing studies, however, either only provide utterance-level global control or lack interpretability on the controls. In this paper, we propose ControlVC, the first neural voice conversion system that achieves time-varying controls on pitch and rhythm. ControlVC uses pre-trained encoders to compute pitch embeddings and linguistic embeddings from the source utterance and speaker embeddings from the target utterance. These embeddings are then concatenated and converted to speech using a vocoder. It achieves rhythm control through TD-PSOLA pre-processing on the source utterance, and achieves pitch control by manipulating the pitch contour before feeding it to the pitch encoder. Systematic subjective and objective evaluations are conducted to assess the speech quality and controllability. Results show that, on non-parallel and zero-shot conversion tasks, ControlVC significantly outperforms two other self-constructed baselines on speech quality, and it can successfully achieve time-varying pitch control.
Submission history
From: Meiying Chen [view email][v1] Fri, 23 Sep 2022 21:28:59 UTC (1,746 KB)
[v2] Wed, 7 Dec 2022 04:08:10 UTC (1,843 KB)
[v3] Tue, 24 Jan 2023 07:54:55 UTC (1,844 KB)
[v4] Sat, 3 Jun 2023 19:03:28 UTC (132 KB)
[v5] Thu, 11 Jan 2024 19:45:39 UTC (132 KB)
Current browse context:
eess.AS
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.