Voice Conversion with Denoising Diffusion Probabilistic GAN Models

Zhang, Xulong; Wang, Jianzong; Cheng, Ning; Xiao, Jing

arXiv:2308.14319 (cs)

[Submitted on 28 Aug 2023]

Abstract: Voice conversion transforms speaking style while preserving linguistic information. Many researchers have applied deep generative models to voice conversion tasks. Generative Adversarial Networks (GANs) can generate high-quality samples quickly, but the generated samples lack diversity. Samples generated by Denoising Diffusion Probabilistic Models (DDPMs) are better than those of GANs in terms of mode coverage and sample diversity, but DDPMs have high computational costs and slower inference than GANs. To make GANs and DDPMs more practical, we propose DiffGAN-VC, a variant of GANs and DDPMs, to achieve non-parallel many-to-many voice conversion (VC). We use large steps to achieve denoising, and also introduce multimodal conditional GANs to model the denoising diffusion generative adversarial network. In both objective and subjective evaluation experiments, DiffGAN-VC achieves high voice quality on non-parallel data sets. Compared with the CycleGAN-VC method, DiffGAN-VC achieves higher speaker similarity, naturalness, and sound quality.
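The phrase "large steps" in the abstract can be illustrated with the standard DDPM forward process, q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I). The following is a minimal sketch, not the authors' implementation: the abstract gives neither the number of steps nor the noise schedule, so the four-step linear schedule, the beta values, and the function names (`linear_betas`, `alpha_bars`, `diffuse`) are all assumptions made for illustration.

```python
import math
import random


def linear_betas(num_steps, beta_min=0.1, beta_max=0.9):
    """Hypothetical linear noise schedule; the values are illustrative only."""
    if num_steps == 1:
        return [beta_max]
    step = (beta_max - beta_min) / (num_steps - 1)
    return [beta_min + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): signal fraction left after t steps."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def diffuse(x0, t, abar, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    a = abar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for v in x0]


rng = random.Random(0)
abar = alpha_bars(linear_betas(4))           # only four (assumed) large steps
x0 = [math.sin(0.1 * i) for i in range(80)]  # stand-in for one feature frame
xT = diffuse(x0, 3, abar, rng)               # almost pure noise after step 4
```

With so few steps, each reverse step has to remove a large amount of noise at once, which is the setting in which the abstract proposes a conditional GAN as the denoiser.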

Comments:	Accepted by the 19th International Conference on Advanced Data Mining and Applications (ADMA 2023)
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2308.14319 [cs.SD]
	(or arXiv:2308.14319v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2308.14319

Submission history

From: Xulong Zhang
[v1] Mon, 28 Aug 2023 05:53:06 UTC (3,212 KB)
