SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

Tang, Yuxun; Wu, Yuning; Shi, Jiatong; Jin, Qin


arXiv:2406.08905 (cs)

[Submitted on 13 Jun 2024 (v1), last revised 20 Jun 2024 (this version, v2)]


Abstract: Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation than typical speech. To address these challenges, we introduce SingOMD, a novel method to extract singing-oriented multi-resolution discrete representations from speech SSL models. Specifically, we first adapt the features from speech SSL through a resynthesis task and incorporate multi-resolution modules based on resampling to better serve singing generation. These adapted multi-resolution features are then discretized via clustering. Extensive experiments demonstrate the robustness, efficiency, and effectiveness of these representations in singing vocoders and singing voice synthesis.
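
The abstract names three stages: adapting SSL features through resynthesis, deriving multi-resolution features by resampling, and discretizing them via clustering. As a rough illustration of the last two stages only, the sketch below average-pools frame-level features to lower frame rates and assigns each frame to its nearest k-means centroid. It is not the paper's implementation: the random vectors standing in for adapted SSL features, the function names (`resample`, `kmeans`, `tokenize`), the pooling factors, and the cluster count are all assumptions made for this example.

```python
import random


def resample(frames, factor):
    """Average-pool consecutive frames; factor=2 halves the frame rate."""
    out = []
    for start in range(0, len(frames), factor):
        chunk = frames[start:start + factor]
        dim = len(chunk[0])
        out.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda k: sq_dist(vec, centroids[k]))


def kmeans(frames, k, iters=10, seed=0):
    """Plain Lloyd iterations; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in frames:
            buckets[nearest(f, centroids)].append(f)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                dim = len(bucket[0])
                centroids[j] = [
                    sum(f[d] for f in bucket) / len(bucket) for d in range(dim)
                ]
    return centroids


def tokenize(frames, centroids):
    """Map each frame to the index of its nearest centroid."""
    return [nearest(f, centroids) for f in frames]


# Assumption: 40 frames of 4-dim random vectors stand in for adapted SSL features.
rng = random.Random(0)
features = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(40)]

# One token sequence per (hypothetical) resolution.
tokens_by_factor = {}
for factor in (1, 2, 4):
    frames = resample(features, factor)
    centroids = kmeans(frames, k=5)
    tokens_by_factor[factor] = tokenize(frames, centroids)
```

Each entry of `tokens_by_factor` is one discrete token sequence at a different frame rate, so a downstream model could consume coarse and fine streams together. A separate codebook per resolution is a design choice of this sketch, not something the abstract specifies.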

Comments:	Accepted by Interspeech 2024
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2406.08905 [cs.SD]
	(or arXiv:2406.08905v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2406.08905

Submission history

From: Yuxun Tang
[v1] Thu, 13 Jun 2024 08:00:25 UTC (124 KB)
[v2] Thu, 20 Jun 2024 11:01:14 UTC (124 KB)
