Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis

Yu, Chin-Yun; Fazekas, György

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2406.05128 (eess)

[Submitted on 7 Jun 2024 (v1), last revised 18 Jun 2024 (this version, v2)]

Abstract: Training the linear prediction (LP) operator end-to-end for audio synthesis in modern deep learning frameworks is slow due to its recursive formulation. In addition, frame-wise approximation, used as an acceleration method, does not generalise well to test-time conditions where the LP is computed sample-wise. Efficient differentiable sample-wise LP for end-to-end training is the key to removing this barrier. We generalise the efficient time-invariant LP implementation from the GOLF vocoder to time-varying cases. Combining this with the classic source-filter model, we show that the improved GOLF learns LP coefficients and reconstructs the voice better than its frame-wise counterparts. Moreover, in our listening test, synthesised outputs from GOLF scored higher in quality ratings than the state-of-the-art differentiable WORLD vocoder.
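
The recursion that makes sample-wise LP slow to train can be seen in the minimal, pure-Python sketch below of a time-varying all-pole (LP synthesis) filter. It assumes the sign convention A(z) = 1 + sum_k a_k z^-k and a coefficient layout with one row of p coefficients per sample; the function names `tv_lp_synthesis` and `upsample_coeffs` are hypothetical and do not come from the paper or the GOLF code. Because each output sample depends on the previous p outputs, the loop over n cannot be parallelised, and an autograd framework would have to record every step of it.

```python
from typing import List, Sequence


def tv_lp_synthesis(excitation: Sequence[float],
                    coeffs: Sequence[Sequence[float]]) -> List[float]:
    """Sample-wise time-varying all-pole filter (sketch).

    Assumed convention: y[n] = e[n] - sum_{k=1}^{p} a_n[k] * y[n-k],
    where coeffs[n] holds the p coefficients a_n[1..p] used at sample n.
    """
    if len(coeffs) != len(excitation):
        raise ValueError("need one coefficient row per excitation sample")
    order = len(coeffs[0]) if coeffs else 0
    y: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        # Feedback from past outputs: this dependency forces a sequential loop.
        for k in range(1, order + 1):
            if n - k >= 0:
                acc -= coeffs[n][k - 1] * y[n - k]
        y.append(acc)
    return y


def upsample_coeffs(frame_coeffs: Sequence[Sequence[float]],
                    hop: int) -> List[List[float]]:
    """Linearly interpolate frame-level coefficients to sample level.

    Hypothetical helper: returns (num_frames - 1) * hop + 1 rows.
    """
    out: List[List[float]] = []
    for f in range(len(frame_coeffs) - 1):
        cur, nxt = frame_coeffs[f], frame_coeffs[f + 1]
        for i in range(hop):
            t = i / hop
            out.append([(1 - t) * c + t * d for c, d in zip(cur, nxt)])
    out.append(list(frame_coeffs[-1]))
    return out


# Usage: an impulse through a first-order filter whose pole moves over time.
sample_coeffs = upsample_coeffs([[0.0], [-1.0]], hop=2)
print(tv_lp_synthesis([1.0, 0.0, 0.0], sample_coeffs))
```

In a source-filter setting, the excitation would play the role of the source signal and the coefficients, for example predicted per frame by a network and upsampled, the role of the filter. Direct-form coefficients are interpolated here only for simplicity; interpolating in that domain does not in general guarantee a stable filter.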

Comments:	Accepted at Interspeech 2024
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2406.05128 [eess.AS]
	(or arXiv:2406.05128v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2406.05128

Submission history

From: Chin-Yun Yu [view email]
[v1] Fri, 7 Jun 2024 17:57:29 UTC (2,291 KB)
[v2] Tue, 18 Jun 2024 21:01:21 UTC (2,291 KB)