LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis

Tan, Zhenxiong; Ma, Xinyin; Fang, Gongfan; Wang, Xinchao

arXiv:2407.10468 (cs)

[Submitted on 15 Jul 2024]

Abstract: Latent diffusion models have shown promising results in audio generation, making notable advances over traditional methods. However, their performance, while impressive on short audio clips, faces challenges when extended to longer audio sequences. These challenges stem from the model's self-attention mechanism and from training predominantly on 10-second clips, which complicates extension to longer audio without adaptation. In response, we introduce LiteFocus, a novel approach that enhances the inference of existing audio latent diffusion models in long audio synthesis. Having observed the attention pattern in self-attention, we employ a dual sparse form for attention calculation, designated as same-frequency focus and cross-frequency compensation, which curtails the attention computation under same-frequency constraints while enhancing audio quality through cross-frequency refillment. LiteFocus substantially reduces the inference time of a diffusion-based text-to-audio (TTA) model, by 1.99x when synthesizing 80-second audio clips, while also improving audio quality.
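
To make the idea of a dual sparse attention pattern concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes latent tokens lie on a (time frame, frequency bin) grid flattened in row-major order, keeps every same-frequency pair, and adds cross-frequency pairs only inside a local time window. The windowing rule, the function names build_sparse_mask and density, and the window parameter are all assumptions made for illustration; the abstract does not specify how cross-frequency tokens are selected.

```python
def build_sparse_mask(num_frames, num_bins, window=0):
    """Boolean mask[q][k] over a flattened (time, frequency) token grid.

    Token index = frame * num_bins + bin (an assumed layout).
    """
    n = num_frames * num_bins
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        tq, fq = divmod(q, num_bins)
        for k in range(n):
            tk, fk = divmod(k, num_bins)
            # Same-frequency pairs: query and key share a frequency bin.
            same_freq = fq == fk
            # Hypothetical cross-frequency rule: any bin, but only for
            # keys within `window` frames of the query.
            local_cross_freq = abs(tq - tk) <= window
            mask[q][k] = same_freq or local_cross_freq
    return mask


def density(mask):
    """Fraction of query-key pairs the mask keeps (dense attention = 1.0)."""
    kept = sum(sum(row) for row in mask)
    return kept / (len(mask) * len(mask[0]))


if __name__ == "__main__":
    for frames in (8, 16, 32):
        m = build_sparse_mask(num_frames=frames, num_bins=4, window=0)
        print(frames, density(m))
```

Under these assumptions each query keeps a fixed number of cross-frequency keys plus one key per frame, so the fraction of retained pairs shrinks as the clip gets longer. This count of retained pairs is not a measurement of the speedup reported in the abstract.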

Comments:	Interspeech 2024; Code: this https URL
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2407.10468 [cs.SD]
	(or arXiv:2407.10468v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2407.10468

Submission history

From: Zhenxiong Tan
[v1] Mon, 15 Jul 2024 06:49:05 UTC (6,855 KB)
