Time-Graph Frequency Representation with Singular Value Decomposition for Neural Speech Enhancement

Wang, Tingting; Wang, Tianrui; Ge, Meng; Zhang, Qiquan; Ge, Zirui; Yang, Zhen

arXiv:2412.16823 (eess)

[Submitted on 22 Dec 2024 (v1), last revised 24 Dec 2024 (this version, v2)]

Abstract: Time-frequency (T-F) domain methods for monaural speech enhancement have benefited from the success of deep learning. Recent work has focused on two-stream network models that either predict the amplitude mask and phase separately, or couple amplitude and phase in Cartesian coordinates as real and imaginary pairs. However, most of these methods struggle to align the modeling of amplitude and phase (or of the real and imaginary parts) within a two-stream network framework, which inevitably limits performance. In this paper, we introduce a graph Fourier transform defined with the singular value decomposition (GFT-SVD), which yields a real-valued time-graph representation for neural speech enhancement. Because the GFT-SVD representation is real-valued, the modeling of amplitude and phase is aligned, which avoids the need to recover the target speech phase information. Our findings demonstrate the effectiveness of the real-valued time-graph representation based on GFT-SVD for neural speech enhancement. Extensive speech enhancement experiments show that the combination of GFT-SVD and a DNN outperforms both the combination of GFT with the eigenvector decomposition (GFT-EVD) and a magnitude estimation UNet, and the combination of the short-time Fourier transform (STFT) and a DNN, in terms of objective intelligibility and perceptual quality. We release our source code at: this https URL\_project.
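To make the idea of a real-valued transform concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the graph, so the adjacency matrix built by `example_adjacency` is a hypothetical choice, and the function names, frame length, and hop size are likewise assumptions made for illustration. The sketch assumes the transform basis is the matrix of left singular vectors U from A = U S V^T, so that each frame x is mapped to U^T x and recovered as U (U^T x).

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (num_frames, frame_len)."""
    num_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    return x[idx]


def example_adjacency(n):
    """Hypothetical real, non-symmetric adjacency matrix (an assumption, not the paper's graph).

    Each sample is linked to its next three neighbours with decaying weights.
    """
    a = np.zeros((n, n))
    for k, w in enumerate((1.0, 0.5, 0.25), start=1):
        a += w * np.eye(n, k=k)
    return a


def gft_svd_basis(adjacency):
    """Left singular vectors of A = U S V^T; U is real and orthogonal for real A."""
    u, _, _ = np.linalg.svd(adjacency)
    return u


def gft(frames, u):
    """Forward transform per frame: x_hat = U^T x (each row of `frames` is one frame)."""
    return frames @ u


def igft(coeffs, u):
    """Inverse transform per frame: x = U x_hat."""
    return coeffs @ u.T


# Usage: transform random frames and check that the round trip is lossless.
rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)
frames = frame_signal(signal, frame_len=64, hop=32)
basis = gft_svd_basis(example_adjacency(64))
coeffs = gft(frames, basis)      # real-valued time-graph representation
recon = igft(coeffs, basis)      # matches `frames` up to floating-point error
```

Because U is a real orthogonal matrix, the coefficients are real-valued, so in this sketch there is no separate phase component to estimate, in contrast to the complex-valued STFT.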

Comments:	5 pages, 4 figures, accepted by ICASSP 2025
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2412.16823 [eess.AS]
	(or arXiv:2412.16823v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2412.16823

Submission history

From: Zirui Ge
[v1] Sun, 22 Dec 2024 02:05:21 UTC (9,185 KB)
[v2] Tue, 24 Dec 2024 05:36:12 UTC (9,185 KB)
