On the Relation Between Speech Quality and Quantized Latent Representations of Neural Codecs

Halimeh, Mhd Modar; Torcoli, Matteo; Grundhuber, Philipp; Habets, Emanuël A. P.

arXiv:2503.03304 (eess)

[Submitted on 5 Mar 2025]

Abstract: Neural audio signal codecs have attracted significant attention in recent years. In essence, the impressive low bitrate achieved by such encoders is enabled by learning an abstract representation that captures the properties of encoded signals, e.g., speech. In this work, we investigate the relation between the latent representation of the input signal learned by a neural codec and the quality of speech signals. To do so, we introduce Latent-representation-to-Quantization error Ratio (LQR) measures, which quantify the distance from the idealized neural codec's speech signal model for a given speech signal. We compare the proposed metrics to intrusive measures as well as data-driven supervised methods using two subjective speech quality datasets. This analysis shows that the proposed LQR correlates strongly (up to 0.9 Pearson's correlation) with the subjective quality of speech. Despite being a non-intrusive metric, this yields a competitive performance with, or even better than, other pre-trained and intrusive measures. These results show that LQR is a promising basis for more sophisticated speech quality measures.
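The abstract does not give the exact definition of the LQR measures, so the following is only an illustrative sketch of what a latent-to-quantization-error ratio could look like. It assumes an SNR-style ratio in decibels and a single nearest-neighbour codebook; the function names `quantize` and `lqr_db`, the codebook size, and the synthetic latents are all hypothetical and are not taken from the paper.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (Euclidean distance)."""
    # Pairwise squared distances, shape (num_frames, num_codes).
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook[d2.argmin(axis=1)]


def lqr_db(latents, codebook, eps=1e-12):
    """Assumed SNR-style ratio of latent energy to quantization-error energy, in dB."""
    error = latents - quantize(latents, codebook)
    return 10.0 * np.log10((np.sum(latents ** 2) + eps) / (np.sum(error ** 2) + eps))


# Synthetic stand-ins for encoder outputs (no real codec is involved here).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 8))            # hypothetical 64-entry codebook
idx = rng.integers(0, 64, size=200)
near = codebook[idx] + 0.05 * rng.normal(size=(200, 8))  # latents close to the codebook
far = codebook[idx] + 0.5 * rng.normal(size=(200, 8))    # latents far from the codebook

print(lqr_db(near, codebook), lqr_db(far, codebook))
```

Under these assumptions, latents that sit close to codebook entries produce a higher ratio than latents that sit far from them, which matches the intuition in the abstract that the measure quantifies distance from the codec's idealized speech model. No reference signal is needed, which is what makes such a measure non-intrusive.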

Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2503.03304 [eess.AS]
	(or arXiv:2503.03304v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2503.03304

Submission history

From: Mhd Modar Halimeh
[v1] Wed, 5 Mar 2025 09:37:14 UTC (98 KB)
