Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

Siahkoohi, Ali; Chinen, Michael; Denton, Tom; Kleijn, W. Bastiaan; Skoglund, Jan

arXiv:2207.02262 (cs)

[Submitted on 5 Jul 2022]

Abstract: Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network-based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While this new generation of codecs is capable of synthesizing high-fidelity speech, their use of recurrent or convolutional layers often restricts their effective receptive fields, which prevents them from compressing speech efficiently. We propose to further reduce the bitrate of neural speech codecs through the use of pretrained Transformers, which are capable of exploiting long-range dependencies in the input signal due to their inductive bias. To this end, we use a pretrained Transformer in tandem with a convolutional encoder, which is trained end-to-end with a quantizer and a generative adversarial net decoder. Our numerical experiments show that supplementing the convolutional encoder of a neural speech codec with Transformer speech embeddings yields a speech codec with a bitrate of $600\,\mathrm{bps}$ that outperforms the original neural speech codec in synthesized speech quality when trained at the same bitrate. Subjective human evaluations suggest that the quality of the resulting codec is comparable to or better than that of conventional codecs operating at three to four times the rate.
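
To put the figures in the abstract in perspective, the following is a minimal arithmetic sketch, not taken from the paper. It derives the bitrates implied by "three to four times the rate" and shows how a 600 bps budget translates into bits per encoded frame. The function name `bits_per_frame` and the frame rates of 25 and 50 frames per second are assumptions chosen only for illustration; the abstract does not state the codec's frame rate.

```python
def bits_per_frame(bitrate_bps: float, frame_rate_hz: float) -> float:
    """Bits the quantizer may spend on each encoded frame at a given bitrate."""
    return bitrate_bps / frame_rate_hz


TARGET_BPS = 600  # bitrate reported in the abstract

# "Three to four times the rate": bitrates of the conventional codecs
# that the abstract uses as the comparison point.
comparison_bps = [TARGET_BPS * factor for factor in (3, 4)]
print("comparison bitrates (bps):", comparison_bps)

# Hypothetical frame rates (assumption, not stated in the abstract).
for frame_rate_hz in (25, 50):
    budget = bits_per_frame(TARGET_BPS, frame_rate_hz)
    print(f"{frame_rate_hz} frames/s -> {budget} bits per frame")
```

Under these assumed frame rates the per-frame budget is only a few tens of bits at most, which illustrates why context gathered over a longer span of the input signal can matter at this operating point.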

Comments:	Proceedings of INTERSPEECH 2022
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2207.02262 [cs.SD]
	(or arXiv:2207.02262v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2207.02262

Submission history

From: Ali Siahkoohi
[v1] Tue, 5 Jul 2022 18:52:11 UTC (1,733 KB)
