TRILLsson: Distilled Universal Paralinguistic Speech Representations

Shor, Joel; Venugopalan, Subhashini

doi:10.21437/Interspeech.2022-118

arXiv:2203.00236 (eess)

[Submitted on 1 Mar 2022 (v1), last revised 20 Mar 2022 (this version, v2)]

Abstract: Recent advances in self-supervision have dramatically improved the quality of speech representations. However, deployment of state-of-the-art embedding models on devices has been restricted by their limited public availability and large resource footprint. Our work addresses these issues by publicly releasing a collection of paralinguistic speech models that are small and achieve near state-of-the-art performance. Our approach is based on knowledge distillation, and our models are distilled on public data only. We explore different architectures and thoroughly evaluate our models on the Non-Semantic Speech (NOSS) benchmark. Our largest distilled model is less than 15% of the size of the original model (314MB vs 2.2GB), achieves over 96% of its accuracy on 6 of 7 tasks, and is trained on 6.5% of the data. The smallest model is 1% of the original size (22MB) and achieves over 90% of the original accuracy on 6 of 7 tasks. Our models outperform the open source Wav2Vec 2.0 model on 6 of 7 tasks, and our smallest model outperforms the open source Wav2Vec 2.0 on both emotion recognition tasks despite being 7% of its size.
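
The size figures quoted in the abstract can be checked with a few lines of arithmetic. This is only an illustrative sketch: the constant and function names are hypothetical, and it assumes 1 GB = 1000 MB, which the abstract does not specify.

```python
# Model sizes as quoted in the abstract, in megabytes.
# Assumption: 2.2GB is treated as 2200 MB (decimal units).
TEACHER_MB = 2200.0
LARGEST_STUDENT_MB = 314.0
SMALLEST_STUDENT_MB = 22.0


def size_ratio(student_mb: float, teacher_mb: float) -> float:
    """Return the student's size as a fraction of the teacher's size."""
    return student_mb / teacher_mb


largest_ratio = size_ratio(LARGEST_STUDENT_MB, TEACHER_MB)
smallest_ratio = size_ratio(SMALLEST_STUDENT_MB, TEACHER_MB)

# Print both ratios as percentages with one decimal place.
print(f"largest: {largest_ratio:.1%}, smallest: {smallest_ratio:.1%}")
```

Under that unit assumption, the largest distilled model comes out below the 15% bound and the smallest at about 1%, in line with the abstract's claims.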

Comments:	Submitted to Interspeech 2022
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2203.00236 [eess.AS]
	(or arXiv:2203.00236v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2203.00236
Journal reference:	Proc. Interspeech 2022, 356-360
Related DOI:	https://doi.org/10.21437/Interspeech.2022-118

Submission history

From: Joel Shor
[v1] Tue, 1 Mar 2022 05:22:57 UTC (670 KB)
[v2] Sun, 20 Mar 2022 21:13:37 UTC (670 KB)
