FUN! Fast, Universal, Non-Semantic Speech Embeddings

Peplinski, Jacob; Shor, Joel; Joglekar, Sachin; Garrison, Jake; Patel, Shwetak

arXiv:2011.04609v1 (cs)

[Submitted on 9 Nov 2020 (this version), latest version 10 Jun 2021 (v5)]

Abstract: Learned speech representations can drastically improve performance on tasks with limited labeled data. However, due to their size and complexity, learned representations have limited utility in mobile settings where run-time performance is a significant bottleneck. We propose a class of lightweight universal speech embedding models based on MobileNet that are designed to run efficiently on mobile devices. These embeddings, which encapsulate the non-semantic aspects of speech and thus can be re-used for several tasks, are trained via knowledge distillation. We show that these embedding models are fast enough to run in real time on a variety of mobile devices and exhibit negligible performance degradation on most tasks in a recently published benchmark of non-semantic speech tasks. Furthermore, we demonstrate that these representations are useful for mobile health tasks such as mask detection during speech and detection of non-speech human sounds.
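
As a rough illustration of what training an embedding model by knowledge distillation can look like, the sketch below fits a small "student" to reproduce the embeddings of a frozen "teacher". Everything in it is an assumption made for illustration: the abstract does not state the loss or training setup, the paper's students are MobileNet-based rather than linear, and the names `distill_step`, `W_teacher`, and the random input features are hypothetical stand-ins.

```python
import numpy as np

def distill_step(W, x, teacher_emb, lr):
    """One gradient-descent step on the mean squared error between
    student embeddings (x @ W) and the teacher's embeddings."""
    student_emb = x @ W                       # shape: (batch, emb_dim)
    err = student_emb - teacher_emb
    loss = float(np.mean(err ** 2))
    grad = 2.0 * x.T @ err / err.size         # gradient of the mean squared error w.r.t. W
    return W - lr * grad, loss

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))                 # stand-in for audio features (assumption)
W_teacher = rng.normal(size=(16, 8))          # frozen toy "teacher" (assumption)
teacher_emb = x @ W_teacher                   # targets the student must match

W = np.zeros((16, 8))                         # toy linear "student"
losses = []
for _ in range(200):
    W, loss = distill_step(W, x, teacher_emb, lr=1.0)
    losses.append(loss)

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

The point of the sketch is only the training signal: the student never sees task labels, just the teacher's outputs, which is why the resulting embedding can be reused across downstream tasks.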

Comments:	ICASSP2021 Submission, 5 Pages
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2011.04609 [cs.SD]
	(or arXiv:2011.04609v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2011.04609

Submission history

From: Jacob Peplinski
[v1] Mon, 9 Nov 2020 18:07:06 UTC (286 KB)
[v2] Tue, 6 Apr 2021 06:01:14 UTC (72 KB)
[v3] Sat, 1 May 2021 04:57:34 UTC (72 KB)
[v4] Wed, 19 May 2021 23:30:10 UTC (80 KB)
[v5] Thu, 10 Jun 2021 16:18:35 UTC (72 KB)
