Exploring Representation Learning for Small-Footprint Keyword Spotting

Cui, Fan; Guo, Liyong; Wang, Quandong; Gao, Peng; Wang, Yujun

doi:10.21437/Interspeech.2022-10558


arXiv:2303.10912 (cs)

[Submitted on 20 Mar 2023]


Abstract: In this paper, we investigate representation learning for low-resource keyword spotting (KWS). The main challenges of KWS are limited labeled data and limited available device resources. To address these challenges, we explore representation learning for KWS through self-supervised contrastive learning and self-training with a pretrained model. First, local-global contrastive siamese networks (LGCSiam) are designed to learn similar utterance-level representations for similar audio samples using the proposed local-global contrastive loss, without requiring ground-truth labels. Second, a self-supervised pretrained Wav2Vec 2.0 model is applied as a constraint module (WVC) to force the KWS model to learn frame-level acoustic representations. With the LGCSiam and WVC modules, the proposed small-footprint KWS model can be pretrained on unlabeled data. Experiments on the Speech Commands dataset show that the self-training WVC module and the self-supervised LGCSiam module significantly improve accuracy, especially when training on a small labeled dataset.
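
The abstract does not give the loss formulas, so the following is only an illustrative sketch of how an utterance-level contrastive term and a frame-level constraint term could be combined for pretraining on unlabeled audio. It is not the paper's local-global contrastive loss or its WVC objective. The NT-Xent-style contrastive form, the mean-squared-error constraint, the function names, the `temperature` and `weight` parameters, and the array shapes are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def utterance_contrastive_loss(z_a, z_b, temperature=0.1):
    """NT-Xent-style loss (an assumed stand-in, not the paper's loss).

    z_a, z_b: (batch, dim) utterance-level embeddings of two augmented
    views of the same audio. Row i of z_a and row i of z_b are treated
    as a positive pair; all other rows in the batch act as negatives.
    """
    z_a = l2_normalize(z_a)
    z_b = l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature            # (batch, batch) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # positives sit on the diagonal


def frame_constraint_loss(student_frames, teacher_frames):
    """Mean-squared error between frame-level features (assumed form).

    student_frames: (batch, frames, dim) features from the small KWS model.
    teacher_frames: (batch, frames, dim) features from a frozen pretrained
    model such as Wav2Vec 2.0, assumed already projected to the same shape.
    """
    return float(np.mean((student_frames - teacher_frames) ** 2))


def pretraining_loss(z_a, z_b, student_frames, teacher_frames, weight=1.0):
    """Weighted sum of the two terms; `weight` is a hypothetical hyperparameter."""
    return (utterance_contrastive_loss(z_a, z_b)
            + weight * frame_constraint_loss(student_frames, teacher_frames))
```

Neither term uses class labels: the contrastive term relies only on knowing which two views came from the same utterance, and the constraint term relies only on the teacher's outputs.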

Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2303.10912 [cs.SD]
	(or arXiv:2303.10912v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2303.10912
Related DOI:	https://doi.org/10.21437/Interspeech.2022-10558

Submission history

From: Fan Cui
[v1] Mon, 20 Mar 2023 07:09:26 UTC (968 KB)
