Leveraging cache to enable SLU on tiny devices

Benazir, Afsara; Xu, Zhiming; Lin, Felix Xiaozhu

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2311.18188v1 (eess)

[Submitted on 30 Nov 2023 (this version), latest version 8 May 2024 (v4)]

Authors: Afsara Benazir, Zhiming Xu, Felix Xiaozhu Lin (University of Virginia)

Abstract: This paper addresses spoken language understanding (SLU) on microcontroller-like embedded devices, integrating on-device execution with cloud offloading in a novel fashion. We exploit temporal locality in a device's speech inputs and reuse recent SLU inferences accordingly. The idea is simple: the device matches new inputs against cached results and offloads only unmatched inputs to the cloud for full inference. Realizing this idea, however, is non-trivial: the device needs to compare acoustic features in a robust, low-cost way. To this end, we present XYZ, a speech cache for tiny devices. It matches speech inputs at two levels of representation: first as clustered sequences of raw sound units, then as sequences of phonemes. Working in tandem, the two representations offer complementary cost/accuracy tradeoffs. To further boost accuracy, our cache learns: using the mismatched and subsequently offloaded inputs, it continuously finetunes the device's feature extractors with the assistance of the cloud. We implement XYZ on an off-the-shelf STM32 microcontroller; the resulting implementation has a small memory footprint of 2 MB. Evaluated on challenging speech benchmarks, our system resolves 45% to 90% of inputs on device, reducing average latency by up to 80% compared to offloading to popular cloud speech services. The benefit is pronounced even in adversarial settings: noisy environments, a cold cache, or one device shared by a number of users.
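The two-level lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names, the use of normalized edit distance as the matching metric, the 0.2 thresholds, and the choice to extract phonemes only after a level-1 miss are all assumptions made for illustration. The on-device finetuning step is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


def normalized_distance(a: Sequence, b: Sequence) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


@dataclass
class CacheEntry:
    units: tuple      # clustered sound-unit IDs (level-1 key)
    phonemes: tuple   # phoneme sequence (level-2 key)
    result: str       # cached SLU result, e.g. an intent label


@dataclass
class SpeechCache:
    # Hypothetical thresholds; the abstract does not give any values.
    unit_threshold: float = 0.2
    phoneme_threshold: float = 0.2
    entries: list = field(default_factory=list)

    def _closest(self, query: tuple, key: Callable,
                 threshold: float) -> Optional[CacheEntry]:
        """Return the nearest entry within the threshold, or None."""
        best, best_d = None, threshold
        for entry in self.entries:
            d = normalized_distance(query, key(entry))
            if d <= best_d:
                best, best_d = entry, d
        return best

    def lookup(self, units: Sequence, get_phonemes: Callable[[], Sequence],
               offload: Callable[[], str]):
        """Try level 1, then level 2, then fall back to the cloud."""
        units = tuple(units)
        hit = self._closest(units, lambda e: e.units, self.unit_threshold)
        if hit is not None:
            return hit.result, "units"
        # Assumption: phonemes are extracted only after a level-1 miss.
        phonemes = tuple(get_phonemes())
        hit = self._closest(phonemes, lambda e: e.phonemes,
                            self.phoneme_threshold)
        if hit is not None:
            return hit.result, "phonemes"
        # Miss at both levels: offload, then cache the result for reuse.
        result = offload()
        self.entries.append(CacheEntry(units, phonemes, result))
        return result, "cloud"
```

In this sketch, `get_phonemes` and `offload` are passed as callables so that the costlier steps run only when the cheaper level fails to match, which mirrors the complementary cost/accuracy tradeoff the abstract describes.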

Comments:	Submitted to MobiSys 2024
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
Cite as:	arXiv:2311.18188 [eess.AS]
	(or arXiv:2311.18188v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2311.18188

Submission history

From: Afsara Benazir
[v1] Thu, 30 Nov 2023 02:15:07 UTC (6,674 KB)
[v2] Mon, 11 Dec 2023 05:28:16 UTC (16,197 KB)
[v3] Wed, 13 Dec 2023 01:33:23 UTC (120,179 KB)
[v4] Wed, 8 May 2024 17:08:52 UTC (7,837 KB)
