Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization

Macoskey, Jonathan; Strimel, Grant P.; Rastrow, Ariya

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2108.01704 (eess)

[Submitted on 3 Aug 2021]

Abstract: We present Bifocal RNN-T, a new variant of the Recurrent Neural Network Transducer (RNN-T) architecture designed to improve inference-time latency on speech recognition tasks. The architecture enables a dynamic pivot of its runtime compute pathway: it uses keyword spotting to select which component of the network to execute for each audio frame. To accomplish this, we leverage a recurrent cell we call the Bifocal LSTM (BFLSTM), which we detail in the paper. The architecture is compatible with other optimization strategies such as quantization, sparsification, and time-reduction layers, making it especially applicable to deployed, real-time speech recognition settings. We present the architecture and report comparative experimental results on voice-assistant speech recognition tasks. Specifically, we show that the proposed Bifocal RNN-T reduces inference cost by 29.1% with matching word error rates and only a minor increase in memory size.
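The abstract describes the mechanism only at a high level: for each frame, a keyword-spotting decision chooses which part of the network runs. The sketch below is a hypothetical illustration of that idea, not the BFLSTM from the paper, whose internals the abstract does not give. It assumes two LSTM pathways that share a single (h, c) state: a dense "full" cell and a cheaper "light" cell whose gate projection is low-rank factorized, with the light cell executed on frames the keyword spotter flags. The names `BifocalLayer`, `LSTMCell`, and `keyword_flags`, the low-rank choice, and the use of multiply-accumulates (MACs) per step as a stand-in for inference cost are all assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: List[Vec], v: Sequence[float]) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vec]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """LSTM cell (biases omitted). Its [x; h] -> 4H gate projection is dense,
    or factorized as U @ V with inner dimension `rank` when rank is given."""

    def __init__(self, in_dim: int, hid: int, rank: Optional[int], rng: random.Random):
        self.hid = hid
        n_in, n_out = in_dim + hid, 4 * hid
        if rank is None:
            self.mats = [rand_matrix(n_out, n_in, rng)]        # W: 4H x (D+H)
        else:
            self.mats = [rand_matrix(rank, n_in, rng),         # V: r x (D+H)
                         rand_matrix(n_out, rank, rng)]        # U: 4H x r
        # Multiply-accumulates per time step: one per matrix entry.
        self.macs = sum(len(m) * len(m[0]) for m in self.mats)

    def step(self, x: Vec, h: Vec, c: Vec) -> Tuple[Vec, Vec]:
        z: Vec = x + h                   # concatenate input frame and previous hidden state
        for m in self.mats:              # apply W, or V followed by U
            z = matvec(m, z)
        H = self.hid
        i = [sigmoid(v) for v in z[0:H]]              # input gate
        f = [sigmoid(v) for v in z[H:2 * H]]          # forget gate
        g = [math.tanh(v) for v in z[2 * H:3 * H]]    # candidate cell values
        o = [sigmoid(v) for v in z[3 * H:4 * H]]      # output gate
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


class BifocalLayer:
    """Hypothetical two-pathway layer: both cells share one (h, c) state, and a
    per-frame keyword-spotting flag picks which cell executes."""

    def __init__(self, in_dim: int, hid: int, rank: int, seed: int = 0):
        rng = random.Random(seed)
        self.hid = hid
        self.full = LSTMCell(in_dim, hid, None, rng)   # dense, more expensive pathway
        self.light = LSTMCell(in_dim, hid, rank, rng)  # low-rank, cheaper pathway

    def run(self, frames: List[Vec], keyword_flags: List[bool]) -> Tuple[List[Vec], int]:
        if len(frames) != len(keyword_flags):
            raise ValueError("need exactly one keyword flag per frame")
        h, c = [0.0] * self.hid, [0.0] * self.hid
        outputs: List[Vec] = []
        total_macs = 0
        for x, is_keyword in zip(frames, keyword_flags):
            # Assumption: frames the keyword spotter flags take the light pathway.
            cell = self.light if is_keyword else self.full
            h, c = cell.step(x, h, c)
            outputs.append(h)
            total_macs += cell.macs
        return outputs, total_macs


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(10)]
    flags = [True] * 4 + [False] * 6          # toy KWS output: first 4 frames flagged
    layer = BifocalLayer(in_dim=8, hid=16, rank=4)
    _, macs = layer.run(frames, flags)
    baseline = len(frames) * layer.full.macs  # cost if every frame used the full cell
    print(f"bifocal MACs: {macs}, full-only MACs: {baseline}")
```

Because both pathways read and write the same state, switching between them mid-utterance needs no state conversion; any saving comes from the frames routed to the cheaper cell. The MAC counts this toy reports depend only on its made-up dimensions and say nothing about the 29.1% figure reported in the paper.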

Comments:	Accepted at ICASSP 2021
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2108.01704 [eess.AS]
	(or arXiv:2108.01704v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2108.01704

Submission history

From: Jonathan Macoskey
[v1] Tue, 3 Aug 2021 18:58:39 UTC (266 KB)