Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances

Keung, Phillip; Niu, Wei; Lu, Yichao; Salazar, Julian; Bhardwaj, Vikas

arXiv:2002.05150 (eess)

[Submitted on 12 Feb 2020]

Abstract: We discuss the problem of echographic transcription in autoregressive sequence-to-sequence attentional architectures for automatic speech recognition, where a model produces very long sequences of repetitive outputs when presented with out-of-domain utterances. We decode audio from the British National Corpus with an attentional encoder-decoder model trained solely on the LibriSpeech corpus. We observe that there are many 5-second recordings that produce more than 500 characters of decoding output (i.e., more than 100 characters per second). A frame-synchronous hybrid (DNN-HMM) model trained on the same data does not produce these unusually long transcripts. These decoding issues are reproducible in a speech transformer model from ESPnet, and to a lesser extent in a self-attention CTC model, suggesting that these issues are intrinsic to the use of the attention mechanism. We create a separate length prediction model to predict the correct number of wordpieces in the output, which allows us to identify and truncate problematic decoding results without increasing word error rates on the LibriSpeech task.
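
The two ideas in the abstract, flagging transcripts that are implausibly long for their audio duration and truncating output to a predicted wordpiece count, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of 100 characters per second as a hard threshold, and the `slack` factor are all assumptions made for illustration, and `predicted_len` stands in for the output of a length prediction model that is not specified here.

```python
from typing import List


def chars_per_second(transcript: str, duration_s: float) -> float:
    """Decoded characters per second of audio."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(transcript) / duration_s


def is_suspiciously_long(transcript: str, duration_s: float,
                         max_cps: float = 100.0) -> bool:
    """Flag a transcript whose character rate exceeds max_cps.

    The default of 100 chars/s mirrors the rate quoted in the abstract;
    treating it as a cutoff is an assumption of this sketch.
    """
    return chars_per_second(transcript, duration_s) > max_cps


def truncate_to_predicted_length(wordpieces: List[str], predicted_len: int,
                                 slack: float = 1.5) -> List[str]:
    """Cut a decoded wordpiece sequence at slack * predicted_len.

    predicted_len is assumed to come from a separate length predictor;
    slack is a hypothetical tolerance so that normal outputs are untouched.
    """
    limit = max(1, int(predicted_len * slack))
    return wordpieces[:limit]
```

For example, a 600-character transcript for a 5-second clip has a rate of 120 characters per second and would be flagged, whereas a decoded sequence already shorter than the allowed limit is returned unchanged.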

Comments:	Artifacts like our filtered Audio BNC dataset can be found at this https URL
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2002.05150 [eess.AS]
	(or arXiv:2002.05150v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2002.05150

Submission history

From: Julian Salazar
[v1] Wed, 12 Feb 2020 18:53:56 UTC (441 KB)