Automatic context window composition for distant speech recognition

Ravanelli, Mirco; Omologo, Maurizio

arXiv:1805.10498 (eess)

[Submitted on 26 May 2018]

Abstract: Distant speech recognition is being revolutionized by deep learning, which has helped recognizers significantly outperform previous HMM-GMM systems. A key aspect behind the rapid rise and success of DNNs is their ability to better manage large time contexts. In this regard, asymmetric context windows that embed more past than future frames have recently been used with feed-forward neural networks. This context configuration turns out to be useful not only for low-latency speech recognition, but also for boosting recognition performance under reverberant conditions. This paper investigates the mechanisms occurring inside DNNs that lead to an effective application of asymmetric contexts. In particular, we propose a novel method for automatic context window composition based on a gradient analysis. The experiments, performed with different acoustic environments, features, DNN architectures, microphone settings, and recognition tasks, show that our simple and efficient strategy leads to a less redundant frame configuration, which makes DNN training more effective in reverberant scenarios.

Comments:	This is a preprint version of the paper published in the Speech Communication journal, 2018. Please see this https URL for the published version of this article
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
Cite as:	arXiv:1805.10498 [eess.AS]
	(or arXiv:1805.10498v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1805.10498

Submission history

From: Mirco Ravanelli
[v1] Sat, 26 May 2018 15:36:44 UTC (2,169 KB)
