Multi-Modal Data Augmentation for End-to-end ASR

Renduchintala, Adithya; Ding, Shuoyang; Wiesner, Matthew; Watanabe, Shinji

arXiv:1803.10299v1 (cs)

[Submitted on 27 Mar 2018 (this version), latest version 18 Jun 2018 (v3)]

Abstract: We present a new end-to-end architecture for automatic speech recognition (ASR) that can be trained using symbolic input in addition to the traditional acoustic input. This architecture utilizes two separate encoders, one for acoustic input and another for symbolic input, both sharing the attention and decoder parameters. We call this architecture a multi-modal data augmentation network (MMDA), as it can support multi-modal (acoustic and symbolic) input. The MMDA architecture attempts to eliminate the need for an external language model (LM) by enabling seamless mixing of large text datasets with significantly smaller transcribed speech corpora during training. We study different ways of transforming large text corpora into a symbolic form suitable for training our MMDA network. Our best MMDA setup obtains small improvements in CER and achieves an 8-10% relative WER improvement on the WSJ data set.
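
The parameter sharing described in the abstract (two modality-specific encoders feeding one attention and decoder stack) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the hidden size DIM, the single-query dot-product attention, and the toy inputs are all hypothetical choices made for illustration, and no training loop is shown.

```python
import math
import random

random.seed(0)
DIM = 4  # hypothetical hidden size shared by both encoders


def make_matrix(rows, cols):
    """Small random weight matrix (illustrative initialization only)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class Encoder:
    """Maps a sequence of input vectors into the shared hidden space."""

    def __init__(self, input_dim):
        self.weights = make_matrix(DIM, input_dim)

    def __call__(self, frames):
        return [[math.tanh(v) for v in project(self.weights, f)] for f in frames]


class SharedAttentionDecoder:
    """One attention layer and one output layer, reused for both modalities."""

    def __init__(self, vocab_size):
        self.query = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
        self.output = make_matrix(vocab_size, DIM)

    def __call__(self, states):
        # Dot-product attention over the encoder states with a single query.
        scores = [sum(q * s for q, s in zip(self.query, st)) for st in states]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        context = [
            sum(w * st[i] for w, st in zip(weights, states)) for i in range(DIM)
        ]
        # Unnormalized scores, one per output symbol.
        return project(self.output, context)


class MMDASketch:
    """Two separate encoders, one shared attention/decoder (assumed layout)."""

    def __init__(self, acoustic_dim, symbol_dim, vocab_size):
        self.encoders = {
            "acoustic": Encoder(acoustic_dim),
            "symbolic": Encoder(symbol_dim),
        }
        self.decoder = SharedAttentionDecoder(vocab_size)

    def forward(self, modality, frames):
        # Only the encoder depends on the modality; the decoder is shared.
        return self.decoder(self.encoders[modality](frames))


model = MMDASketch(acoustic_dim=3, symbol_dim=5, vocab_size=6)
speech = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]  # two toy acoustic frames
symbols = [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]  # one-hot symbols
speech_scores = model.forward("acoustic", speech)
text_scores = model.forward("symbolic", symbols)
```

Because both calls pass through the same decoder object, a batch drawn from a text-derived symbolic corpus would update the same attention and decoder parameters as a batch of transcribed speech, which is the mechanism the abstract relies on to mix the two data sources.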

Comments:	5 Pages, 1 Figure, Submitted to INTERSPEECH 2018
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1803.10299 [cs.CL]
	(or arXiv:1803.10299v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1803.10299

Submission history

From: Adi Renduchintala
[v1] Tue, 27 Mar 2018 20:12:39 UTC (225 KB)
[v2] Fri, 30 Mar 2018 00:39:23 UTC (147 KB)
[v3] Mon, 18 Jun 2018 05:53:10 UTC (153 KB)
