Deep Sparse Conformer for Speech Recognition

Wu, Xianchao

Computer Science > Computation and Language

arXiv:2209.00260 (cs)

[Submitted on 1 Sep 2022]

Title:Deep Sparse Conformer for Speech Recognition

Authors:Xianchao Wu

View PDF

Abstract:Conformer has achieved impressive results in Automatic Speech Recognition (ASR) by leveraging transformer's capturing of content-based global interactions and convolutional neural network's exploiting of local features. In Conformer, two macaron-like feed-forward layers with half-step residual connections sandwich the multi-head self-attention and convolution modules followed by a post layer normalization. We improve Conformer's long-sequence representation ability in two directions, \emph{sparser} and \emph{deeper}. We adapt a sparse self-attention mechanism with $\mathcal{O}(L\text{log}L)$ in time complexity and memory usage. A deep normalization strategy is utilized when performing residual connections to ensure our training of hundred-level Conformer blocks. On the Japanese CSJ-500h dataset, this deep sparse Conformer achieves respectively CERs of 5.52\%, 4.03\% and 4.50\% on the three evaluation sets and 4.16\%, 2.84\% and 3.20\% when ensembling five deep sparse Conformer variants from 12 to 16, 17, 50, and finally 100 encoder layers.

Comments:	5 pages, 1 figure
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2209.00260 [cs.CL]
	(or arXiv:2209.00260v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2209.00260

Submission history

From: Xianchao Wu [view email]
[v1] Thu, 1 Sep 2022 06:56:11 UTC (611 KB)

Computer Science > Computation and Language

Title:Deep Sparse Conformer for Speech Recognition

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:Deep Sparse Conformer for Speech Recognition

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators