Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

Gao, Dongji; Xu, Hainan; Raj, Desh; Perera, Leibny Paola Garcia; Povey, Daniel; Khudanpur, Sanjeev

arXiv:2309.15796 (eess)

[Submitted on 26 Sep 2023]

Abstract: Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data. However, human annotators usually perform "non-verbatim" transcription, which can result in poorly trained models. In this paper, we propose Omni-temporal Classification (OTC), a novel training criterion that explicitly incorporates label uncertainties originating from such weak supervision. This allows the model to effectively learn speech-text alignments while accommodating errors present in the training transcripts. OTC extends the conventional CTC objective for imperfect transcripts by leveraging weighted finite state transducers. Through experiments conducted on the LibriSpeech and LibriVox datasets, we demonstrate that training ASR models with OTC avoids performance degradation even with transcripts containing up to 70% errors, a scenario where CTC models fail completely. Our implementation is available at this https URL.

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2309.15796 [eess.AS] (or arXiv:2309.15796v1 [eess.AS] for this version)
DOI: https://doi.org/10.48550/arXiv.2309.15796

Submission history

From: Dongji Gao
[v1] Tue, 26 Sep 2023 12:58:40 UTC (634 KB)
