Text-To-Speech Data Augmentation for Low Resource Speech Recognition

Zevallos, Rodolfo

arXiv:2204.00291 (cs)

[Submitted on 1 Apr 2022]

Abstract: A central obstacle to training deep learning models for automatic speech recognition (ASR) is the lack of transcribed data. This research proposes a new data augmentation method to improve ASR models for agglutinative, low-resource languages. The method generates both synthetic text and synthetic audio: a sequence-to-sequence (seq2seq) model produces the synthetic text, and a text-to-speech (TTS) model for Quechua produces the synthetic speech. Experiments were conducted on a Quechua corpus, Quechua being an agglutinative, low-resource language. The results show that the method improves the Quechua ASR model: combining synthetic text and synthetic speech yields an 8.73% improvement in word error rate (WER).
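For readers unfamiliar with the metric reported above, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words. The function names `word_error_rate` and `wer_reduction` are illustrative and do not come from the paper. The abstract does not state whether the 8.73% figure is an absolute or a relative reduction, so the helper returns both; the values in the example call are placeholders, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline_wer: float, augmented_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction of the augmented model."""
    absolute = baseline_wer - augmented_wer
    relative = absolute / baseline_wer
    return absolute, relative


if __name__ == "__main__":
    # Placeholder transcripts, not drawn from the Quechua corpus.
    print(word_error_rate("the cat sat down", "the cat sit down"))
    # Placeholder WER values, chosen only to show the two kinds of reduction.
    print(wer_reduction(0.50, 0.40))
```

When comparing a baseline ASR model with one trained on augmented data, both models would be scored on the same held-out set of real (not synthetic) transcribed speech, so that the difference in WER reflects the augmentation rather than the test data.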

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2204.00291 [cs.CL] (or arXiv:2204.00291v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2204.00291

Submission history

From: Rodolfo Zevallos Dr
[v1] Fri, 1 Apr 2022 08:53:44 UTC (1,071 KB)
