Towards a Single ASR Model That Generalizes to Disordered Speech

Tobin, Jimmy; Tomanek, Katrin; Venugopalan, Subhashini

arXiv:2412.19315 (eess)

[Submitted on 26 Dec 2024]

Abstract: This study investigates the impact of integrating a dataset of disordered speech recordings ($\sim$1,000 hours) into the fine-tuning of a near state-of-the-art ASR baseline system. Although this data makes up less than 1% of the ASR system's training data, we find a considerable improvement in disordered speech recognition accuracy. Specifically, we observe a 33% improvement on prompted speech and a 26% improvement on a newly gathered spontaneous, conversational dataset of disordered speech. Importantly, there is no significant performance decline on standard speech recognition benchmarks. Further, we observe that the proposed tuning strategy closes the gap between the baseline system and personalized models by 64%, highlighting both the significant progress and the remaining room for improvement. Given these substantial benefits, the experiment suggests that, from a fairness perspective, incorporating a small fraction of high-quality disordered speech data into a training recipe is an easy step toward making speech technology more accessible for users with speech disabilities.
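
The abstract reports relative improvements and a "gap closed" percentage without spelling out the arithmetic. The sketch below shows one common way such figures are computed, assuming an error-rate metric such as word error rate where lower is better. The formulas, the function names, and the numeric values are illustrative assumptions, not taken from the paper.

```python
def relative_improvement(baseline_err: float, new_err: float) -> float:
    """Fraction by which the error rate dropped, relative to the baseline."""
    return (baseline_err - new_err) / baseline_err


def gap_closed(baseline_err: float, new_err: float, personalized_err: float) -> float:
    """Fraction of the baseline-to-personalized gap recovered by the new model."""
    return (baseline_err - new_err) / (baseline_err - personalized_err)


# Hypothetical error rates in percent, chosen only to keep the arithmetic
# easy to follow. They are NOT results from the paper.
baseline, tuned, personalized = 40.0, 30.0, 20.0

print(f"relative improvement: {relative_improvement(baseline, tuned):.0%}")
print(f"gap closed: {gap_closed(baseline, tuned, personalized):.0%}")
```

Under these definitions, a model can show a large relative improvement over the baseline while still leaving part of the gap to personalized models open, which is consistent with the abstract noting both progress and room for improvement.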

Comments:	Accepted at ICASSP 2025
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2412.19315 [eess.AS]
	(or arXiv:2412.19315v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2412.19315

Submission history

From: Subhashini Venugopalan
[v1] Thu, 26 Dec 2024 18:39:15 UTC (217 KB)