Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning

Nagpal, Chirag; Venugopalan, Subhashini; Tobin, Jimmy; Ladewig, Marilyn; Heller, Katherine; Tomanek, Katrin

arXiv:2501.00039 (eess)

[Submitted on 25 Dec 2024]

Abstract: We introduce a large language model (LLM) capable of processing speech inputs and show that tuning it further with reinforcement learning on human preference (RLHF) enables it to adapt better to disordered speech than traditional fine-tuning. Our method replaces low-frequency text tokens in an LLM's vocabulary with audio tokens and enables the model to recognize speech by fine-tuning it on speech with transcripts. We then use RL with rewards based on syntactic and semantic accuracy measures, generalizing the LLM further to recognize disordered speech. While the resulting LLM does not outperform existing systems for speech recognition, we find that tuning with reinforcement learning using custom rewards leads to substantially better performance than supervised fine-tuning of the language model, specifically when adapting to speech in a different setting. This presents a compelling alternative tuning strategy for speech recognition using large language models.

Comments:	Accepted at ICASSP 2025
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2501.00039 [eess.AS]
	(or arXiv:2501.00039v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2501.00039

Submission history

From: Subhashini Venugopalan
[v1] Wed, 25 Dec 2024 00:16:22 UTC (521 KB)
