Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

Wang, Helin; Ravichandran, Venkatesh; Rao, Milind; Lammers, Becky; Sydnor, Myra; Maragakis, Nicholas; Butala, Ankur A.; Zhang, Jayne; Clawson, Lora; Chovaz, Victoria; Moro-Velazquez, Laureano

arXiv:2311.10149 (eess)

[Submitted on 16 Nov 2023]

Abstract: Spoken language understanding (SLU) systems often exhibit suboptimal performance when processing atypical speech, which is typically caused by neurological conditions and motor impairments. Recent advances in Text-to-Speech (TTS) synthesis-based augmentation for fairer SLU have struggled to accurately capture the unique vocal characteristics of atypical speakers, largely due to insufficient data. To address this issue, we present a novel data augmentation method for atypical speakers based on finetuning a TTS model, called Aty-TTS. Aty-TTS models speaker and atypical characteristics via knowledge transfer from a voice conversion model. We then use the augmented data to train SLU models adapted to atypical speech. To train these data augmentation models and evaluate the resulting SLU systems, we have collected a new atypical speech dataset containing intent annotations. Both objective and subjective assessments validate that Aty-TTS is capable of generating high-quality atypical speech. Furthermore, it serves as an effective data augmentation strategy, contributing to fairer SLU systems that can better accommodate individuals with atypical speech patterns.

Comments:	Accepted at SyntheticData4ML 2023 Oral
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2311.10149 [eess.AS]
	(or arXiv:2311.10149v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2311.10149

Submission history

From: Helin Wang
[v1] Thu, 16 Nov 2023 19:09:28 UTC (105 KB)
