SSDM: Scalable Speech Dysfluency Modeling

Lian, Jiachen; Zhou, Xuanru; Ezzes, Zoe; Vonk, Jet; Morin, Brittany; Baquirin, David; Mille, Zachary; Tempini, Maria Luisa Gorno; Anumanchipalli, Gopala Krishna

arXiv:2408.16221 (eess)

[Submitted on 29 Aug 2024 (v1), last revised 3 Oct 2024 (this version, v3)]

Abstract: Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, there are three challenges. First, current state-of-the-art solutions [lian2023unconstrained-udm, lian-anumanchipalli-2024-towards-hudm] suffer from poor scalability. Second, there is a lack of a large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose SSDM: Scalable Speech Dysfluency Modeling, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. A demo is available at this https URL.

Comments:	2024 NeurIPS
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2408.16221 [eess.AS]
	(or arXiv:2408.16221v3 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2408.16221

Submission history

From: Jiachen Lian
[v1] Thu, 29 Aug 2024 02:35:53 UTC (7,881 KB)
[v2] Sat, 14 Sep 2024 20:57:08 UTC (7,881 KB)
[v3] Thu, 3 Oct 2024 21:37:49 UTC (7,884 KB)
