Classification of Spontaneous and Scripted Speech for Multilingual Audio

Elisha, Shahar; McDowell, Andrew; Beguerisse-Díaz, Mariano; Benetos, Emmanouil

arXiv:2412.11896 (cs)

[Submitted on 16 Dec 2024]

Title: Classification of Spontaneous and Scripted Speech for Multilingual Audio

Authors: Shahar Elisha, Andrew McDowell, Mariano Beguerisse-Díaz, Emmanouil Benetos

Abstract: Distinguishing scripted from spontaneous speech is an essential tool for better understanding how speech styles influence speech processing research. It can also improve recommendation systems and discovery experiences for media users through better segmentation of large recorded speech catalogues. This paper addresses the challenge of building a classifier that generalises well across different formats and languages. We systematically evaluate models ranging from traditional, handcrafted acoustic and prosodic features to advanced audio transformers, utilising a large, multilingual proprietary podcast dataset for training and validation. We break down the performance of each model across 11 language groups to evaluate cross-lingual biases. Our experimental analysis extends to publicly available datasets to assess the models' generalisability to non-podcast domains. Our results indicate that transformer-based models consistently outperform traditional feature-based techniques, achieving state-of-the-art performance in distinguishing between scripted and spontaneous speech across various languages.

Comments:	Accepted to IEEE Spoken Language Technology Workshop 2024
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2412.11896 [cs.CL]
	(or arXiv:2412.11896v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.11896

Submission history

From: Shahar Elisha
[v1] Mon, 16 Dec 2024 15:45:10 UTC (1,207 KB)
