Unsupervised Speech Segmentation: A General Approach Using Speech Language Models

Elmakies, Avishai; Abend, Omri; Adi, Yossi

arXiv:2501.03711 (cs)

[Submitted on 7 Jan 2025]

Abstract: In this paper, we introduce an unsupervised approach for Speech Segmentation, which builds on previously researched approaches, e.g., Speaker Diarization, while being applicable to an inclusive set of acoustic-semantic distinctions, paving a path towards a general Unsupervised Speech Segmentation approach. Unlike traditional speech and audio segmentation, which mainly focuses on spectral changes in the input signal, e.g., phone segmentation, our approach tries to segment the spoken utterance into chunks with differing acoustic-semantic styles, focusing on acoustic-semantic information that does not translate well into text, e.g., emotion or speaker. While most Speech Segmentation tasks only handle one style change, e.g., emotion diarization, our approach tries to handle multiple acoustic-semantic style changes. Leveraging recent advances in Speech Language Models (SLMs), we propose a simple unsupervised method to segment a given speech utterance. We empirically demonstrate the effectiveness of the proposed approach by considering several setups. Results suggest that the proposed method is superior to the evaluated baselines on boundary detection, segment purity, and over-segmentation. Code is available at this https URL.
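The abstract names boundary detection and over-segmentation as evaluation criteria but does not define them. The sketch below shows one common way such scores are computed in the segmentation literature: a predicted boundary counts as a hit if it falls within a tolerance window of a still-unmatched reference boundary. The function name `boundary_scores`, the greedy matching rule, and the 0.5-second tolerance are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Dict, List


def boundary_scores(
    reference: List[float], predicted: List[float], tolerance: float = 0.5
) -> Dict[str, float]:
    """Score predicted boundaries (in seconds) against reference boundaries.

    Hypothetical helper: the tolerance value and the greedy one-to-one
    matching are assumptions made for illustration only.
    """
    ref = sorted(reference)
    pred = sorted(predicted)
    matched = [False] * len(ref)
    hits = 0
    for p in pred:
        # Greedily take the closest unmatched reference boundary in range.
        best, best_dist = None, tolerance
        for i, r in enumerate(ref):
            dist = abs(p - r)
            if not matched[i] and dist <= best_dist:
                best, best_dist = i, dist
        if best is not None:
            matched[best] = True
            hits += 1
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    # Over-segmentation: relative excess of predicted over reference boundaries.
    over_seg = len(pred) / len(ref) - 1.0 if ref else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "over_segmentation": over_seg,
    }


if __name__ == "__main__":
    # Made-up boundary times, used only to exercise the function.
    print(boundary_scores([2.0, 5.0, 9.0], [2.1, 4.0, 5.2, 9.9, 12.0]))
```

A positive over-segmentation value means the system proposed more boundaries than the reference contains; a negative value means it proposed fewer.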

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2501.03711 [cs.CL]
	(or arXiv:2501.03711v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.03711

Submission history

From: Avishai Elmakies
[v1] Tue, 7 Jan 2025 11:32:13 UTC (258 KB)
