Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation

Lee, Evonne P. C.; Sun, Guangzhi; Zhang, Chao; Woodland, Philip C.

arXiv:2210.13576 (cs)

[Submitted on 24 Oct 2022 (v1), last revised 14 Mar 2023 (this version, v2)]

Abstract: In speaker diarisation, speaker embedding extraction models often suffer from the mismatch between their training loss functions and the speaker clustering method. In this paper, we propose the method of spectral clustering-aware learning of embeddings (SCALE) to address the mismatch. Specifically, besides an angular prototypical (AP) loss, SCALE uses a novel affinity matrix loss which directly minimises the error between the affinity matrix estimated from speaker embeddings and the reference. SCALE also includes p-percentile thresholding and Gaussian blur as two important hyper-parameters for spectral clustering in training. Experiments on the AMI dataset showed that speaker embeddings obtained with SCALE achieved over 50% relative speaker error rate reductions using oracle segmentation, and over 30% relative diarisation error rate reductions using automatic segmentation when compared to a strong baseline with the AP-loss-based speaker embeddings.
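The affinity matrix loss described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: cosine similarity as the affinity measure, a 0/1 same-speaker matrix as the reference, mean squared error as the error measure, the order of blur then thresholding, the soft scaling factor 0.01, and all function names (`cosine_affinity`, `gaussian_blur`, `percentile_threshold`, `affinity_matrix_loss`). An actual training setup would need a differentiable implementation in a deep learning framework.

```python
import numpy as np

def cosine_affinity(emb):
    """Pairwise cosine similarity between row embeddings, shape (N, N)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

def gaussian_blur(mat, sigma=1.0, radius=2):
    """Separable Gaussian smoothing of a 2-D matrix with edge padding."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(mat, radius, mode="edge")
    # Smooth along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def percentile_threshold(mat, p=80.0, soft=0.01):
    """Row-wise thresholding: entries below the row's p-th percentile
    are scaled by `soft` (assumed value), the rest are kept."""
    thresh = np.percentile(mat, p, axis=1, keepdims=True)
    return np.where(mat >= thresh, mat, mat * soft)

def reference_affinity(labels):
    """Reference matrix: 1.0 where two segments share a speaker, else 0.0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def affinity_matrix_loss(emb, labels, p=80.0, sigma=1.0):
    """Mean squared error between the processed estimated affinity
    matrix and the reference (illustrative choice of error measure)."""
    est = percentile_threshold(gaussian_blur(cosine_affinity(emb), sigma), p)
    return float(np.mean((est - reference_affinity(labels)) ** 2))

# Toy example: four segments, two speakers, hypothetical 2-D embeddings.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss = affinity_matrix_loss(emb, labels=[0, 0, 1, 1])
```

In this sketch, `p` and `sigma` play the role of the two spectral clustering hyper-parameters named in the abstract; exposing them in the loss is what lets training see the same refinement steps that clustering applies at test time.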

Comments:	To appear in ICASSP 2023, 5 pages
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2210.13576 [cs.SD]
	(or arXiv:2210.13576v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2210.13576

Submission history

From: Guangzhi Sun
[v1] Mon, 24 Oct 2022 19:55:07 UTC (287 KB)
[v2] Tue, 14 Mar 2023 23:04:08 UTC (396 KB)
