SOT Triggered Neural Clustering for Speaker Attributed ASR

Zheng, Xianrui; Sun, Guangzhi; Zhang, Chao; Woodland, Philip C.

arXiv:2407.02007 (eess)

[Submitted on 2 Jul 2024 (v1), last revised 30 Aug 2024 (this version, v2)]

Abstract: This paper introduces a novel approach to speaker-attributed ASR transcription using a neural clustering method. With a parallel processing mechanism, diarisation and ASR can be applied simultaneously, which helps prevent errors from accumulating from one sub-system to the next as they do in a cascaded system. This is achieved by combining an ASR model trained with serialised output training (SOT) with segment-level discriminative neural clustering (SDNC), which assigns the speaker labels. With SDNC, the system does not require an additional non-neural clustering method to assign speaker labels, so the entire system can be based on neural networks. Experimental results on the AMI meeting dataset show that SDNC outperforms spectral clustering (SC), giving a 19% relative reduction in diarisation error rate (DER) on the AMI Eval set. Compared with a cascaded system using SC, the parallel system with SDNC gives 7% and 4% relative reductions in concatenated minimum-permutation word error rate (cpWER) on the Dev and Eval sets, respectively.
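
The abstract reports speaker-attributed ASR results in cpWER without defining it. As a rough sketch, assuming the standard CHiME-6 style definition (each speaker's words are concatenated, the reference-to-hypothesis speaker mapping is chosen to minimise the total number of word errors, and that total is divided by the number of reference words), cpWER could be computed as below. The function names `word_edit_distance` and `cpwer` are hypothetical and are not taken from the paper or from any scoring toolkit; text normalisation is omitted.

```python
from itertools import permutations


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference word deleted
                curr[j - 1] + 1,         # extra hypothesis word inserted
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def cpwer(ref_by_spk: dict[str, str], hyp_by_spk: dict[str, str]) -> float:
    """Concatenated minimum-permutation WER (hypothetical helper).

    Assumes each speaker's utterances are already concatenated into one
    string. Both speaker lists are padded with empty transcripts so that
    missed speakers count as deletions and spurious ones as insertions.
    """
    refs = [s.split() for s in ref_by_spk.values()]
    hyps = [s.split() for s in hyp_by_spk.values()]
    n = max(len(refs), len(hyps))
    refs += [[]] * (n - len(refs))
    hyps += [[]] * (n - len(hyps))
    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")
    # Brute-force search over every speaker mapping; keep the lowest error count.
    best = min(
        sum(word_edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / total_ref_words
```

For example, with reference speakers A ("hello there") and B ("good morning everyone") and a hypothesis whose two speakers are labelled in the opposite order ("good morning", "hello there"), the best mapping leaves one deleted word out of five reference words, so cpWER is 0.2. The permutation search grows factorially with the number of speakers, which is acceptable for a few meeting participants; because the cost is a sum over speaker pairs, larger cases could instead be solved as an assignment problem with the Hungarian algorithm.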

Comments:	To appear in Interspeech 2024
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2407.02007 [eess.AS]
	(or arXiv:2407.02007v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2407.02007

Submission history

From: Xianrui Zheng
[v1] Tue, 2 Jul 2024 07:26:29 UTC (693 KB)
[v2] Fri, 30 Aug 2024 20:58:01 UTC (693 KB)
