Implicit spoken language diarization

Mishra, Jagabandhu; Chowdhury, Amartya; Prasanna, S. R. Mahadeva

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2306.12913 (eess)

[Submitted on 22 Jun 2023]

Abstract: Spoken language diarization (LD) and related tasks have mostly been explored using phonotactic approaches. These approaches typically model language explicitly and therefore require intermediate phoneme modeling and transcribed data. Alternatively, the ability of deep learning approaches to model temporal dynamics may allow language information to be modeled implicitly through deep embedding vectors. This work therefore begins by exploring available speaker diarization frameworks, which capture speaker information implicitly, for the LD task. Using the end-to-end x-vector approach, the LD system achieves a diarization error rate (DER) and Jaccard error rate (JER) of 6.78% and 7.06%, respectively, on synthetic code-switched data, and 22.50% and 60.38% on practical data. The performance degradation is attributed to data imbalance and is resolved to some extent by using pre-trained wav2vec embeddings, which provide a relative improvement of 30.74% in terms of JER.
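
The abstract reports a diarization error rate, a Jaccard error rate, and a relative improvement. As a rough illustration of how such figures can diverge on imbalanced data, the sketch below computes simplified frame-level versions of all three in Python. It is not the authors' scoring pipeline: the function names and the toy label sequences are hypothetical, every frame is assumed to be speech (so the frame error rate covers only language confusion, not missed speech or false alarms), and the language labels are assumed to be shared between reference and hypothesis, so no optimal label mapping is performed.

```python
def frame_error_rate(reference, hypothesis):
    """Fraction of frames whose hypothesized label differs from the reference.

    Simplified stand-in for a diarization error rate: it assumes every
    frame is speech, so only label confusion is counted.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)


def jaccard_error_rate(reference, hypothesis):
    """Mean over reference languages of 1 - |intersection| / |union|.

    Frame index sets are compared per language. Labels are assumed to be
    shared between reference and hypothesis, so no mapping step is needed.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    languages = sorted(set(reference))
    total = 0.0
    for lang in languages:
        ref_frames = {i for i, r in enumerate(reference) if r == lang}
        hyp_frames = {i for i, h in enumerate(hypothesis) if h == lang}
        union = ref_frames | hyp_frames  # never empty: lang occurs in reference
        total += 1.0 - len(ref_frames & hyp_frames) / len(union)
    return total / len(languages)


def relative_improvement(baseline_error, new_error):
    """Relative reduction of an error rate, in percent of the baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical toy utterance: 8 frames of a primary language, 2 of a secondary one.
reference = ["L1"] * 8 + ["L2"] * 2
# Hypothetical system output that never detects the secondary language.
hypothesis = ["L1"] * 10

toy_frame_error = frame_error_rate(reference, hypothesis)
toy_jer = jaccard_error_rate(reference, hypothesis)
```

In this toy case the frame error rate stays low because the minority language covers few frames, while the Jaccard error rate is much higher because each language contributes equally to the average. That is one way a gap between the two metrics can arise under data imbalance.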

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2306.12913 [eess.AS]
	(or arXiv:2306.12913v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2306.12913

Submission history

From: Jagabandhu Mishra
[v1] Thu, 22 Jun 2023 14:29:53 UTC (4,312 KB)
