Large-scale learning of generalised representations for speaker recognition

Jung, Jee-weon; Heo, Hee-Soo; Lee, Bong-Jin; Lee, Jaesong; Shim, Hye-jin; Kwon, Youngki; Chung, Joon Son; Watanabe, Shinji


arXiv:2210.10985 (cs)

[Submitted on 20 Oct 2022 (v1), last revised 27 Oct 2022 (this version, v2)]



Abstract: The objective of this work is to develop a speaker recognition model that can be used in diverse scenarios. We hypothesise that two components must be adequately configured to build such a model. First, an adequate architecture is required. We explore several recent state-of-the-art models, including ECAPA-TDNN and MFA-Conformer, as well as other baselines. Second, a massive amount of data is required. We investigate several new training data configurations that combine existing datasets. The most extensive configuration comprises 10.22k hours of speech from over 87k speakers. Four evaluation protocols are adopted to measure how the trained model performs in diverse scenarios. Through experiments, we find that MFA-Conformer, which has the least inductive bias, generalises best. We also show that training with the proposed large data configurations gives better performance. A boost in generalisation is observed: the average performance on the four evaluation protocols improves by more than 20%. In addition, we demonstrate that the performance of these models improves even further as their capacity increases.

Comments: 5 pages, 5 tables, submitted to ICASSP
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2210.10985 [cs.SD] (or arXiv:2210.10985v2 [cs.SD] for this version)
https://doi.org/10.48550/arXiv.2210.10985

Submission history

From: Jee-Weon Jung
[v1] Thu, 20 Oct 2022 03:08:18 UTC (30 KB)
[v2] Thu, 27 Oct 2022 05:11:14 UTC (28 KB)
