Insights into End-to-End Learning Scheme for Language Identification

Cai, Weicheng; Cai, Zexin; Liu, Wenbo; Wang, Xiaoqi; Li, Ming

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:1804.00381 (eess)

[Submitted on 2 Apr 2018]



Abstract: A novel interpretable end-to-end learning scheme for language identification is proposed. It is in line with the classical GMM i-vector methods both theoretically and practically. In the end-to-end pipeline, a general encoding layer is employed on top of the front-end CNN, so that it can encode the variable-length input sequence into an utterance-level vector automatically. After comparing with the state-of-the-art GMM i-vector methods, we give insights into the CNN and reveal its role and effect in the whole pipeline. We further introduce a general encoding layer, illustrating why it might be appropriate for language identification. We elaborate on several typical encoding layers, including a temporal average pooling layer, a recurrent encoding layer, and a novel learnable dictionary encoding layer. We conducted experiments on the NIST LRE07 closed-set task, and the results show that our proposed end-to-end systems achieve state-of-the-art performance.
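
The role of an encoding layer, mapping a variable-length sequence of frame-level features to one fixed-size utterance-level vector, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names are hypothetical, the frame features stand in for front-end CNN outputs, and the dictionary encoder uses one plausible form (softmax soft-assignment over scaled squared distances, then a weighted mean of residuals) with fixed centers and scales where a trained system would learn them. It is not the authors' implementation.

```python
import math


def temporal_average_pooling(frames):
    """Average a variable-length list of D-dim frame features into one D-dim vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    total = [0.0] * dim
    for frame in frames:
        for i, value in enumerate(frame):
            total[i] += value
    return [s / len(frames) for s in total]


def dictionary_encoding(frames, centers, scales):
    """Soft-assign each frame to C centers and aggregate the residuals.

    Assumed form (not taken from the abstract): assignment weights are a
    softmax over -scale_c * ||x_t - center_c||^2, and each component's
    output is the weighted mean residual over all frames.
    Returns a flat vector of length C * D, independent of len(frames).
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_centers, dim = len(centers), len(centers[0])
    encoded = [[0.0] * dim for _ in range(num_centers)]
    for frame in frames:
        residuals = [[x - m for x, m in zip(frame, center)] for center in centers]
        logits = [-s * sum(r * r for r in res) for s, res in zip(scales, residuals)]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(logit - peak) for logit in logits]
        norm = sum(exps)
        for c in range(num_centers):
            weight = exps[c] / norm
            for i in range(dim):
                encoded[c][i] += weight * residuals[c][i] / len(frames)
    return [v for component in encoded for v in component]


# Two utterances of different lengths, each frame a 2-dim feature (toy values).
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Hypothetical dictionary: 2 centers with unit scales.
centers = [[0.0, 0.0], [4.0, 4.0]]
scales = [1.0, 1.0]

tap_short = temporal_average_pooling(short_utt)
tap_long = temporal_average_pooling(long_utt)
lde_short = dictionary_encoding(short_utt, centers, scales)
lde_long = dictionary_encoding(long_utt, centers, scales)
```

Both encoders return vectors whose size depends only on the feature dimension (and, for the dictionary encoder, the number of centers), never on the number of frames, which is the property that lets a fixed-size classifier sit on top of variable-length input.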

Comments:	ICASSP 2018 conference paper
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:1804.00381 [eess.AS]
	(or arXiv:1804.00381v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1804.00381

Submission history

From: Weicheng Cai
[v1] Mon, 2 Apr 2018 03:19:44 UTC (408 KB)
