A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models

Oualil, Youssef; Klakow, Dietrich

arXiv:1708.05997 (cs)

[Submitted on 20 Aug 2017 (v1), last revised 22 Aug 2017 (this version, v2)]

Abstract: Training large vocabulary Neural Network Language Models (NNLMs) is difficult because the output layer must be explicitly normalized, which typically requires evaluating the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. At each time step, the vocabulary is reduced to the target words in the batch, and the softmax is replaced by noise contrastive estimation, in which these words play the roles of targets and noise samples at the same time. As a result, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction in training time with no noticeable degradation of the models' performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB, trained on a single Titan-X GPU.
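The batch-level idea in the abstract can be illustrated with a minimal NumPy sketch for a single time step. It is not the authors' implementation. Several details are assumptions not stated in the abstract: the noise distribution q is a unigram distribution, the model is treated as self-normalized (normalizer fixed to 1, as is common with NCE), each row uses the other B - 1 batch targets as its k noise samples, and repeated target words within a batch are not deduplicated. The function name `bnce_loss` and its parameters are hypothetical. The key point is that the B x B score matrix replaces the B x V softmax scores, so the step reduces to one dense matrix product.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def bnce_loss(H, W, b, targets, unigram):
    """Hypothetical B-NCE loss for one time step (sketch under stated assumptions).

    H:        (B, d) hidden states of the B sequences in the batch
    W:        (V, d) output word embeddings; b: (V,) output biases
    targets:  (B,) integer ids of the target words at this time step
    unigram:  (V,) assumed noise distribution q
    """
    B = H.shape[0]
    assert B >= 2, "need at least one other batch target to act as noise"
    k = B - 1  # each row treats the other B - 1 targets as noise samples

    # Reduce the output layer to the words that are targets in this batch.
    W_batch = W[targets]  # (B, d)
    b_batch = b[targets]  # (B,)

    # Dense (B, B) scores: entry [i, j] scores word targets[j] given context i.
    scores = H @ W_batch.T + b_batch[None, :]

    # NCE logit for "came from data": s - log(k * q(w)), assuming Z = 1.
    logits = scores - np.log(k * unigram[targets])[None, :]

    diag = np.eye(B, dtype=bool)
    pos = log_sigmoid(logits[diag])    # diagonal entries: true targets
    neg = log_sigmoid(-logits[~diag])  # off-diagonal: log(1 - sigmoid) for noise
    return -(pos.sum() + neg.sum()) / B


# Usage on random data (toy sizes, uniform noise distribution for simplicity).
rng = np.random.default_rng(0)
V, d, B = 1000, 16, 8
H = rng.normal(size=(B, d))
W = rng.normal(scale=0.1, size=(V, d))
b = np.zeros(V)
unigram = np.full(V, 1.0 / V)
targets = rng.integers(0, V, size=B)
loss = bnce_loss(H, W, b, targets, unigram)
print(loss)
```

Compared with a full softmax, which scores all V words for each of the B contexts, this sketch scores only B words per context, which is where the training-time reduction comes from when B is much smaller than V.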

Comments:	Accepted for publication at INTERSPEECH'17
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
MSC classes:	97K50
ACM classes:	I.2.7
Cite as:	arXiv:1708.05997 [cs.CL]
	(or arXiv:1708.05997v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1708.05997

Submission history

From: Youssef Oualil
[v1] Sun, 20 Aug 2017 17:48:35 UTC (17 KB)
[v2] Tue, 22 Aug 2017 09:15:38 UTC (17 KB)