Jointly Learning Word Embeddings and Latent Topics

Shi, Bei; Lam, Wai; Jameel, Shoaib; Schockaert, Steven; Lai, Kwun Ping

doi:10.1145/3077136.3080806

Computer Science > Computation and Language

arXiv:1706.07276 (cs)

[Submitted on 21 Jun 2017]

Abstract: Word embedding models such as Skip-gram learn a vector-space representation for each word, based on the local word collocation patterns that are observed in a text corpus. Latent topic models, on the other hand, take a more global view, looking at the word distributions across the corpus to assign a topic to each word occurrence. These two paradigms are complementary in how they represent the meaning of word occurrences. While some previous works have already looked at using word embeddings for improving the quality of latent topics, and conversely, at using latent topics for improving word embeddings, such "two-step" methods cannot capture the mutual interaction between the two paradigms. In this paper, we propose STE, a framework which can learn word embeddings and latent topics in a unified manner. STE naturally obtains topic-specific word embeddings, and thus addresses the issue of polysemy. At the same time, it also learns the term distributions of the topics, and the topic distributions of the documents. Our experimental results demonstrate that the STE model can indeed generate useful topic-specific word embeddings and coherent latent topics in an effective and efficient way.
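The abstract does not spell out STE's equations. As a minimal sketch of how topic-specific embeddings and document topic distributions could be combined, assume a Skip-gram-style model in which each word has one input vector per topic, context words share a single output vector, and the probability of a context word is marginalized over the document's topic distribution: p(c | w, d) = Σ_z p(z | d) · p(c | w, z). This formulation, the full softmax (used instead of a sampling-based approximation for clarity), and the names `in_vecs`, `out_vecs`, `theta_d`, `topical_context_prob` and `random_model` are illustrative assumptions, not the paper's definitive method.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax_prob(center: Vector, out_vecs: List[Vector], c: int) -> float:
    """Full-softmax Skip-gram probability p(c | center) over the vocabulary."""
    scores = [dot(center, v) for v in out_vecs]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[c] / sum(exps)


def topical_context_prob(in_vecs, out_vecs, theta_d, w: int, c: int) -> float:
    """Assumed mixture p(c | w, d) = sum_z p(z | d) * p(c | w, z).

    in_vecs[z][w]: hypothetical topic-specific input embedding of word w under topic z
    out_vecs[c]:   hypothetical shared output (context) embedding of word c
    theta_d[z]:    topic distribution p(z | d) of the current document
    """
    return sum(
        theta_d[z] * softmax_prob(in_vecs[z][w], out_vecs, c)
        for z in range(len(theta_d))
    )


def random_model(vocab_size: int, num_topics: int, dim: int, seed: int = 0):
    """Hypothetical helper: small random embeddings for a toy vocabulary."""
    rng = random.Random(seed)
    in_vecs = [
        [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
        for _ in range(num_topics)
    ]
    out_vecs = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    return in_vecs, out_vecs


if __name__ == "__main__":
    in_vecs, out_vecs = random_model(vocab_size=5, num_topics=2, dim=3)
    theta_d = [0.7, 0.3]  # assumed document-topic distribution
    probs = [topical_context_prob(in_vecs, out_vecs, theta_d, w=0, c=c) for c in range(5)]
    print(probs, sum(probs))
```

Because each per-topic softmax is a distribution over the vocabulary and the mixture weights p(z | d) sum to one, the marginal context probabilities also sum to one; with a one-hot `theta_d` the model reduces to an ordinary Skip-gram softmax using that topic's embeddings. This is one way a polysemous word can receive a different vector in each topic while sharing a common context space.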

Comments:	10 pages, 2 figures, full paper. To appear in the proceedings of The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17)
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:1706.07276 [cs.CL]
	(or arXiv:1706.07276v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1706.07276
Related DOI:	https://doi.org/10.1145/3077136.3080806

Submission history

From: Bei Shi
[v1] Wed, 21 Jun 2017 06:19:24 UTC (771 KB)
