Hierarchical Neural Language Models for Joint Representation of Streaming Documents and their Content

Djuric, Nemanja; Wu, Hao; Radosavljevic, Vladan; Grbovic, Mihajlo; Bhamidipati, Narayan

doi:10.1145/2736277.2741643

arXiv:1606.08689 (cs)

[Submitted on 28 Jun 2016]

Abstract: We consider the problem of learning distributed representations for documents in data streams. The documents are represented as low-dimensional vectors and are jointly learned with distributed vector representations of word tokens using a hierarchical framework with two embedded neural language models. In particular, we exploit the context of documents in streams and use one of the language models to model the document sequences, and the other to model word sequences within them. The models learn continuous vector representations for both word tokens and documents such that semantically similar documents and words are close in a common vector space. We discuss extensions to our model, which can be applied to personalized recommendation and social relationship mining by adding further user layers to the hierarchy, thus learning user-specific vectors to represent individual preferences. We validated the learned representations on a public movie rating data set from MovieLens, as well as on a large-scale Yahoo News data set comprising three months of user activity logs collected on Yahoo servers. The results indicate that the proposed model can learn useful representations of both documents and word tokens, outperforming the current state-of-the-art by a large margin.
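To make the two-level idea in the abstract concrete, the following is a minimal sketch of how training contexts might be enumerated at each level: document identifiers are paired with their neighbors in the stream, and word tokens are paired with their neighbors inside a single document. The function names (`context_pairs`, `hierarchical_pairs`), the symmetric window scheme, and the toy stream are assumptions made for illustration only; the abstract does not specify the objective or the context definition, so this should not be read as the authors' method.

```python
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]


def context_pairs(sequence: Sequence[str], window: int) -> Iterator[Pair]:
    """Yield (center, neighbor) pairs for items at most `window` positions apart."""
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, sequence[j]


def hierarchical_pairs(
    stream: Sequence[Tuple[str, List[str]]],
    doc_window: int = 1,
    word_window: int = 1,
) -> Tuple[List[Pair], List[Pair]]:
    """Build context pairs at both levels of a (hypothetical) hierarchy.

    `stream` is an ordered list of (doc_id, tokens). Document-level pairs come
    from the order of documents in the stream; word-level pairs come from token
    order inside each document and never cross document boundaries.
    """
    doc_ids = [doc_id for doc_id, _ in stream]
    doc_level = list(context_pairs(doc_ids, doc_window))

    word_level: List[Pair] = []
    for _, tokens in stream:
        word_level.extend(context_pairs(tokens, word_window))
    return doc_level, word_level


# Toy stream of three documents (made-up data, for illustration only).
stream = [
    ("d1", ["stock", "market", "falls"]),
    ("d2", ["market", "recovers"]),
    ("d3", ["team", "wins"]),
]
doc_level, word_level = hierarchical_pairs(stream)
```

In a full model, each pair would typically feed a skip-gram-style or similar prediction loss over shared embedding tables, which is one way documents and words could end up in a common vector space. That training step is omitted here because its details are not given in the abstract.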

Comments:	24th International World Wide Web Conference
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
ACM classes:	I.2.7; I.5.4; I.7.m
Cite as:	arXiv:1606.08689 [cs.CL]
	(or arXiv:1606.08689v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1606.08689
Related DOI:	https://doi.org/10.1145/2736277.2741643

Submission history

From: Nemanja Djuric
[v1] Tue, 28 Jun 2016 13:32:08 UTC (77 KB)
