The Author-Topic Model for Authors and Documents

Rosen-Zvi, Michal; Griffiths, Thomas; Steyvers, Mark; Smyth, Padhraic

arXiv:1207.4169 (cs)

[Submitted on 11 Jul 2012]

Abstract: We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with its authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets, so we use Gibbs sampling to estimate the topic and author distributions. We compare its performance with that of two other generative models for documents, both of which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model, and demonstrate applications to computing similarity between authors and entropy of author output.
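The generative story in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes symmetric Dirichlet priors on the author-topic and topic-word distributions and assumes that each word's author is picked uniformly at random from the document's author list. All names (`sample_dirichlet`, `generate_document`, `entropy`, `theta`, `phi`) and all sizes and hyperparameter values are hypothetical choices made for illustration. The `entropy` helper shows one plausible reading of "entropy of author output", namely the entropy of an author's distribution over topics.

```python
import math
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) with `size` components."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(doc_authors, theta, phi, n_words, rng):
    """Generate (author, topic, word) triples for one document.

    theta[a] is author a's distribution over topics;
    phi[t] is topic t's distribution over vocabulary indices.
    """
    tokens = []
    for _ in range(n_words):
        # Assumption: the author of each word is chosen uniformly
        # from the document's authors.
        x = rng.choice(doc_authors)
        # Topic drawn from that author's distribution over topics.
        z = rng.choices(range(len(phi)), weights=theta[x])[0]
        # Word drawn from that topic's distribution over words.
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]
        tokens.append((x, z, w))
    return tokens


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


# Hypothetical toy sizes and hyperparameters, chosen only for illustration.
rng = random.Random(0)
n_authors, n_topics, vocab_size = 3, 4, 10
theta = [sample_dirichlet(0.5, n_topics, rng) for _ in range(n_authors)]
phi = [sample_dirichlet(0.5, vocab_size, rng) for _ in range(n_topics)]

# A 20-word document written jointly by authors 0 and 2.
doc = generate_document([0, 2], theta, phi, 20, rng)
author_entropies = [entropy(t) for t in theta]
```

Because each word picks one of the document's authors before picking a topic, the document's overall topic distribution is a mixture of its authors' topic distributions, which is the property the abstract describes. Restricting every document to a single unique author, or replacing topics with a direct author-word distribution, gives the two special cases the abstract mentions.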

Comments:	Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)
Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Report number:	UAI-P-2004-PG-487-494
Cite as:	arXiv:1207.4169 [cs.IR]
	(or arXiv:1207.4169v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1207.4169

Submission history

From: Michal Rosen-Zvi [via AUAI proxy]
[v1] Wed, 11 Jul 2012 15:05:53 UTC (479 KB)
