An Instability in Variational Inference for Topic Models

Ghorbani, Behrooz; Javadi, Hamid; Montanari, Andrea

Abstract: Topic models are Bayesian models that are frequently used to capture the latent structure of certain corpora of documents or images. Each data element in such a corpus (for instance, each item in a collection of scientific articles) is regarded as a convex combination of a small number of vectors corresponding to 'topics' or 'components'. The weights are assumed to have a Dirichlet prior distribution. The standard approach to approximating the posterior is to use variational inference algorithms, and in particular a mean field approximation.
We show that this approach suffers from an instability that can produce misleading conclusions. Namely, for certain regimes of the model parameters, variational inference outputs a non-trivial decomposition into topics. However, for the same parameter values, the data contain no actual information about the true decomposition, and hence the output of the algorithm is uncorrelated with the true topic decomposition. Among other consequences, the estimated posterior mean is significantly wrong, and estimated Bayesian credible regions do not achieve the nominal coverage. We discuss how this instability is remedied by more accurate mean field approximations.
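
As an illustration of the data model described in the abstract, the following is a minimal sketch in which each data element is a convex combination of a few topic vectors with Dirichlet-distributed weights. The function name `sample_corpus`, the Gaussian topic vectors, the additive Gaussian noise, and all default parameter values are assumptions made for this example and are not taken from the abstract.

```python
import numpy as np

def sample_corpus(n_docs=200, dim=50, n_topics=3, alpha=1.0, noise_std=0.1, seed=0):
    """Draw a synthetic corpus from a simple topic-style model (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Topic vectors: one row per topic (Gaussian entries are an assumption).
    topics = rng.normal(size=(n_topics, dim))
    # Mixing weights: each row is drawn from a symmetric Dirichlet prior,
    # so it is non-negative and sums to one.
    weights = rng.dirichlet(alpha * np.ones(n_topics), size=n_docs)
    # Each data element is a convex combination of the topics,
    # observed with additive Gaussian noise (also an assumption).
    data = weights @ topics + noise_std * rng.normal(size=(n_docs, dim))
    return data, weights, topics

data, weights, topics = sample_corpus()
```

Here `weights` plays the role of the true decomposition that an inference algorithm would try to recover from `data`; comparing an estimate against it is one way to check whether the output is correlated with the truth.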

Comments:	69 pages; 18 pdf figures
Subjects:	Machine Learning (stat.ML)
Cite as:	arXiv:1802.00568 [stat.ML]
	(or arXiv:1802.00568v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1802.00568
