A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Bing, Xin; Bunea, Florentina; Wegkamp, Marten

Statistics > Machine Learning

arXiv:1805.06837 (stat)

[Submitted on 17 May 2018 (v1), last revised 4 Sep 2019 (this version, v3)]

Abstract: We propose a new method of estimation in topic models, that is not a variation on the existing simplex finding algorithms, and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, although we start out with a computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:1805.06837 [stat.ML] (or arXiv:1805.06837v3 [stat.ML] for this version)
DOI: https://doi.org/10.48550/arXiv.1805.06837

Submission history

From: Marten Wegkamp
[v1] Thu, 17 May 2018 16:07:32 UTC (220 KB)
[v2] Tue, 12 Jun 2018 22:06:15 UTC (220 KB)
[v3] Wed, 4 Sep 2019 21:58:35 UTC (2,174 KB)
