Seeded Poisson Factorization: Leveraging domain knowledge to fit topic models

Prostmaier, Bernd; Vávra, Jan; Grün, Bettina; Hofmarcher, Paul

Statistics > Methodology

arXiv:2503.02741 (stat)

[Submitted on 4 Mar 2025]

Abstract: Topic models are widely used for discovering latent thematic structures in large text corpora, yet traditional unsupervised methods often struggle to align with predefined conceptual domains. This paper introduces Seeded Poisson Factorization (SPF), a novel approach that extends the Poisson Factorization framework by incorporating domain knowledge through seed words. SPF enables more interpretable and structured topic discovery by modifying the prior distribution of topic-specific term intensities, assigning higher initial rates to predefined seed words. The model is estimated using variational inference with stochastic gradient optimization, ensuring scalability to large datasets.
We apply SPF to an Amazon customer feedback dataset, leveraging predefined product categories as guiding structures. Our evaluation shows that SPF outperforms alternative guided topic models in classification, with advantages in both computational efficiency and predictive performance. Furthermore, robustness checks highlight SPF's ability to adaptively balance domain knowledge and data-driven topic discovery, even when seed words are imperfectly chosen. These results establish SPF as a powerful and scalable alternative for integrating expert knowledge into topic modeling, enhancing both interpretability and efficiency in real-world applications.
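As a rough illustration of the mechanism described in the abstract, the sketch below builds Gamma priors for the topic-specific term intensities beta (a K x V matrix) and gives the seed words of topic k a larger prior mean, then samples word counts from the standard Poisson factorization model y_dv ~ Poisson(sum_k theta_dk * beta_kv). The function names, the hyperparameter values and the multiplicative `seed_boost` applied to the Gamma shape are hypothetical choices made for illustration, not the paper's exact prior specification; the variational inference step used to fit SPF is not shown.

```python
import numpy as np


def seeded_gamma_prior(num_topics, vocab, seed_words, shape=0.3, rate=0.3, seed_boost=10.0):
    """Return (shape, rate) arrays of size K x V for the topic-term intensities beta.

    Hypothetical seeding rule: for each seed word of topic k, the Gamma shape is
    multiplied by `seed_boost`, which raises the prior mean shape / rate for that entry.
    """
    index = {word: i for i, word in enumerate(vocab)}
    shapes = np.full((num_topics, len(vocab)), shape)
    rates = np.full((num_topics, len(vocab)), rate)
    for k, words in enumerate(seed_words):
        for word in words:
            shapes[k, index[word]] *= seed_boost  # seed words start from a higher intensity
    return shapes, rates


def sample_corpus(num_docs, shapes, rates, theta_shape=0.3, theta_rate=0.3, rng=None):
    """Draw document intensities theta, topic-term intensities beta and counts y."""
    rng = np.random.default_rng(rng)
    num_topics, _ = shapes.shape
    # NumPy parameterizes the Gamma distribution by scale = 1 / rate.
    beta = rng.gamma(shapes, 1.0 / rates)
    theta = rng.gamma(theta_shape, 1.0 / theta_rate, size=(num_docs, num_topics))
    # Poisson factorization: each count has rate sum_k theta_dk * beta_kv.
    counts = rng.poisson(theta @ beta)
    return theta, beta, counts


# Usage with a toy vocabulary and two hypothetical product categories.
vocab = ["battery", "charger", "screen", "novel", "author", "plot"]
seeds = [["battery", "charger"], ["novel", "author"]]
shapes, rates = seeded_gamma_prior(2, vocab, seeds)
theta, beta, counts = sample_corpus(5, shapes, rates, rng=0)
```

Boosting the shape rather than lowering the rate is one simple way to raise the prior mean of the seeded entries; because the data still enter through the Poisson likelihood, a fitted model can move away from a poorly chosen seed word, which is the kind of adaptive balance the robustness checks refer to.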

Subjects:	Methodology (stat.ME); Computation and Language (cs.CL); Machine Learning (cs.LG); General Economics (econ.GN)
Cite as:	arXiv:2503.02741 [stat.ME]
	(or arXiv:2503.02741v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2503.02741

Submission history

From: Bernd Prostmaier
[v1] Tue, 4 Mar 2025 16:05:13 UTC (127 KB)
