Keyword Assisted Embedded Topic Model

Harandizadeh, Bahareh; Priniski, J. Hunter; Morstatter, Fred

doi:10.1145/3488560.3498518


arXiv:2112.03101 (cs)

[Submitted on 22 Nov 2021]

Title: Keyword Assisted Embedded Topic Model

Authors: Bahareh Harandizadeh, J. Hunter Priniski, Fred Morstatter


Abstract: By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA), describe how words in documents are generated via a set of latent distributions called topics. Recently, the Embedded Topic Model (ETM) has extended LDA to utilize the semantic information in word embeddings to derive semantically richer topics. As LDA and its extensions are unsupervised models, they aren't defined to make efficient use of a user's prior knowledge of the domain. To this end, we propose the Keyword Assisted Embedded Topic Model (KeyETM), which equips ETM with the ability to incorporate user knowledge in the form of informative topic-level priors over the vocabulary. Using both quantitative metrics and human responses on a topic intrusion task, we demonstrate that KeyETM produces better topics than other guided, generative models in the literature.
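The idea in the abstract can be illustrated with a toy sketch. It assumes the usual ETM formulation, in which a topic's word distribution is a softmax over inner products between word embeddings and a topic embedding. The keyword prior and the cross-entropy penalty below are hypothetical stand-ins chosen for illustration, not the paper's actual objective. The vocabulary, embeddings, and the `boost` parameter are likewise invented for the example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topic_word_distribution(word_embeddings, topic_embedding):
    # ETM-style topic: softmax of the inner product between each
    # word embedding and the topic embedding.
    scores = [sum(w * t for w, t in zip(vec, topic_embedding))
              for vec in word_embeddings]
    return softmax(scores)

def keyword_prior(vocab, keywords, boost=5.0):
    # Hypothetical informative prior over the vocabulary: user keywords
    # get extra weight, then the weights are normalized to sum to 1.
    weights = [1.0 + boost if word in keywords else 1.0 for word in vocab]
    total = sum(weights)
    return [w / total for w in weights]

def cross_entropy(prior, beta):
    # Hypothetical penalty: small when the topic puts mass where the prior does.
    return -sum(p * math.log(b) for p, b in zip(prior, beta))

# Toy data (assumed for illustration only).
vocab = ["goal", "match", "vote", "election"]
word_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
sports_topic = [2.0, 0.0]
politics_topic = [0.0, 2.0]

prior = keyword_prior(vocab, {"goal", "match"})
beta_sports = topic_word_distribution(word_embeddings, sports_topic)
beta_politics = topic_word_distribution(word_embeddings, politics_topic)

penalty_sports = cross_entropy(prior, beta_sports)
penalty_politics = cross_entropy(prior, beta_politics)
print(penalty_sports < penalty_politics)
```

In this toy setup the topic whose embedding lies near the keyword embeddings incurs the smaller penalty, which is the sense in which a topic-level prior over the vocabulary can steer a topic toward user-supplied keywords.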

Comments:	8 pages, 5 figures, WSDM 2022 Conference
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2112.03101 [cs.IR]
	(or arXiv:2112.03101v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2112.03101
Related DOI:	https://doi.org/10.1145/3488560.3498518

Submission history

From: Bahareh Harandizadeh
[v1] Mon, 22 Nov 2021 07:27:17 UTC (1,140 KB)

