Federated Non-negative Matrix Factorization for Short Texts Topic Modeling with Mutual Information

Si, Shijing; Wang, Jianzong; Zhang, Ruiyi; Su, Qinliang; Xiao, Jing

arXiv:2205.13300 (cs)

[Submitted on 26 May 2022]

Abstract: Non-negative matrix factorization (NMF) based topic modeling is widely used in natural language processing (NLP) to uncover the hidden topics of short text documents. Training a high-quality topic model usually requires a large amount of textual data. In many real-world scenarios, however, customer textual data are private and sensitive, which precludes uploading them to data centers. This paper proposes a Federated NMF (FedNMF) framework, which allows multiple clients to collaboratively train a high-quality NMF-based topic model while keeping their data stored locally. However, standard federated learning significantly degrades the performance of topic models in downstream tasks (e.g., text classification) when the data distribution over clients is heterogeneous. To alleviate this issue, we further propose FedNMF+MI, which simultaneously maximizes the mutual information (MI) between the count features of local texts and their topic weight vectors to mitigate the performance degradation. Experimental results show that our FedNMF+MI methods outperform Federated Latent Dirichlet Allocation (FedLDA) and FedNMF without MI on short texts by a significant margin in both coherence score and classification F1 score.
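The abstract does not spell out the FedNMF update rules, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes a generic FedAvg-style scheme: each client k keeps its document-topic matrix W_k private, runs a few Lee and Seung multiplicative updates for the Frobenius loss ||X_k - W_k H||^2 on its local count matrix X_k, and returns only its copy of the topic-term matrix H, which a server averages weighted by document count. The mutual-information term of FedNMF+MI is omitted entirely. All function names (`federated_nmf`, `update_W`, `update_H`, and so on) and the toy data are hypothetical.

```python
import random

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def matmul(A, B):
    """Plain-Python product of an n x k and a k x m matrix (lists of lists)."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def rand_matrix(rows, cols, rng):
    # Strictly positive init: an entry that reaches 0 stays 0 under multiplicative updates.
    return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]


def update_W(X, W, H):
    # Lee-Seung update for ||X - WH||_F^2:  W <- W * (X H^T) / (W H H^T)
    num = matmul(X, transpose(H))
    den = matmul(W, matmul(H, transpose(H)))
    return [[W[i][t] * num[i][t] / (den[i][t] + EPS) for t in range(len(W[0]))]
            for i in range(len(W))]


def update_H(X, W, H):
    # Lee-Seung update for ||X - WH||_F^2:  H <- H * (W^T X) / (W^T W H)
    Wt = transpose(W)
    num = matmul(Wt, X)
    den = matmul(matmul(Wt, W), H)
    return [[H[t][j] * num[t][j] / (den[t][j] + EPS) for j in range(len(H[0]))]
            for t in range(len(H))]


def frob_loss(X, W, H):
    WH = matmul(W, H)
    return sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr))


def federated_nmf(client_data, n_topics, rounds=50, local_steps=5, seed=0):
    """Hypothetical FedAvg-style NMF: W_k stays on client k, only H is shared."""
    rng = random.Random(seed)
    vocab = len(client_data[0][0])
    H = rand_matrix(n_topics, vocab, rng)                           # shared topic-term matrix
    Ws = [rand_matrix(len(X), n_topics, rng) for X in client_data]  # private doc-topic matrices
    initial = sum(frob_loss(X, W, H) for X, W in zip(client_data, Ws))
    sizes = [len(X) for X in client_data]
    for _ in range(rounds):
        local_Hs = []
        for k, X in enumerate(client_data):
            H_k = [row[:] for row in H]  # client starts from the current global topics
            for _ in range(local_steps):
                Ws[k] = update_W(X, Ws[k], H_k)
                H_k = update_H(X, Ws[k], H_k)
            local_Hs.append(H_k)  # in a real deployment only H_k would be sent to the server
        # Server: average the local topic-term matrices, weighted by document count.
        H = [[sum(n * Hk[t][j] for n, Hk in zip(sizes, local_Hs)) / sum(sizes)
              for j in range(vocab)] for t in range(n_topics)]
    # Each client re-infers its topic weights against the final shared H.
    for k, X in enumerate(client_data):
        for _ in range(local_steps):
            Ws[k] = update_W(X, Ws[k], H)
    final = sum(frob_loss(X, W, H) for X, W in zip(client_data, Ws))
    return H, Ws, (initial, final)


# Toy document-term count matrices (rows = short texts, columns = 4 vocabulary words).
client_a = [[5, 4, 1, 1], [4, 5, 1, 2], [6, 3, 1, 1]]  # mostly words 0 and 1
client_b = [[1, 1, 5, 4], [1, 2, 4, 5], [1, 1, 6, 3]]  # mostly words 2 and 3

if __name__ == "__main__":
    H, Ws, (before, after) = federated_nmf([client_a, client_b], n_topics=2)
    print(f"reconstruction loss: {before:.2f} -> {after:.2f}")
    for t, row in enumerate(H):
        print(f"topic {t}:", [round(v, 2) for v in row])
```

Because client A's counts concentrate on the first two words and client B's on the last two, the toy data mimics the heterogeneous setting that the abstract says hurts plain federated training; this sketch makes no attempt to correct for it, which is the gap the MI term in FedNMF+MI is meant to address.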

Comments:	7 pages, 4 figures, accepted by IJCNN 2022
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2205.13300 [cs.CL]
	(or arXiv:2205.13300v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2205.13300
Journal reference:	IJCNN 2022

Submission history

From: Shijing Si
[v1] Thu, 26 May 2022 12:22:34 UTC (642 KB)