BanMANI: A Dataset to Identify Manipulated Social Media News in Bangla

Kamruzzaman, Mahammed; Shovon, Md. Minul Islam; Kim, Gene Louis

arXiv:2311.02570 (cs)

[Submitted on 5 Nov 2023]


Abstract: Initial work has been done to address fake news detection and misrepresentation of news in the Bengali language. However, no work in Bengali yet addresses the identification of specific claims in social media news that falsely manipulate a related news article. This problem has been tackled in English and a few other languages, but not in Bengali. In this paper, we curate a dataset of social media content labeled with information manipulation relative to reference articles, called BanMANI. The dataset collection method we describe works around the limitations of the available NLP tools in Bangla. We expect these techniques will carry over to building similar datasets in other low-resource languages. BanMANI forms the basis both for evaluating the capabilities of existing NLP systems and for training or fine-tuning new models specifically on this task. In our analysis, we find that this task challenges current LLMs under both zero-shot and fine-tuned settings.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2311.02570 [cs.CL]
	(or arXiv:2311.02570v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.02570

Submission history

From: Mahammed Kamruzzaman
[v1] Sun, 5 Nov 2023 05:49:57 UTC (7,493 KB)
