MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection

Al-Henaki, Lubna; Al-Khalifa, Hend; Al-Salman, Abdulmalik; Alqubayshi, Hajar; Al-Twailay, Hind; Alghamdi, Gheeda; Aljasim, Hawra

arXiv:2502.08319 (cs)

[Submitted on 12 Feb 2025]

Abstract: Propaganda is a form of persuasion that has been used throughout history with the goal of influencing people's opinions through rhetorical and psychological persuasion techniques for determined ends. Although Arabic ranks as the fourth most-used language on the internet, resources for propaganda detection in languages other than English, especially Arabic, remain extremely limited. To address this gap, the first Arabic dataset for Multi-label Propaganda, Sentiment, and Emotion (MultiProSE) has been introduced. MultiProSE is an open-source extension of the existing Arabic propaganda dataset, ArPro, with the addition of sentiment and emotion annotations for each text. This dataset comprises 8,000 annotated news articles, which is the largest propaganda dataset to date. For each task, several baselines have been developed using large language models (LLMs), such as GPT-4o-mini, and pre-trained language models (PLMs), including three BERT-based models. The dataset, annotation guidelines, and source code are all publicly released to facilitate future research and development in Arabic language models and contribute to a deeper understanding of how various opinion dimensions interact in news media.

Comments:	12 pages, 3 figures, 4 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.08319 [cs.CL]
	(or arXiv:2502.08319v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.08319

Submission history

From: Lubna Alhenaki [view email]
[v1] Wed, 12 Feb 2025 11:35:20 UTC (946 KB)
