BERT-based Ensemble Approaches for Hate Speech Detection

Mnassri, Khouloud; Rajapaksha, Praboda; Farahbakhsh, Reza; Crespi, Noel

arXiv:2209.06505 (cs)

[Submitted on 14 Sep 2022 (v1), last revised 15 Sep 2022 (this version, v2)]

Abstract: With the freedom of communication that online social media provide, hate speech has become increasingly prevalent. This leads to cyber conflicts that affect social life at the individual and national levels. As a result, classification of hateful content is increasingly in demand for filtering hate content before it is posted to social networks. This paper focuses on classifying hate speech in social media using multiple deep models implemented by integrating recent transformer-based language models, such as BERT, with neural networks. To improve classification performance, we evaluated several ensemble techniques, including soft voting, maximum value, hard voting, and stacking. We used three publicly available Twitter datasets (Davidson, HatEval2019, OLID) that were created to identify offensive language. We fused all these datasets to generate a single dataset (the DHO dataset), which is more balanced across labels, to perform multi-label classification. Our experiments were conducted on the Davidson dataset and the DHO corpus. The latter gave the best overall results, especially for the macro F1 score, even though it required more resources (execution time and memory). The experiments showed good results, especially for the ensemble models: stacking achieved an F1 score of 97% on the Davidson dataset, and aggregating ensembles achieved 77% on the DHO dataset.
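
The three aggregation rules named in the abstract (soft voting, maximum value, hard voting) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the three-model setup, and the probability values are hypothetical, and the rules follow their common textbook definitions. Stacking is omitted because it requires training a separate meta-classifier on the base models' outputs.

```python
import numpy as np

# Hypothetical class probabilities from 3 base models for 3 tweets over 3 classes.
# Shape: (n_models, n_samples, n_classes). The values are made up for illustration.
probs = np.array([
    [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0], [0.1, 0.2, 0.7]],  # model 1
    [[0.4, 0.6, 0.0], [0.2, 0.6, 0.2], [0.2, 0.3, 0.5]],  # model 2
    [[0.4, 0.6, 0.0], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]],  # model 3
])


def soft_voting(probs):
    """Average the probabilities over models, then pick the most likely class."""
    return probs.mean(axis=0).argmax(axis=1)


def max_value(probs):
    """Keep the highest probability any model gives each class, then pick the top class."""
    return probs.max(axis=0).argmax(axis=1)


def hard_voting(probs):
    """Each model votes for its own top class; the class with the most votes wins.

    Ties are broken in favour of the lowest class index.
    """
    votes = probs.argmax(axis=2)  # (n_models, n_samples)
    n_classes = probs.shape[2]
    # counts[c, i] = number of models that voted for class c on sample i
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)


print("soft voting:", soft_voting(probs))
print("max value:  ", max_value(probs))
print("hard voting:", hard_voting(probs))
```

On this toy input the rules disagree on the first two samples: a single very confident model can dominate the soft-voting and maximum-value results, while hard voting follows the majority of per-model decisions.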

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2209.06505 [cs.CL]
	(or arXiv:2209.06505v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2209.06505

Submission history

From: Khouloud Mnassri
[v1] Wed, 14 Sep 2022 09:08:24 UTC (625 KB)
[v2] Thu, 15 Sep 2022 12:09:03 UTC (625 KB)