Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph

Zhao, Yibo; Zhu, Jiapeng; Xu, Can; Li, Xiang

arXiv:2412.15268 (cs)

[Submitted on 17 Dec 2024 (v1), last revised 24 Dec 2024 (this version, v2)]

Abstract: The rapid growth of social media platforms has raised significant concerns regarding online content toxicity. When Large Language Models (LLMs) are used for toxicity detection, two key challenges emerge: 1) the absence of domain-specific toxic knowledge leads to false negatives; 2) the excessive sensitivity of LLMs to toxic speech results in false positives, limiting freedom of speech. To address these issues, we propose a novel method called MetaTox, leveraging graph search on a meta-toxic knowledge graph to enhance hatred and toxicity detection. First, we construct a comprehensive meta-toxic knowledge graph by utilizing LLMs to extract toxic information through a three-step pipeline, with toxic benchmark datasets serving as corpora. Second, we query the graph via retrieval and ranking processes to supplement accurate, relevant toxic knowledge. Extensive experiments and in-depth case studies across multiple datasets demonstrate that our MetaTox significantly decreases the false positive rate while boosting overall toxicity detection performance. Our code will be available soon.
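
The abstract names a retrieve-then-rank query over a knowledge graph but gives no implementation details. The following is only a minimal illustrative sketch of that general idea, not the authors' MetaTox code. Everything in it is an assumption made for illustration: the `Triple` schema, the toy graph entries, the token-overlap scoring, and the prompt wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (head, relation, tail) edge; this schema is an assumption."""
    head: str
    relation: str
    tail: str


# Toy graph with placeholder entries (hypothetical, not data from the paper).
GRAPH = [
    Triple("term_a", "targets", "group_x"),
    Triple("term_a", "has_meaning", "coded insult"),
    Triple("term_b", "has_meaning", "harmless slang"),
]


def tokenize(text):
    """Lowercase whitespace tokenization; deliberately naive."""
    return set(text.lower().split())


def retrieve(graph, text):
    """Retrieval step: keep triples whose head entity appears in the text."""
    tokens = tokenize(text)
    return [t for t in graph if t.head in tokens]


def rank(triples, text, top_k=2):
    """Ranking step: order candidates by token overlap between text and tail.

    Token overlap is a stand-in scoring function chosen for simplicity.
    """
    tokens = tokenize(text)

    def score(t):
        return len(tokens & tokenize(t.tail))

    return sorted(triples, key=score, reverse=True)[:top_k]


def build_prompt(text, facts):
    """Attach the selected facts to the post before asking an LLM to classify it."""
    lines = [f"- {t.head} {t.relation} {t.tail}" for t in facts]
    return "Known context:\n" + "\n".join(lines) + f"\nIs this post toxic? Post: {text}"
```

In this sketch, the retrieved and ranked facts would be prepended to the classification prompt, which is one plausible way to supply the "accurate, relevant toxic knowledge" the abstract mentions; the paper itself should be consulted for the actual procedure.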

Comments:	8 pages of content
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.15268 [cs.CL]
	(or arXiv:2412.15268v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.15268

Submission history

From: YiBo Zhao
[v1] Tue, 17 Dec 2024 06:28:28 UTC (935 KB)
[v2] Tue, 24 Dec 2024 04:38:57 UTC (935 KB)
