Toxicity Detection: Does Context Really Matter?

Pavlopoulos, John; Sorensen, Jeffrey; Dixon, Lucas; Thain, Nithum; Androutsopoulos, Ion

Computer Science > Computation and Language

arXiv:2006.00998 (cs)

[Submitted on 1 Jun 2020]

Abstract: Moderation is crucial to promoting healthy online discussions. Although several "toxicity" detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments may be judged independently. We investigate this assumption by focusing on two questions: (a) does context affect the human judgement, and (b) does conditioning on context improve the performance of toxicity detection systems? We experiment with Wikipedia conversations, limiting the notion of context to the previous post in the thread and the discussion title. We find that context can either amplify or mitigate the perceived toxicity of posts. Moreover, a small but significant subset of manually labeled posts (5% in one of our experiments) end up with the opposite toxicity labels if the annotators are not provided with context. Surprisingly, we also find no evidence that context actually improves the performance of toxicity classifiers, having tried a range of classifiers and mechanisms to make them context aware. This points to the need for larger datasets of comments annotated in context. We make our code and data publicly available.
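The abstract does not say exactly how label flips or the amplify/mitigate effect were computed. The following is only a hypothetical sketch, assuming each post is labeled twice by several annotators (once without context and once with the parent post and discussion title shown), that each annotator gives a binary vote (1 = toxic), and that votes are aggregated by a simple majority threshold. The names `Post`, `majority_label`, `context_effect` and the toy data are invented for illustration; they are not taken from the authors' released code, and the toy numbers are not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Post:
    """Hypothetical record: binary annotator votes (1 = toxic) for one post."""
    post_id: str
    labels_oc: list[int]  # votes collected without context ("out of context")
    labels_ic: list[int]  # votes collected with parent post + title shown ("in context")


def majority_label(labels: list[int], threshold: float = 0.5) -> int:
    """Toxic (1) only if the mean vote is strictly above the threshold; ties count as non-toxic."""
    return int(mean(labels) > threshold)


def context_effect(posts: list[Post]) -> dict[str, float]:
    """Fraction of posts whose aggregated label flips, and how often context
    raises (amplifies) or lowers (mitigates) the mean toxicity vote."""
    if not posts:
        raise ValueError("need at least one post")
    flipped = amplified = mitigated = 0
    for p in posts:
        if majority_label(p.labels_oc) != majority_label(p.labels_ic):
            flipped += 1
        delta = mean(p.labels_ic) - mean(p.labels_oc)
        if delta > 0:
            amplified += 1
        elif delta < 0:
            mitigated += 1
    n = len(posts)
    return {"flip_rate": flipped / n, "amplified": amplified / n, "mitigated": mitigated / n}


# Invented toy data, three annotators per condition.
toy_posts = [
    Post("a", [0, 0, 1], [1, 1, 1]),  # non-toxic -> toxic once context is shown
    Post("b", [1, 1, 0], [0, 0, 1]),  # toxic -> non-toxic once context is shown
    Post("c", [0, 0, 0], [0, 0, 0]),  # unaffected
    Post("d", [1, 1, 1], [1, 1, 0]),  # still toxic, but fewer toxic votes
]

print(context_effect(toy_posts))
# {'flip_rate': 0.5, 'amplified': 0.25, 'mitigated': 0.5}
```

Post "d" shows why the flip rate and the amplify/mitigate counts are tracked separately in this sketch: context can shift annotator votes without changing the aggregated label.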

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2006.00998 [cs.CL]
	(or arXiv:2006.00998v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2006.00998

Submission history

From: John Pavlopoulos
[v1] Mon, 1 Jun 2020 15:03:48 UTC (412 KB)
