UoB at SemEval-2021 Task 5: Extending Pre-Trained Language Models to Include Task and Domain-Specific Information for Toxic Span Prediction

Yan, Erik; Madabushi, Harish Tayyar

doi:10.18653/v1/2021.semeval-1.28

arXiv:2110.03730 (cs)

[Submitted on 7 Oct 2021]

Abstract: Toxicity is pervasive in social media and poses a major threat to the health of online communities. The recent introduction of pre-trained language models, which have achieved state-of-the-art results in many NLP tasks, has transformed the way in which we approach natural language processing. However, the inherent nature of pre-training means that these models are unlikely to capture task-specific statistical information or learn domain-specific knowledge. Additionally, most implementations of these models do not employ conditional random fields, a method for simultaneous token classification. We show that adding task-specific and domain-specific information, together with a conditional random field, can improve model performance on the Toxic Spans Detection task at SemEval-2021 to achieve a score within 4 percentage points of the top performing team.
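To illustrate what "simultaneous token classification" means in practice, the following is a minimal sketch of Viterbi decoding, the standard way a linear-chain conditional random field picks one label sequence for a whole sentence instead of labelling each token independently. It is not the authors' implementation: the function name `viterbi_decode`, the two-label scheme (0 = non-toxic, 1 = toxic), and all scores are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for one sentence.

    emissions[t][j]   -- score of label j at token t (e.g. from a language model)
    transitions[i][j] -- score of moving from label i to label j
    Both tables are hypothetical inputs for this sketch.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[j] holds the best score of any path ending in label j so far.
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_labels):
            # Pick the previous label that maximises the path score into j.
            prev = max(range(num_labels),
                       key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(num_labels), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative scores only: the middle token slightly prefers label 0 on its own.
emissions = [[0.0, 1.0], [0.6, 0.5], [0.0, 1.0]]
# Assumed transition scores that reward keeping the same label.
transitions = [[0.5, 0.0], [0.0, 0.5]]

independent = [max(range(2), key=lambda j: row[j]) for row in emissions]
joint = viterbi_decode(emissions, transitions)
print(independent)  # [1, 0, 1]
print(joint)        # [1, 1, 1]
```

With these made-up scores, per-token argmax breaks the toxic span in the middle, while joint decoding keeps it contiguous because the transition scores favour staying in the same label.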

Comments:	Published in Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021); Code available at: this https URL
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2110.03730 [cs.CL]
	(or arXiv:2110.03730v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2110.03730
Journal reference:	2021.semeval-1.28 (2021) 243-248
Related DOI:	https://doi.org/10.18653/v1/2021.semeval-1.28

Submission history

From: Harish Tayyar Madabushi PhD
[v1] Thu, 7 Oct 2021 18:29:06 UTC (5,227 KB)
