Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models

Wang, Shuohuan; Liu, Jiaxiang; Ouyang, Xuan; Sun, Yu


arXiv:2010.03542 (cs)

[Submitted on 7 Oct 2020]


Abstract: This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For offensive language identification, we proposed a multilingual method using the pre-trained language models ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A (Offensive Language Identification), we ranked first in terms of average F1 score across all languages, and we were the only team to rank among the top three in every language. We also took first place in Sub-task B (Automatic Categorization of Offense Types) and Sub-task C (Offense Target Identification).
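The abstract does not say how the soft labels are produced or how the student is trained. As a minimal sketch, assuming the soft label for each example is the mean of the teachers' predicted class distributions and the student minimizes cross-entropy against that distribution, the idea can be written in plain Python. The function names, the averaging scheme, and the linear softmax student are hypothetical stand-ins chosen for illustration; they are not the authors' models or code.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def average_teacher_probs(teacher_probs: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Soft labels: mean of each teacher's class distribution per example.

    teacher_probs[t][i][c] is the probability teacher t assigns to class c
    for example i. Plain averaging is an assumption, not the paper's scheme.
    """
    n_teachers = len(teacher_probs)
    n_examples = len(teacher_probs[0])
    n_classes = len(teacher_probs[0][0])
    return [
        [sum(teacher_probs[t][i][c] for t in range(n_teachers)) / n_teachers
         for c in range(n_classes)]
        for i in range(n_examples)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    """Cross-entropy between a soft target distribution and softmax(logits).

    Terms with zero target weight are skipped (0 * log p is taken as 0).
    """
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def predict_proba(W: Matrix, b: List[float], x: Sequence[float]) -> List[float]:
    """Class distribution of a hypothetical linear softmax student."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]
    return softmax(logits)


def train_student(features: Matrix, soft_targets: Matrix,
                  lr: float = 0.5, epochs: int = 300) -> Tuple[Matrix, List[float]]:
    """Fit the linear student to soft labels with per-example gradient descent."""
    n_classes = len(soft_targets[0])
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, soft_targets):
            probs = predict_proba(W, b, x)
            for c in range(n_classes):
                # d(soft cross-entropy)/d(logit_c) = p_c - y_c
                grad = probs[c] - y[c]
                b[c] -= lr * grad
                for j in range(dim):
                    W[c][j] -= lr * grad * x[j]
    return W, b


if __name__ == "__main__":
    # Two hypothetical teachers, two toy examples, two classes.
    teachers = [
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.7, 0.3], [0.4, 0.6]],
    ]
    targets = average_teacher_probs(teachers)  # approx. [[0.8, 0.2], [0.3, 0.7]]
    W, b = train_student([[1.0, 0.0], [0.0, 1.0]], targets)
    print(predict_proba(W, b, [1.0, 0.0]))  # close to [0.8, 0.2]
```

Training on the averaged distribution rather than on hard 0/1 labels lets the student see how confident the teachers were (0.8 versus 0.7 in the toy data); the gradient of the soft cross-entropy with respect to each logit is simply the predicted probability minus the soft target.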

Comments:	8 pages, 2 figures, 6 tables. Accepted to the Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020)
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.03542 [cs.CL]
	(or arXiv:2010.03542v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.03542

Submission history

From: Shuohuan Wang
[v1] Wed, 7 Oct 2020 17:40:19 UTC (72 KB)
