Cross-Lingual Transfer Learning for Complex Word Identification

Zaharia, George-Eduard; Cercel, Dumitru-Clementin; Dascalu, Mihai

arXiv:2010.01108 (cs)

[Submitted on 2 Oct 2020]

Abstract: Complex Word Identification (CWI) is the task of detecting hard-to-understand words, or groups of words, in texts from different areas of expertise. Its purpose is to highlight problematic structures that non-native speakers would usually find difficult to understand. Our approach uses zero-shot, one-shot, and few-shot learning techniques, alongside state-of-the-art solutions for Natural Language Processing (NLP) tasks (i.e., Transformers). We aim to provide evidence that the proposed models can learn the characteristics of complex words in a multilingual environment by relying on the CWI Shared Task 2018 dataset, which is available for four languages (English, German, Spanish, and French). In the zero-shot learning scenario, our approach surpasses state-of-the-art cross-lingual results in terms of macro F1-score on English (0.774), German (0.782), and Spanish (0.734). Our model also outperforms the state-of-the-art monolingual result for German (0.795 macro F1-score).
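
Editorial illustration (not part of the paper): all results in the abstract are reported as macro F1-scores. The sketch below shows how that metric is commonly computed for a two-class labelling task: an F1-score per class, then an unweighted mean. The label encoding (1 = complex, 0 = non-complex) and the function names `f1_for_class` and `macro_f1` are assumptions made for this example, not taken from the authors' code.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1-score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention chosen here: a class that is never seen or predicted scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical gold labels and predictions (1 = complex, 0 = non-complex).
gold = [1, 0, 0, 1, 0]
pred = [1, 0, 1, 0, 0]
score = macro_f1(gold, pred)
```

Because each class contributes equally to the mean, the minority class (typically the complex words) weighs as much as the majority class, whatever their frequencies.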

Comments:	accepted at ICTAI 2020, 7 pages, 5 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.01108 [cs.CL]
	(or arXiv:2010.01108v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.01108

Submission history

From: Dumitru-Clementin Cercel [view email]
[v1] Fri, 2 Oct 2020 17:09:47 UTC (28 KB)
