Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings

García-Ferrero, Iker; Agerri, Rodrigo; Rigau, German

Computer Science > Computation and Language

arXiv:2210.12623 (cs)

[Submitted on 23 Oct 2022 (v1), last revised 27 Apr 2023 (this version, v2)]

Abstract: Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages. In this paper we perform an in-depth study of the two main techniques employed so far for cross-lingual zero-resource sequence labelling, based either on data or model transfer. Although previous research has proposed translation and annotation projection (data-based cross-lingual transfer) as an effective technique for cross-lingual sequence labelling, in this paper we experimentally demonstrate that high-capacity multilingual language models applied in a zero-shot (model-based cross-lingual transfer) setting consistently outperform data-based cross-lingual transfer approaches. A detailed analysis of our results suggests that this might be due to important differences in language use. More specifically, machine translation often generates a textual signal which is different from what the models are exposed to when using gold standard data, which affects both the fine-tuning and evaluation processes. Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.
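
As an illustration of the annotation projection step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes BIO-style labels on the source sentence and word alignments given as (source index, target index) pairs; the function names `extract_spans` and `project_labels`, and the min-to-max span heuristic, are hypothetical choices made only for this example.

```python
from typing import List, Tuple


def extract_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) spans from a BIO label sequence."""
    spans = []
    start, etype = None, None
    for i, lab in enumerate(labels):
        # A new span starts on B-, or on an I- whose type differs from the open span.
        if lab.startswith("B-") or (lab.startswith("I-") and etype != lab[2:]):
            if start is not None:
                spans.append((start, i, etype))
            start, etype = i, lab[2:]
        elif lab == "O":
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
    if start is not None:
        spans.append((start, len(labels), etype))
    return spans


def project_labels(
    src_labels: List[str],
    alignments: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project source BIO labels onto a target sentence of length tgt_len.

    Assumed heuristic: each source span is mapped to the contiguous target
    range from its lowest to its highest aligned target token.
    """
    tgt_labels = ["O"] * tgt_len
    for start, end, etype in extract_spans(src_labels):
        tgt_idx = sorted(t for s, t in alignments if start <= s < end)
        if not tgt_idx:
            continue  # span has no aligned target tokens, so it is dropped
        lo, hi = tgt_idx[0], tgt_idx[-1]
        tgt_labels[lo] = "B-" + etype
        for j in range(lo + 1, hi + 1):
            tgt_labels[j] = "I-" + etype
    return tgt_labels


if __name__ == "__main__":
    # Source: "the European Union"  ->  target: "la Unión Europea"
    src = ["O", "B-ORG", "I-ORG"]
    align = [(0, 0), (1, 2), (2, 1)]  # word order is swapped in the target
    print(project_labels(src, align, 3))  # ['O', 'B-ORG', 'I-ORG']
```

In the model-transfer setting the abstract contrasts this with, no projection is needed: a multilingual model fine-tuned on source-language gold labels is applied directly to target-language text.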

Comments:	Findings of the Association for Computational Linguistics: EMNLP 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2210.12623 [cs.CL]
	(or arXiv:2210.12623v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.12623
Journal reference:	Findings of the Association for Computational Linguistics EMNLP 2022, 6403-6416

Submission history

From: Iker García-Ferrero
[v1] Sun, 23 Oct 2022 05:37:35 UTC (3,834 KB)
[v2] Thu, 27 Apr 2023 10:39:45 UTC (3,834 KB)
