mEdIT: Multilingual Text Editing via Instruction Tuning

Raheja, Vipul; Alikaniotis, Dimitris; Kulkarni, Vivek; Alhafni, Bashar; Kumar, Dhruv

arXiv:2402.16472 (cs)

[Submitted on 26 Feb 2024 (v1), last revised 17 Apr 2024 (this version, v2)]

Abstract: We introduce mEdIT, a multilingual extension to CoEdIT, the recent state-of-the-art text editing models for writing assistance. mEdIT models are trained by fine-tuning large multilingual pre-trained language models (LLMs) via instruction tuning. They are designed to take natural language instructions from the user that specify the attributes of the desired text, such as "Grammatik korrigieren" (German, "Correct the grammar") or "Parafrasee la oración" (Spanish, "Paraphrase the sentence"). We build mEdIT by curating data from multiple publicly available human-annotated text editing datasets for three text editing tasks (Grammatical Error Correction (GEC), Text Simplification, and Paraphrasing) across diverse languages belonging to six different language families. We detail the design and training of mEdIT models and demonstrate their strong performance on many multilingual text editing benchmarks against other multilingual LLMs. We also find that mEdIT generalizes to new languages more effectively than multilingual baselines. We publicly release our data, code, and trained models at this https URL.

Comments:	Accepted to NAACL 2024 (Main). 23 pages, 8 tables, 11 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
ACM classes:	I.2.7
Cite as:	arXiv:2402.16472 [cs.CL]
	(or arXiv:2402.16472v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.16472

Submission history

From: Vipul Raheja
[v1] Mon, 26 Feb 2024 10:33:36 UTC (7,817 KB)
[v2] Wed, 17 Apr 2024 16:59:30 UTC (7,764 KB)
