Few-Shot Multilingual Open-Domain QA from 5 Examples

Jiang, Fan; Drummond, Tom; Cohn, Trevor

Computer Science > Computation and Language

arXiv:2502.19722 (cs)

[Submitted on 27 Feb 2025]

Abstract: Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods for underrepresented languages. We introduce a few-shot learning approach to synthesise large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, FsModQA, significantly outperforms existing few-shot and supervised baselines in MLODQA and cross-lingual and monolingual retrieval. We further show our method can be extended for effective zero-shot adaptation to new languages through a cross-lingual prompting strategy with only English-supervised data, making it a general and applicable solution for MLODQA tasks without costly large-scale annotation.
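
The abstract describes generating synthetic training data by prompting an LLM with a handful of annotated examples. The sketch below illustrates only the general idea of assembling such a few-shot prompt; it is not the authors' implementation. The `Example` schema, the `build_prompt` helper, and the instruction wording are all hypothetical, and no LLM is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One annotated demonstration (hypothetical schema, not from the paper)."""
    passage: str
    question: str
    answer: str


def build_prompt(examples: List[Example], passage: str, language: str) -> str:
    """Format the demonstrations, then an open slot for a new passage."""
    # Assumed instruction text; the paper's actual prompt may differ.
    header = (
        f"Write a question in {language} that the passage answers, "
        "then give the answer."
    )
    blocks = [
        f"Passage: {ex.passage}\nQuestion: {ex.question}\nAnswer: {ex.answer}"
        for ex in examples
    ]
    # The final block leaves the question open for the LLM to complete.
    blocks.append(f"Passage: {passage}\nQuestion:")
    return header + "\n\n" + "\n\n".join(blocks)
```

With five demonstrations, the resulting string could be sent to any text-completion model, and the completion parsed back into a question-answer pair for the target passage.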

Comments:	Accepted by TACL; pre-MIT Press publication version
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2502.19722 [cs.CL]
	(or arXiv:2502.19722v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.19722

Submission history

From: Fan Jiang
[v1] Thu, 27 Feb 2025 03:24:57 UTC (1,501 KB)