3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

Ma, Xinyu; Liu, Xuebo; Wong, Derek F.; Rao, Jun; Li, Bei; Ding, Liang; Chao, Lidia S.; Tao, Dacheng; Zhang, Min

arXiv:2404.18413 (cs)

[Submitted on 29 Apr 2024]

Abstract: Multimodal machine translation (MMT) is a challenging task that seeks to improve translation quality by incorporating visual information. However, recent studies have indicated that the visual information provided by existing MMT datasets is insufficient, causing models to disregard it and overestimate their capabilities. This issue presents a significant obstacle to the development of MMT research. This paper presents a novel solution to this issue by introducing 3AM, an ambiguity-aware MMT dataset comprising 26,000 parallel sentence pairs in English and Chinese, each with corresponding images. Our dataset is specifically designed to include more ambiguity and a greater variety of both captions and images than other MMT datasets. We utilize a word sense disambiguation model to select ambiguous data from vision-and-language datasets, resulting in a more challenging dataset. We further benchmark several state-of-the-art MMT models on our proposed dataset. Experimental results show that MMT models trained on our dataset exhibit a greater ability to exploit visual information than those trained on other MMT datasets. Our work provides a valuable resource for researchers in the field of multimodal learning and encourages further exploration in this area. The data, code and scripts are freely available at this https URL.
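
The abstract mentions selecting ambiguous data with a word sense disambiguation model but gives no details. The following is only a minimal sketch of that kind of filtering step, not the authors' pipeline: the toy sense inventory `TOY_SENSE_COUNTS`, the `Example` record, and the functions `ambiguous_words` and `select_ambiguous` are all hypothetical names introduced for illustration, and a simple dictionary lookup stands in for a real disambiguation model.

```python
from dataclasses import dataclass

# Hypothetical toy sense inventory: word -> number of known senses.
# Assumption: a real pipeline would query a word sense disambiguation
# model here instead of a hand-written dictionary.
TOY_SENSE_COUNTS = {
    "bank": 2,
    "bat": 2,
    "crane": 2,
    "dog": 1,
    "river": 1,
}


@dataclass
class Example:
    """A caption paired with an image identifier (hypothetical record type)."""
    caption: str
    image_id: str


def ambiguous_words(caption, sense_counts=TOY_SENSE_COUNTS):
    """Return the caption's words that have more than one listed sense."""
    tokens = [t.strip(".,!?").lower() for t in caption.split()]
    # Words missing from the inventory are treated as unambiguous.
    return [t for t in tokens if sense_counts.get(t, 1) > 1]


def select_ambiguous(examples, min_ambiguous=1):
    """Keep examples whose caption has at least `min_ambiguous` ambiguous words."""
    return [
        ex for ex in examples
        if len(ambiguous_words(ex.caption)) >= min_ambiguous
    ]


if __name__ == "__main__":
    data = [
        Example("A bat near the river bank.", "img1"),
        Example("A dog by the river.", "img2"),
    ]
    for ex in select_ambiguous(data):
        print(ex.image_id, ambiguous_words(ex.caption))
```

In this sketch the threshold `min_ambiguous` controls how strict the filter is; raising it keeps only captions with several ambiguous words, at the cost of a smaller selection.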

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.18413 [cs.CV]
	(or arXiv:2404.18413v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.18413

Submission history

From: Xinyu Ma
[v1] Mon, 29 Apr 2024 04:01:30 UTC (9,008 KB)
