Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment

Dinh, Tuan; Sohn, Jy-yong; Rajput, Shashank; Ossowski, Timothy; Ming, Yifei; Hu, Junjie; Papailiopoulos, Dimitris; Lee, Kangwook


arXiv:2205.11616 (cs)

[Submitted on 23 May 2022 (v1), last revised 7 Nov 2022 (this version, v2)]


Abstract: Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations but also pretrained language-image models to enable more efficient and robust UWT. Specifically, we develop a novel UWT method dubbed Word Alignment using Language-Image Pretraining (WALIP), which leverages visual observations via the shared embedding space of images and texts provided by CLIP models (Radford et al., 2021). WALIP has a two-step procedure. First, we retrieve word pairs whose similarity, computed using our proposed image-based fingerprints, has high confidence; these pairs define the initial pivot for the word alignment. Second, we apply our robust Procrustes algorithm to estimate the linear mapping between the two embedding spaces, which iteratively corrects and refines the estimated alignment. Our extensive experiments show that WALIP improves upon the state-of-the-art performance of bilingual word alignment for a few language pairs across different word embeddings and displays great robustness to the dissimilarity of language pairs or of the training corpora for the two word embeddings.
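The two-step procedure in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how fingerprints are computed or how the Procrustes step is made robust, so everything here is an assumption. The function names (`fingerprints`, `confident_pairs`, `procrustes`, `robust_procrustes`), the mutual-nearest-neighbour rule, the `threshold` parameter, and the residual-based reweighting are all hypothetical choices, and plain arrays stand in for CLIP image and text embeddings.

```python
import numpy as np


def _unit_rows(a):
    """Scale every row to unit Euclidean length."""
    return a / np.linalg.norm(a, axis=1, keepdims=True)


def fingerprints(word_emb, image_emb):
    """Assumed fingerprint: cosine similarity of each word to each image.

    word_emb: (n_words, d), image_emb: (n_images, d), same embedding space.
    Returns an (n_words, n_images) matrix, one fingerprint per row.
    """
    return _unit_rows(word_emb) @ _unit_rows(image_emb).T


def confident_pairs(fp_src, fp_tgt, threshold):
    """Assumed pivot rule: mutual nearest neighbours above a cosine threshold."""
    sim = _unit_rows(fp_src) @ _unit_rows(fp_tgt).T
    fwd = sim.argmax(axis=1)  # best target word for each source word
    bwd = sim.argmax(axis=0)  # best source word for each target word
    return [(i, int(j)) for i, j in enumerate(fwd)
            if bwd[j] == i and sim[i, j] >= threshold]


def procrustes(x, y, weights=None):
    """Orthogonal W minimising sum_i weights[i] * ||x[i] @ W - y[i]||^2."""
    if weights is None:
        weights = np.ones(len(x))
    u, _, vt = np.linalg.svd(x.T @ (weights[:, None] * y))
    return u @ vt


def robust_procrustes(x, y, n_iter=10, eps=1e-3):
    """Assumed robust variant: refit while down-weighting large residuals."""
    weights = np.ones(len(x))
    for _ in range(n_iter):
        w_map = procrustes(x, y, weights)
        residual = np.linalg.norm(x @ w_map - y, axis=1)
        weights = 1.0 / (residual + eps)  # badly fitting pairs count less
    return w_map
```

In this sketch, `x` and `y` would hold the source and target word embeddings of the pairs returned by `confident_pairs`, and the resulting matrix maps any source embedding `v` into the target space as `v @ w_map`. Reweighting by inverse residual is one common way to limit the influence of wrongly matched pairs; the paper's own robust Procrustes algorithm may differ.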

Comments:	In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings)
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2205.11616 [cs.CL]
	(or arXiv:2205.11616v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2205.11616

Submission history

From: Tuan Dinh
[v1] Mon, 23 May 2022 20:29:26 UTC (2,964 KB)
[v2] Mon, 7 Nov 2022 18:20:40 UTC (2,000 KB)
