SignCLIP: Connecting Text and Sign Language by Contrastive Learning

Jiang, Zifan; Sant, Gerard; Moryossef, Amit; Müller, Mathias; Sennrich, Rico; Ebling, Sarah

Computer Science > Computation and Language

arXiv:2407.01264 (cs)

[Submitted on 1 Jul 2024 (v1), last revised 6 Oct 2024 (this version, v2)]

Abstract: We present SignCLIP, which repurposes CLIP (Contrastive Language-Image Pretraining) to project spoken language text and sign language videos, two classes of natural languages of distinct modalities, into the same space. SignCLIP is an efficient method of learning useful visual representations for sign language processing from large-scale, multilingual video-text pairs, without directly optimizing for a specific task or sign language, for which data is often limited.
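CLIP-style training pulls each matched video-text pair together in the shared space and pushes apart mismatched pairs within the same batch. The sketch below shows the standard symmetric contrastive (InfoNCE) loss on precomputed embeddings. It is an illustration under stated assumptions, not SignCLIP's implementation: the video and text encoders are omitted, the function names are hypothetical, and the temperature of 0.07 (CLIP's initial value) is assumed rather than taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: Vector, target: int) -> float:
    """Softmax cross-entropy for one row, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def clip_style_loss(video_emb: List[Vector], text_emb: List[Vector],
                    temperature: float = 0.07) -> float:
    """Hypothetical helper: symmetric InfoNCE over a batch of paired embeddings.

    Video i and text i form the positive pair; all other batch items act as negatives.
    Assumption: temperature 0.07 (CLIP's initial value); SignCLIP's setting is not given here.
    """
    videos = [l2_normalize(v) for v in video_emb]
    texts = [l2_normalize(t) for t in text_emb]
    n = len(videos)
    # logits[i][j] = cosine similarity of video i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(v, t)) / temperature for t in texts]
              for v in videos]
    # Video-to-text direction: each row should pick out its own text.
    video_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-video direction: each column should pick out its own video.
    text_to_video = sum(cross_entropy([logits[j][i] for j in range(n)], i)
                        for i in range(n)) / n
    return (video_to_text + text_to_video) / 2
```

With toy 2-d embeddings, a correctly paired batch such as `clip_style_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])` yields a loss near zero, while swapping the two texts makes it large. Averaging both directions is what lets the same model serve text-to-video and video-to-text retrieval.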
We pretrain SignCLIP on Spreadthesign, a prominent sign language dictionary consisting of approximately 500,000 video clips in up to 44 sign languages, and evaluate it on various downstream datasets. SignCLIP discerns in-domain signing with notable text-to-video/video-to-text retrieval accuracy. It also performs competitively on out-of-domain downstream tasks such as isolated sign language recognition, given the necessary few-shot prompting or fine-tuning.
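Retrieval accuracy in a shared embedding space is commonly reported as recall at k: the fraction of queries whose true match appears among the k highest-scoring candidates. The sketch below assumes a paired test set where query i's correct candidate is candidate i, and a precomputed similarity matrix; the function name is hypothetical and the exact metrics used in the paper are not stated here.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Hypothetical helper: fraction of queries whose ground-truth match ranks in the top k.

    similarity[i][j] scores query i against candidate j; the correct candidate
    for query i is assumed to be candidate i (paired test set).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank = 1 + number of candidates scoring strictly higher than the true match,
        # so ties are resolved in favor of the true match.
        rank = 1 + sum(1 for score in row if score > row[i])
        if rank <= k:
            hits += 1
    return hits / len(similarity)


def transpose(matrix: Sequence[Sequence[float]]) -> list:
    """Swap the query and candidate roles, e.g. text-to-video into video-to-text."""
    return [list(col) for col in zip(*matrix)]
```

With rows as texts and columns as videos, `recall_at_k(sim, 1)` gives text-to-video R@1, and `recall_at_k(transpose(sim), 1)` gives video-to-text R@1 on the same matrix.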
We analyze the latent space formed by the spoken language text and sign language poses, which provides additional linguistic insights. Our code and models are openly available.

Comments:	Accepted at EMNLP 2024 (Main)
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.01264 [cs.CL]
	(or arXiv:2407.01264v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.01264

Submission history

From: Zifan Jiang
[v1] Mon, 1 Jul 2024 13:17:35 UTC (3,470 KB)
[v2] Sun, 6 Oct 2024 09:41:37 UTC (3,474 KB)
