Improving Gloss-free Sign Language Translation by Reducing Representation Density

Ye, Jinhui; Wang, Xing; Jiao, Wenxiang; Liang, Junwei; Xiong, Hui

arXiv:2405.14312 (cs)

[Submitted on 23 May 2024]

Abstract: Gloss-free sign language translation (SLT) aims to build well-performing SLT systems without requiring costly gloss annotations, but it still lags significantly behind gloss-based approaches. In this paper, we identify a representation density problem that could be a bottleneck restricting the performance of gloss-free SLT. Specifically, the visual representations of semantically distinct sign gestures tend to be closely packed together in feature space, so gloss-free methods struggle to distinguish different sign gestures and suffer a sharp performance drop. To address the representation density problem, we introduce a simple but effective contrastive learning strategy, SignCL, which encourages gloss-free models to learn more discriminative feature representations in a self-supervised manner. Our experiments demonstrate that the proposed SignCL significantly reduces representation density and improves performance across various translation frameworks. In particular, SignCL improves the BLEU score of the Sign Language Transformer and GFSLT-VLP on the CSL-Daily dataset by 39% and 46%, respectively, without increasing the number of model parameters. Compared to Sign2GPT, a state-of-the-art method based on large-scale pre-trained vision and language models, SignCL achieves better performance with only 35% of its parameters. The implementation and checkpoints are available at this https URL.
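The abstract does not spell out the loss used by SignCL, so the following is only a minimal sketch of a generic margin-based contrastive objective on frame features, not the authors' implementation. It assumes, purely for illustration, that a frame close in time to the anchor serves as the positive and a frame far away in time serves as the negative; the function names and the `near`, `far`, and `margin` parameters are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(anchor, positive, negative, margin=1.0):
    """Generic margin-based contrastive loss for one triple (an assumption,
    not the loss confirmed by the abstract).

    The first term pulls the positive toward the anchor; the second term
    penalises the negative only while it is closer than `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return d_pos ** 2 + max(0.0, margin - d_neg) ** 2


def frame_contrastive_loss(features, near=1, far=4, margin=1.0):
    """Average the loss over every anchor frame t that has both a frame at
    t + near (hypothetical positive) and a frame at t + far (hypothetical
    negative). Returns 0.0 if the sequence is too short."""
    losses = []
    for t in range(len(features) - far):
        losses.append(
            contrastive_loss(features[t], features[t + near],
                             features[t + far], margin)
        )
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # A "dense" sequence: every frame maps to the same point in feature space.
    dense = [[0.0, 0.0] for _ in range(6)]
    print(frame_contrastive_loss(dense))
```

Under these assumptions, a sequence whose frames all collapse to the same point is penalised by the full margin term, while a negative that already lies beyond the margin contributes nothing. This is one way a contrastive objective can discourage closely packed representations without adding model parameters.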

Comments:	Representation Density and Performance Drop
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
Cite as:	arXiv:2405.14312 [cs.CV]
	(or arXiv:2405.14312v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.14312

Submission history

From: Jinhui Ye
[v1] Thu, 23 May 2024 08:32:58 UTC (1,303 KB)
