Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning

Nacar, Omer; Koubaa, Anis

arXiv:2407.21139 (cs)

[Submitted on 30 Jul 2024 (v1), last revised 1 Aug 2024 (this version, v2)]

Abstract: This work presents a novel framework for training Arabic nested embedding models through Matryoshka Embedding Learning, leveraging multilingual, Arabic-specific, and English-based models to highlight the power of nested embedding models in various Arabic NLP downstream tasks. Our contribution includes the translation of various sentence similarity datasets into Arabic, enabling a comprehensive evaluation framework to compare these models across different embedding dimensions. We trained several nested embedding models on the Arabic Natural Language Inference triplet dataset and assessed their performance using multiple evaluation metrics, including Pearson and Spearman correlations for cosine similarity, Manhattan distance, Euclidean distance, and dot product similarity. The results demonstrate that Arabic Matryoshka embedding models are better at capturing semantic nuances unique to the Arabic language, significantly outperforming traditional models by up to 20-25% across various similarity metrics. These results underscore the effectiveness of language-specific training and highlight the potential of Matryoshka models in enhancing semantic textual similarity tasks for Arabic NLP.
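
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes that a nested (Matryoshka) embedding is used at a smaller size by keeping its leading components, that distances are negated so a higher score always means "more similar", and that the function names (`truncate`, `evaluate`) and the toy vectors and gold scores are purely hypothetical placeholders. Only the Python standard library is used.

```python
import math

def truncate(vec, dim):
    """Keep only the first `dim` components (assumed nested-embedding usage)."""
    return vec[:dim]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Similarity scorers named in the abstract. Distances are negated so that
# larger values mean "more similar" for every scorer (an assumption here).
SCORERS = {
    "cosine": lambda a, b: dot(a, b) / math.sqrt(dot(a, a) * dot(b, b)),
    "manhattan": lambda a, b: -sum(abs(x - y) for x, y in zip(a, b)),
    "euclidean": lambda a, b: -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "dot": dot,
}

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def evaluate(pairs, gold, dim):
    """Return {scorer: (pearson, spearman)} at one truncation size."""
    out = {}
    for name, score in SCORERS.items():
        preds = [score(truncate(a, dim), truncate(b, dim)) for a, b in pairs]
        out[name] = (pearson(preds, gold), spearman(preds, gold))
    return out

# Toy data only: made-up 4-dimensional "sentence embeddings" and gold scores.
PAIRS = [
    ([0.9, 0.1, 0.3, 0.2], [0.8, 0.2, 0.1, 0.4]),
    ([0.1, 0.9, 0.5, 0.1], [0.7, 0.3, 0.2, 0.6]),
    ([0.2, 0.8, 0.1, 0.9], [0.9, -0.4, 0.6, 0.1]),
]
GOLD = [4.5, 2.0, 0.5]

if __name__ == "__main__":
    for dim in (4, 2):
        print(dim, evaluate(PAIRS, GOLD, dim))
```

Running `evaluate` at several values of `dim` gives one row of correlations per embedding size, which is the kind of cross-dimension comparison the abstract refers to. A real evaluation would replace the toy vectors with model outputs on the translated similarity datasets.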

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.21139 [cs.CL]
	(or arXiv:2407.21139v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.21139

Submission history

From: Omer Nacar
[v1] Tue, 30 Jul 2024 19:03:03 UTC (852 KB)
[v2] Thu, 1 Aug 2024 12:24:01 UTC (852 KB)