CMLM-CSE: Based on Conditional MLM Contrastive Learning for Sentence Embeddings

Zhang, Wei; Chen, Xu

arXiv:2306.09594 (cs)

[Submitted on 16 Jun 2023]

Abstract: Traditional contrastive learning for sentence embeddings uses the encoder directly to extract sentence features and then passes them to a contrastive loss function for learning. However, this approach pays too much attention to the sentence as a whole and ignores the influence of individual words on the sentence semantics. To address this, we propose CMLM-CSE, an unsupervised contrastive learning framework based on conditional masked language modeling (MLM). On top of traditional contrastive learning, an auxiliary network integrates the sentence embedding to perform the MLM task, forcing the sentence embedding to learn more information about the masked words. With BERT-base as the pretrained language model, we exceed SimCSE by 0.55 percentage points on average on textual similarity tasks; with RoBERTa-base, we exceed SimCSE by 0.3 percentage points on average.
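
The training objective described in the abstract can be sketched as a contrastive loss plus an auxiliary MLM loss. The following is a minimal, framework-free illustration and not the authors' implementation: it assumes an InfoNCE-style contrastive term with in-batch negatives (as in SimCSE, with its commonly used temperature of 0.05), and it assumes the auxiliary network has already produced logits for the masked positions, conditioned on the sentence embedding. The function names and the `mlm_weight` parameter are hypothetical; the abstract does not say how the two terms are weighted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(x)))."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))


def info_nce(views_a, views_b, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    views_a[i] and views_b[i] are two embeddings of sentence i (the positive
    pair); every other row of views_b acts as an in-batch negative.
    """
    total = 0.0
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        total += log_sum_exp(logits) - logits[i]
    return total / len(views_a)


def masked_token_nll(token_logits, target_ids):
    """Mean cross-entropy over masked positions.

    token_logits[k] is the vocabulary-sized logit vector for masked position k,
    assumed here to come from an auxiliary network that sees the sentence
    embedding; target_ids[k] is the id of the original token.
    """
    total = 0.0
    for logits, target in zip(token_logits, target_ids):
        total += log_sum_exp(logits) - logits[target]
    return total / len(target_ids)


def combined_loss(views_a, views_b, token_logits, target_ids,
                  mlm_weight=0.1, temperature=0.05):
    """Contrastive term plus a weighted MLM term (the weight is an assumption)."""
    contrastive = info_nce(views_a, views_b, temperature)
    mlm = masked_token_nll(token_logits, target_ids)
    return contrastive + mlm_weight * mlm
```

Because the MLM logits depend on the sentence embedding, gradients from the second term flow back into the embedding, which is the mechanism the abstract credits for capturing more masked-word information. In an actual training setup both terms would be computed on tensors in an autodiff framework rather than on Python lists.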

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.09594 [cs.CL]
	(or arXiv:2306.09594v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.09594

Submission history

From: Wei Zhang
[v1] Fri, 16 Jun 2023 02:39:45 UTC (602 KB)
