Dewey Long Context Embedding Model: A Technical Report

Zhang, Dun; Zou, Panxiang; Zhou, Yudong

arXiv:2503.20376 (cs)

[Submitted on 26 Mar 2025]

Abstract: This technical report presents the training methodology and evaluation results of the open-source dewey_en_beta embedding model. The growing demand for retrieval-augmented generation (RAG) systems and the expanding context windows of large language models (LLMs) pose critical challenges for conventional embedding models. Current approaches often struggle to maintain semantic coherence when processing documents that exceed typical sequence length limits, which significantly degrades retrieval performance in knowledge-intensive applications. dewey_en_beta is a text embedding model that achieves excellent performance on the MTEB (Eng, v2) and LongEmbed benchmarks while supporting 128K-token sequences. Our technical contribution centers on chunk alignment training, a methodology that uses distillation to generate localized chunk embeddings and global document-level representations simultaneously. Information regarding the model release can be found at this https URL.
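
To make the idea of producing chunk-level and document-level embeddings through distillation more concrete, here is a minimal sketch. It is not the authors' training code. The abstract does not specify a loss, so the cosine-distance objective, the equal weighting of the two terms, and the name chunk_alignment_loss are all assumptions made for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def chunk_alignment_loss(
    student_chunks: Sequence[Vector],
    teacher_chunks: Sequence[Vector],
    student_global: Vector,
    teacher_global: Vector,
) -> float:
    """Hypothetical distillation objective (an assumption, not from the report).

    Averages the cosine distance between each student chunk embedding and
    the matching teacher chunk embedding, then adds the cosine distance
    between the student and teacher document-level embeddings.
    """
    if len(student_chunks) != len(teacher_chunks):
        raise ValueError("student and teacher must have the same number of chunks")
    if not student_chunks:
        raise ValueError("at least one chunk is required")

    # Local term: align every chunk embedding with its teacher counterpart.
    chunk_term = sum(
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_chunks, teacher_chunks)
    ) / len(student_chunks)

    # Global term: align the whole-document embedding with the teacher's.
    global_term = 1.0 - cosine_similarity(student_global, teacher_global)

    return chunk_term + global_term
```

Under these assumptions, the loss is zero when every student embedding points in the same direction as its teacher embedding, and it grows as either the chunk embeddings or the document embedding drift away from the teacher.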

Comments:	5 pages, 1 figure
Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2503.20376 [cs.IR]
	(or arXiv:2503.20376v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2503.20376

Submission history

From: Dun Zhang
[v1] Wed, 26 Mar 2025 09:55:00 UTC (270 KB)
