Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Wen, Tiansheng; Wang, Yifei; Zeng, Zequn; Peng, Zhong; Su, Yudi; Liu, Xinyang; Chen, Bo; Liu, Hongwei; Jegelka, Stefanie; You, Chenyu

arXiv:2503.01776 (cs)

[Submitted on 3 Mar 2025 (v1), last revised 5 Mar 2025 (this version, v2)]

Abstract: Many large-scale systems rely on high-quality deep representations (embeddings) to facilitate tasks like retrieval, search, and generative modeling. Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths, but it requires full model retraining and suffers from noticeable performance degradation at short lengths. In this paper, we show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. We propose Contrastive Sparse Representation (CSR), a method that sparsifies pre-trained embeddings into a high-dimensional but selectively activated feature space. By leveraging lightweight autoencoding and task-aware contrastive objectives, CSR preserves semantic quality while allowing flexible, cost-effective inference at different sparsity levels. Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in terms of both accuracy and retrieval speed, often by large margins, while also cutting training time to a fraction of that required by MRL. Our results establish sparse coding as a powerful paradigm for adaptive representation learning in real-world applications where efficiency and fidelity are both paramount. Code is available at this https URL
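
To make the idea of "a high-dimensional but selectively activated feature space" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the architecture, so the top-k activation rule, the function names (`encode_topk`, `decode`, `sparse_dot`), the dimensions, and the random untrained weights are all assumptions chosen for illustration. Training with the autoencoding and contrastive objectives is omitted entirely.

```python
import random

def encode_topk(x, W_enc, b_enc, k):
    """Map a dense embedding x (length d) to a sparse code with at most k active units.

    Assumption: a linear layer + ReLU followed by top-k selection. The abstract
    only says the embedding is sparsified; this particular rule is hypothetical.
    """
    pre = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
           for row, b in zip(W_enc, b_enc)]
    # Indices of the k largest activations; zero activations are dropped.
    top = sorted(range(len(pre)), key=lambda j: pre[j], reverse=True)[:k]
    return {j: pre[j] for j in top if pre[j] > 0.0}

def decode(code, W_dec, d):
    """Reconstruct a dense vector from a sparse code (the autoencoder's decoder side)."""
    out = [0.0] * d
    for j, a in code.items():
        for i, w in enumerate(W_dec[j]):
            out[i] += a * w
    return out

def sparse_dot(a, b):
    """Inner product of two sparse codes; cost scales with the number of active units."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[j] for j, v in a.items() if j in b)

# Illustrative setup: d-dimensional input embedding, h-dimensional sparse space (h > d).
rng = random.Random(0)
d, h = 8, 64
W_enc = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]
b_enc = [0.0] * h
W_dec = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]
x = [rng.gauss(0, 1) for _ in range(d)]

# The same encoder yields codes at different sparsity levels by changing k.
code_4 = encode_topk(x, W_enc, b_enc, k=4)
code_16 = encode_topk(x, W_enc, b_enc, k=16)
x_hat = decode(code_16, W_dec, d)
```

In this sketch the sparsity level `k` is chosen at inference time, which is one way to read "flexible, cost-effective inference at different sparsity levels": a smaller `k` stores and compares fewer active units, at the cost of a coarser code.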

Comments: A novel sparse coding framework designed for learning adaptive representation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Cite as: arXiv:2503.01776 [cs.LG] (or arXiv:2503.01776v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2503.01776

Submission history

From: Tiansheng Wen
[v1] Mon, 3 Mar 2025 17:59:48 UTC (861 KB)
[v2] Wed, 5 Mar 2025 17:51:09 UTC (896 KB)
