From Markov to Laplace: How Mamba In-Context Learns Markov Chains

Bondaschi, Marco; Rajaraman, Nived; Wei, Xiuying; Ramchandran, Kannan; Pascanu, Razvan; Gulcehre, Caglar; Gastpar, Michael; Makkuva, Ashok Vardhan


arXiv:2502.10178 (cs)

[Submitted on 14 Feb 2025]

Abstract: While transformer-based language models have driven the AI revolution thus far, their computational complexity has spurred growing interest in viable alternatives, such as structured state space sequence models (SSMs) and Selective SSMs. Among these, Mamba (S6) and its variant Mamba-2 have shown remarkable inference speedups over transformers while achieving comparable or superior performance on complex language modeling tasks. However, despite these architectural innovations and empirical successes, the fundamental learning capabilities of Mamba remain poorly understood. In this paper, we address this gap by studying in-context learning (ICL) on Markov chains and uncovering a surprising phenomenon: unlike transformers, even a single-layer Mamba efficiently learns the in-context Laplacian smoothing estimator, which is both Bayes and minimax optimal, for all Markovian orders. To explain this, we theoretically characterize the representation capacity of Mamba and reveal the fundamental role of convolution in enabling it to represent the optimal Laplacian smoothing. These theoretical insights align strongly with empirical results and, to the best of our knowledge, represent the first formal connection between Mamba and optimal statistical estimators. Finally, we outline promising research directions inspired by these findings.
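The central object in the abstract is the in-context Laplacian smoothing estimator. The following is a minimal Python sketch of that estimator, not of Mamba itself. It assumes a common ICL setup in which every sequence is drawn from a freshly sampled order-k Markov chain whose transition rows come from a symmetric Dirichlet(β) prior. Under that prior, the posterior-mean (Bayes) next-token prediction for context s is the add-β rule (n(s,a) + β) / (n(s) + Vβ), where n(s,a) counts how often token a followed s in the current sequence and V is the vocabulary size. At β = 1 this is classic Laplace (add-one) smoothing. All function names and parameters here are hypothetical illustrations and do not come from the paper's code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Kernel = Dict[Tuple[int, ...], List[float]]


def sample_markov_chain(vocab: int, order: int, beta: float, rng: random.Random) -> Kernel:
    """Hypothetical helper: draw one transition row per length-`order` context
    from a symmetric Dirichlet(beta) prior (assumed setup, not the paper's code)."""
    contexts: List[Tuple[int, ...]] = [()]
    for _ in range(order):
        contexts = [c + (a,) for c in contexts for a in range(vocab)]
    kernel: Kernel = {}
    for ctx in contexts:
        # A Dirichlet sample is a vector of independent Gamma(beta, 1) draws, normalized.
        g = [rng.gammavariate(beta, 1.0) for _ in range(vocab)]
        total = sum(g)
        kernel[ctx] = [x / total for x in g]
    return kernel


def generate_sequence(kernel: Kernel, vocab: int, order: int, length: int,
                      rng: random.Random) -> List[int]:
    """Start from `order` uniform tokens, then follow the sampled transition kernel."""
    seq = [rng.randrange(vocab) for _ in range(order)]
    while len(seq) < length:
        ctx = tuple(seq[-order:]) if order > 0 else ()
        seq.append(rng.choices(range(vocab), weights=kernel[ctx])[0])
    return seq


def laplace_estimate(seq: List[int], vocab: int, order: int, beta: float = 1.0) -> List[float]:
    """Add-beta estimate of P(next token | last `order` tokens), using only counts
    gathered from `seq` itself, which is what makes the estimator in-context."""
    counts: Dict[Tuple[int, ...], List[int]] = defaultdict(lambda: [0] * vocab)
    for t in range(order, len(seq)):
        counts[tuple(seq[t - order:t])][seq[t]] += 1
    row = counts[tuple(seq[len(seq) - order:])]
    n = sum(row)
    return [(row[a] + beta) / (n + vocab * beta) for a in range(vocab)]


if __name__ == "__main__":
    rng = random.Random(0)
    kernel = sample_markov_chain(vocab=2, order=2, beta=1.0, rng=rng)
    seq = generate_sequence(kernel, vocab=2, order=2, length=500, rng=rng)
    last_ctx = tuple(seq[-2:])
    print("true next-token probs:", kernel[last_ctx])
    print("add-1 estimate:       ", laplace_estimate(seq, vocab=2, order=2, beta=1.0))
```

For example, on the sequence 0, 1, 0, 1, 0 with order 1 and β = 1, the context 0 was followed by 1 twice and by 0 never, so the estimate for the next token is (0 + 1)/(2 + 2) = 0.25 for token 0 and (2 + 1)/(2 + 2) = 0.75 for token 1. The counts are keyed by the preceding k tokens, so the same function covers every Markovian order.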

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Cite as: arXiv:2502.10178 [cs.LG] (or arXiv:2502.10178v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2502.10178

Submission history

From: Marco Bondaschi
[v1] Fri, 14 Feb 2025 14:13:55 UTC (133 KB)