MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation

Zatsarynna, Olga; Bahrami, Emad; Farha, Yazan Abu; Francesca, Gianpiero; Gall, Juergen

arXiv:2501.08837 (cs)

[Submitted on 15 Jan 2025]

Abstract: Our work addresses the problem of stochastic long-term dense anticipation. The goal of this task is to predict actions and their durations several minutes into the future based on provided video observations. Anticipation over extended horizons introduces high uncertainty, as a single observation can lead to multiple plausible future outcomes. To address this uncertainty, stochastic models are designed to predict several potential future action sequences. Recent work has further proposed incorporating uncertainty modelling for observed frames by simultaneously predicting per-frame past and future actions in a unified manner. While such joint modelling of actions is beneficial, it requires long-range temporal capabilities to connect events across distant past and future time points. However, previous work struggles to achieve such long-range understanding because its receptive field is limited, sparse, or both. To alleviate this issue, we propose a novel MANTA (MAmba for ANTicipation) network. Our model enables effective long-term temporal modelling even for very long sequences while maintaining linear complexity in sequence length. We demonstrate that our approach achieves state-of-the-art results on three datasets (Breakfast, 50Salads, and Assembly101) while also significantly improving computational and memory efficiency.
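
The abstract's claim of linear complexity in sequence length can be illustrated with a generic state-space recurrence, the family of models that Mamba belongs to. The sketch below is not the MANTA architecture and is not taken from the paper: the function name `diagonal_ssm_scan` and its parameters `decay`, `in_proj`, and `out_proj` are hypothetical, and the parameters are fixed constants, whereas Mamba makes them depend on the input. It only shows that a single pass with a fixed-size state lets every output depend on the whole past at a cost that grows linearly with the number of frames.

```python
from typing import List, Sequence


def diagonal_ssm_scan(
    inputs: Sequence[float],
    decay: Sequence[float],
    in_proj: Sequence[float],
    out_proj: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state-space recurrence over a 1-D sequence.

    Illustrative only (hypothetical names, fixed parameters):
        h_t = decay * h_{t-1} + in_proj * x_t   (elementwise over the state)
        y_t = sum(out_proj * h_t)
    """
    # The hidden state has a fixed size N = len(decay), independent of T.
    state = [0.0] * len(decay)
    outputs: List[float] = []
    # One pass over T inputs with O(N) work per step: O(T * N) overall.
    for x in inputs:
        state = [a * h + b * x for a, h, b in zip(decay, state, in_proj)]
        outputs.append(sum(c * h for c, h in zip(out_proj, state)))
    return outputs


# An impulse at the first step is still visible at later steps,
# attenuated by the decay factor at each step.
impulse_response = diagonal_ssm_scan(
    [1.0, 0.0, 0.0, 0.0], decay=[0.5], in_proj=[1.0], out_proj=[1.0]
)
print(impulse_response)  # [1.0, 0.5, 0.25, 0.125]
```

Doubling the number of input frames doubles the work of this loop, whereas full self-attention compares every pair of frames and therefore scales quadratically with sequence length.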

Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2501.08837 [cs.CV] (or arXiv:2501.08837v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2501.08837

Submission history

From: Olga Zatsarynna
[v1] Wed, 15 Jan 2025 14:46:44 UTC (3,475 KB)
