SRVP: Strong Recollection Video Prediction Model Using Attention-Based Spatiotemporal Correlation Fusion

Kim, Yuseon; Park, Kyongseok

arXiv:2504.08012 (cs)

[Submitted on 10 Apr 2025 (v1), last revised 16 Apr 2025 (this version, v2)]

Abstract: Video prediction (VP) generates future frames by leveraging spatial representations and temporal context from past frames. Traditional recurrent neural network (RNN)-based models enhance memory cell structures to capture spatiotemporal states over extended durations but suffer from gradual loss of object appearance details. To address this issue, we propose the strong recollection VP (SRVP) model, which integrates standard attention (SA) and reinforced feature attention (RFA) modules. Both modules employ scaled dot-product attention to extract temporal context and spatial correlations, which are then fused to enhance spatiotemporal representations. Experiments on three benchmark datasets demonstrate that SRVP mitigates image quality degradation in RNN-based models while achieving predictive performance comparable to RNN-free architectures.
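
For readers unfamiliar with the operation the abstract names, the following is a minimal sketch of generic scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. It is not the authors' SA or RFA module: the abstract does not describe how those modules build queries, keys, and values or how their outputs are fused, so the function names (`softmax`, `scaled_dot_product_attention`) and the toy inputs are assumptions made purely for illustration.

```python
import math


def softmax(row):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Generic scaled dot-product attention on lists of float vectors.

    queries: n_q vectors of length d_k
    keys:    n_k vectors of length d_k
    values:  n_k vectors of length d_v
    Returns n_q vectors of length d_v.
    (Illustrative helper, not taken from the SRVP paper.)
    """
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        outputs.append(out)
    return outputs


# Toy example (assumed data): one query attends over features of two past frames.
current = [[1.0, 0.0]]
past_keys = [[1.0, 0.0], [1.0, 0.0]]
past_values = [[0.0, 2.0], [4.0, 6.0]]
context = scaled_dot_product_attention(current, past_keys, past_values)
```

Because both keys in the toy example match the query equally, the attention weights are uniform and `context` is the average of the two value vectors. In a video setting the keys and values could come from past-frame features (temporal context) or from positions within a frame (spatial correlation); which of these SRVP uses in each module is specified in the paper itself, not here.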

Comments:	This paper has been accepted to CVPR 2025 Precognition Workshop
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.08012 [cs.CV]
	(or arXiv:2504.08012v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.08012

Submission history

From: Yuseon Kim
[v1] Thu, 10 Apr 2025 07:36:50 UTC (1,758 KB)
[v2] Wed, 16 Apr 2025 01:18:13 UTC (1,758 KB)
