STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video

Liu, Yang; Zhang, Zhiyong

arXiv:2407.10099 (cs)

[Submitted on 14 Jul 2024]

Abstract: Current methods for video-based 3D human pose estimation have made significant progress, but they still face the challenge of depth ambiguity. To address this limitation, this paper presents the spatio-temporal GraphFormer (STGFormer) framework for 3D human pose estimation in video, which integrates graph-based representations of body structure with spatio-temporal information. Specifically, we develop a spatio-temporal criss-cross graph (STG) attention mechanism, which is designed to learn long-range dependencies across both time and space and integrates graph information directly into the respective attention layers. Furthermore, we introduce the dual-path modulated hop-wise regular GCN (MHR-GCN) module, which uses modulation to optimize parameter usage and employs spatio-temporal hop-wise skip connections to acquire higher-order information. This module also processes the temporal and spatial dimensions independently, so that each learns its own features without mutual influence. Finally, we demonstrate that our method achieves state-of-the-art performance in 3D human pose estimation on the Human3.6M and MPI-INF-3DHP datasets.
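
The abstract does not specify how graph information enters the attention layers. As a rough illustration of the general idea only, the sketch below adds a constant bias to the attention logits of joint pairs that are connected in a skeleton graph. It is not the STG attention mechanism from the paper: the function name `graph_biased_attention`, the `bias` parameter, the additive-bias design, and the use of identity projections (queries, keys, and values all equal the input) are assumptions made for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def graph_biased_attention(x, adjacency, bias=1.0):
    """Single-head self-attention over joints with an additive skeleton bias.

    x: list of J feature vectors (one per joint), each of length D.
    adjacency: J x J 0/1 matrix for the skeleton graph (hypothetical input).
    bias: illustrative constant added to the logit of connected joint pairs.

    Assumption: queries, keys, and values are the raw inputs (no learned
    projections), so this shows only where the graph term enters.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        # Scaled dot-product logit plus a bias for graph-connected pairs.
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + bias * adjacency[i][j]
            for j, k in enumerate(x)
        ]
        weights = softmax(logits)
        # Each output is a weighted average of all joint features.
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


# Toy example: a 3-joint chain (0-1-2) with self-loops and 2-D features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
chain = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
attended = graph_biased_attention(features, chain)
```

Under these assumptions, a larger `bias` shifts attention weight toward skeleton neighbours, while `bias=0` reduces to plain dot-product attention. A temporal counterpart would apply the same routine to one joint's features across frames.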

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.10099 [cs.CV]
	(or arXiv:2407.10099v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.10099

Submission history

From: Yang Liu
[v1] Sun, 14 Jul 2024 06:45:27 UTC (2,627 KB)
