HalluciNet-ing Spatiotemporal Representations Using 2D-CNN

Parmar, Paritosh; Morris, Brendan

Computer Science > Computer Vision and Pattern Recognition

arXiv:1912.04430v2 (cs)

[Submitted on 10 Dec 2019 (v1), revised 25 Mar 2020 (this version, v2), latest version 21 Oct 2020 (v3)]

Abstract: Spatiotemporal representations learnt using 3D convolutional neural networks (CNNs) are currently the state-of-the-art approach for action-related tasks. However, 3D-CNNs are notoriously memory- and compute-intensive. 2D-CNNs, on the other hand, are much lighter on computing resources and faster, but their performance on action-related tasks is generally inferior to that of 3D-CNNs. Taking inspiration from the fact that we humans, through years of experience and a general understanding of "how the world works," can intuit how actors will act and how objects will be manipulated, we suggest a way to combine the best attributes of 2D- and 3D-CNNs: we propose to hallucinate the spatiotemporal representations computed by a 3D-CNN using a 2D-CNN. We believe that requiring the 2D-CNN to "see" into the future would encourage it to gain a deeper understanding of actions and of how scenes evolve, by providing a stronger supervisory signal. The hallucination task is treated as an auxiliary task, while the main task can be any other action-related task, such as action recognition. Thorough experimental evaluation shows that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN can enable deployment in resource-constrained scenarios, such as those with limited compute power and/or lower bandwidth. The codebase is available here: this https URL.
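The auxiliary-task setup described above can be illustrated with a minimal sketch of the training objective. It assumes that the main task is classification trained with softmax cross-entropy, that the hallucination loss is a mean-squared error between the 2D-CNN's hallucinated feature and the feature a frozen 3D-CNN computes for the same clip, and that the two are combined with a hypothetical weight `lam`. The function names and the loss choices are illustrative assumptions, not the authors' codebase or their exact formulation.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Main-task loss (assumed): softmax cross-entropy for one sample."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def hallucination_loss(hallucinated: Sequence[float], teacher: Sequence[float]) -> float:
    """Auxiliary loss (assumed MSE): distance between the 2D-CNN's
    hallucinated feature and the 3D-CNN feature for the same clip."""
    if len(hallucinated) != len(teacher):
        raise ValueError("feature dimensions must match")
    return sum((h - t) ** 2 for h, t in zip(hallucinated, teacher)) / len(teacher)


def total_loss(logits: Sequence[float], label: int,
               hallucinated: Sequence[float], teacher: Sequence[float],
               lam: float = 1.0) -> float:
    """Training objective: main loss + lam * hallucination loss.
    `lam` is a hypothetical weighting hyperparameter."""
    return cross_entropy(logits, label) + lam * hallucination_loss(hallucinated, teacher)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]       # 2D-CNN class scores for one input frame
    hallucinated = [0.1, 0.4, 0.3]  # 2D-CNN's guess at the 3D-CNN feature
    teacher = [0.0, 0.5, 0.5]       # 3D-CNN feature, needed only during training
    print(round(total_loss(logits, 0, hallucinated, teacher, lam=0.5), 4))
```

In this sketch the 3D-CNN appears only as a source of regression targets during training; at inference time only the 2D-CNN's forward pass (producing `logits` and, optionally, `hallucinated`) would be needed, which is the deployment advantage the abstract points to.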

Comments:	Codebase: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1912.04430 [cs.CV]
	(or arXiv:1912.04430v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1912.04430

Submission history

From: Paritosh Parmar
[v1] Tue, 10 Dec 2019 00:44:25 UTC (1,358 KB)
[v2] Wed, 25 Mar 2020 04:33:52 UTC (1,524 KB)
[v3] Wed, 21 Oct 2020 07:05:24 UTC (1,691 KB)
