Computer Science > Computer Vision and Pattern Recognition
[Submitted on 10 Dec 2019 (v1), revised 25 Mar 2020 (this version, v2), latest version 21 Oct 2020 (v3)]
Title: HalluciNet-ing Spatiotemporal Representations Using 2D-CNN
Abstract: Spatiotemporal representations learnt using 3D convolutional neural networks (CNNs) are currently the state of the art for action-related tasks. However, 3D-CNNs are notoriously memory- and compute-intensive. 2D-CNNs, on the other hand, are much lighter on computing resources and are faster, but their performance on action-related tasks is generally inferior to that of 3D-CNNs. Taking inspiration from the fact that we humans, through years of experience and a general understanding of "how the world works," can intuit how actors will act and how objects will be manipulated, we suggest a way to combine the best attributes of 2D- and 3D-CNNs: we propose to hallucinate spatiotemporal representations, as computed by a 3D-CNN, using a 2D-CNN. We believe that requiring the 2D-CNN to "see" into the future provides a stronger supervisory signal and would encourage it to gain a deeper understanding of actions and of how scenes evolve. The hallucination task is treated as an auxiliary task, while the main task is any other action-related task, such as action recognition. Thorough experimental evaluation shows that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN can enable deployment in resource-constrained scenarios, such as those with limited compute power and/or lower bandwidth. The codebase is available here: this https URL.
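The training setup described in the abstract, a main action-related task plus an auxiliary hallucination task, can be illustrated with a small sketch of a joint objective. The abstract does not state the loss functions or how they are combined, so everything here is an assumption made for illustration: cross-entropy for the main task, mean squared error between the 2D-CNN's hallucinated features and the 3D-CNN's features, a weighted sum of the two, and the hypothetical names `joint_loss`, `hallucinated`, `teacher_features`, and `weight`.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(predicted, target):
    """Mean squared error between two equal-length feature vectors."""
    if len(predicted) != len(target):
        raise ValueError("feature vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def joint_loss(logits, label, hallucinated, teacher_features, weight=1.0):
    """Main-task loss plus a weighted auxiliary hallucination loss.

    Assumed form (not taken from the paper):
        L = L_main + weight * L_hallucination
    `hallucinated` stands for features predicted by the 2D-CNN, and
    `teacher_features` for features computed by a 3D-CNN on the same clip.
    """
    main = softmax_cross_entropy(logits, label)
    auxiliary = mse(hallucinated, teacher_features)
    return main + weight * auxiliary


if __name__ == "__main__":
    # Toy values only; real features would come from the two networks.
    loss = joint_loss(
        logits=[2.0, 0.5, -1.0],
        label=0,
        hallucinated=[0.1, 0.4, 0.3],
        teacher_features=[0.0, 0.5, 0.3],
        weight=0.5,
    )
    print(f"joint loss: {loss:.4f}")
```

In a sketch like this, the 3D-CNN features are needed only to compute the auxiliary term during training; at inference time only the 2D-CNN would run, which is the deployment benefit the abstract points to.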
Submission history
From: Paritosh Parmar
[v1] Tue, 10 Dec 2019 00:44:25 UTC (1,358 KB)
[v2] Wed, 25 Mar 2020 04:33:52 UTC (1,524 KB)
[v3] Wed, 21 Oct 2020 07:05:24 UTC (1,691 KB)