Bottom-up and top-down reasoning with convolutional latent-variable models

Hu, Peiyun; Ramanan, Deva

arXiv:1507.05699v2 (cs)

[Submitted on 21 Jul 2015 (v1), revised 22 Jul 2015 (this version, v2), latest version 4 May 2016 (v5)]

Abstract: Convolutional neural nets (CNNs) have demonstrated remarkable performance in recent years. Such approaches tend to work in a "unidirectional" bottom-up feed-forward fashion. However, biological evidence suggests that feedback plays a crucial role, particularly for detailed spatial understanding tasks. This work introduces "bidirectional" architectures that also reason with top-down feedback: neural units are influenced by both lower and higher-level units. We do so by treating units as latent variables in a global energy function. We call our models convolutional latent-variable models (CLVMs). From a theoretical perspective, CLVMs unify several approaches for recognition, including CNNs, generative deep models (e.g., Boltzmann machines), and discriminative latent-variable models (e.g., DPMs).
From a practical perspective, CLVMs are particularly well-suited for multi-task learning. We describe a single architecture that simultaneously achieves state-of-the-art accuracy for tasks spanning both high-level recognition (part detection/localization) and low-level grouping (pixel segmentation). Bidirectional reasoning is particularly helpful for detailed low-level tasks, since they can take advantage of top-down feedback. Our architectures are quite efficient, capable of processing an image in milliseconds. We present results on benchmark datasets with both part/keypoint labels and segmentation masks (such as PASCAL and LFW) that demonstrate a significant improvement over prior art, in both speed and accuracy.
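The idea of treating units as latent variables in a global energy function, with each unit influenced by both lower and higher layers, can be made concrete with a toy example. The sketch below is a minimal illustration under stated assumptions, not the authors' CLVM. It assumes a two-layer model with dense weights (no convolutions), nonnegative latent units, and a rectified-Gaussian-style score S(x, h1, h2) = -1/2 |h1|^2 - 1/2 |h2|^2 + h1.(W1 x + b1) + h2.(W2 h1 + b2). Maximizing S over one layer with the other held fixed gives a ReLU update. For h1, that update sums a bottom-up term W1 x + b1 and a top-down term W2^T h2. The first pass runs before any top-down signal exists, so it is an ordinary feed-forward ReLU network. The helper names (`matvec`, `matTvec`, `infer`, and so on) are hypothetical. The weights are also assumed small enough (spectral norm of W2 below 1) that S is bounded above, which lets the ascent converge.

```python
# Minimal sketch (assumption): a two-layer rectified-Gaussian-style energy with
# dense weights instead of convolutions. Not the authors' CLVM implementation.

def matvec(W, v):
    """Return W @ v for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def matTvec(W, v):
    """Return W^T @ v for a list-of-rows matrix W."""
    cols = len(W[0])
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(cols)]

def relu(v):
    """Elementwise max(0, a): the exact maximizer of -a^2/2 + a*c over a >= 0."""
    return [max(0.0, a) for a in v]

def add(*vs):
    """Elementwise sum of equal-length vectors."""
    return [sum(t) for t in zip(*vs)]

def score(x, h1, h2, W1, b1, W2, b2):
    """S = -1/2|h1|^2 - 1/2|h2|^2 + h1.(W1 x + b1) + h2.(W2 h1 + b2)."""
    quad = -0.5 * sum(a * a for a in h1) - 0.5 * sum(a * a for a in h2)
    lin1 = sum(a * c for a, c in zip(h1, add(matvec(W1, x), b1)))
    lin2 = sum(a * c for a, c in zip(h2, add(matvec(W2, h1), b2)))
    return quad + lin1 + lin2

def infer(x, W1, b1, W2, b2, iters=5):
    """Coordinate ascent on S over the nonnegative latent layers h1, h2."""
    # Initial pass: plain bottom-up ReLU network (no top-down term yet).
    h1 = relu(add(matvec(W1, x), b1))
    h2 = relu(add(matvec(W2, h1), b2))
    scores = [score(x, h1, h2, W1, b1, W2, b2)]
    for _ in range(iters):
        # Exact maximizer over h1 >= 0 with h2 fixed: bottom-up plus top-down.
        h1 = relu(add(matvec(W1, x), b1, matTvec(W2, h2)))
        # Exact maximizer over h2 >= 0 with h1 fixed: top layer, bottom-up only.
        h2 = relu(add(matvec(W2, h1), b2))
        scores.append(score(x, h1, h2, W1, b1, W2, b2))
    return h1, h2, scores
```

Take W1 = [[1.0, -1.0], [0.5, 0.5]], b1 = [0.0, -0.2], W2 = [[0.5, 0.3]], b2 = [0.1] and x = [0.3, 0.1]. With these values, the second hidden unit is silent after the bottom-up pass. It becomes active once top-down feedback from h2 is added, and the recorded scores never decrease from one iteration to the next. Each layer update is an exact block maximization of S, which is why the score cannot go down.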

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1507.05699 [cs.CV]
	(or arXiv:1507.05699v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1507.05699

Submission history

From: Peiyun Hu
[v1] Tue, 21 Jul 2015 04:00:44 UTC (3,928 KB)
[v2] Wed, 22 Jul 2015 05:18:01 UTC (4,515 KB)
[v3] Sun, 2 Aug 2015 04:44:09 UTC (5,132 KB)
[v4] Tue, 3 May 2016 05:50:39 UTC (8,175 KB)
[v5] Wed, 4 May 2016 22:50:35 UTC (8,510 KB)