Learning Hierarchical Semantic Classification by Grounding on Consistent Image Segmentations

Park, Seulki; Zhang, Youren; Yu, Stella X.; Beery, Sara; Huang, Jonathan

arXiv:2406.11608 (cs)

[Submitted on 17 Jun 2024]

Abstract: Hierarchical semantic classification requires the prediction of a taxonomy tree instead of a single flat level of the tree, where both accuracy at individual levels and consistency across levels matter. We can train classifiers for individual levels, which yields accuracy but not consistency, or we can train only the finest-level classifier and infer higher levels, which yields consistency but not accuracy. Our key insight is that hierarchical recognition should not be treated as multi-task classification, as each level is essentially a different task and the levels would have to compromise with each other, but should instead be grounded on image segmentations that are consistent across semantic granularities. Consistency can in fact improve accuracy. We build upon recent work on learning hierarchical segmentation for flat-level recognition, and extend it to hierarchical recognition. It naturally captures the intuition that fine-grained recognition requires fine image segmentation whereas coarse-grained recognition requires coarse segmentation; they can all be integrated into one recognition model that drives fine-to-coarse internal visual parsing. Additionally, we introduce a Tree-path KL Divergence loss to enforce consistent and accurate predictions across levels. Extensive experiments and analysis demonstrate significant gains in predicting an accurate and consistent taxonomy tree.
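
The abstract does not spell out how cross-level consistency is measured or how the Tree-path KL Divergence loss is computed, so the following is only a minimal sketch under stated assumptions. The two-level taxonomy, the label names, and all function names are hypothetical. The loss shown is one plausible reading of the name (a KL divergence between a uniform distribution over the nodes on the ground-truth tree path and a joint softmax over the concatenated per-level logits) and may differ from the formulation in the paper.

```python
import math

# Hypothetical two-level taxonomy (not from the paper): fine label -> coarse parent.
PARENT = {"sparrow": "bird", "eagle": "bird", "beagle": "dog", "poodle": "dog"}
COARSE = ["bird", "dog"]
FINE = ["sparrow", "eagle", "beagle", "poodle"]


def is_consistent(coarse_label, fine_label):
    """True if independently predicted labels lie on one path of the tree."""
    return PARENT[fine_label] == coarse_label


def infer_coarse(fine_label):
    """Bottom-up inference: derive the coarse label from the fine prediction.

    Consistent by construction, but any fine-level error propagates upward.
    """
    return PARENT[fine_label]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tree_path_kl(coarse_logits, fine_logits, true_fine):
    """Assumed tree-path KL: KL(target || prediction) over all levels jointly.

    The target puts equal mass on every node of the ground-truth path;
    the prediction is a softmax over the concatenated per-level logits.
    """
    labels = COARSE + FINE
    probs = softmax(list(coarse_logits) + list(fine_logits))
    path = {PARENT[true_fine], true_fine}
    target = [1.0 / len(path) if lab in path else 0.0 for lab in labels]
    # Terms with zero target mass contribute nothing to the KL sum.
    return sum(t * math.log(t / p) for t, p in zip(target, probs) if t > 0)


# Logits that agree with the path bird -> sparrow.
on_path = tree_path_kl([5.0, 0.0], [5.0, 0.0, 0.0, 0.0], "sparrow")
# Same fine logits, but the coarse head prefers "dog" (an inconsistent pair).
off_path = tree_path_kl([0.0, 5.0], [5.0, 0.0, 0.0, 0.0], "sparrow")
```

In this toy setup the inconsistent prediction receives the larger loss, which is the behaviour one would want from a term meant to reward predictions that are both accurate and consistent along a single tree path.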

Comments:	34 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.11608 [cs.CV]
	(or arXiv:2406.11608v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.11608

Submission history

From: Seulki Park [view email]
[v1] Mon, 17 Jun 2024 14:56:51 UTC (9,002 KB)
