Appearance Fusion of Multiple Cues for Video Co-localization

Jerripothula, Koteswar Rao

arXiv:2003.09556v1 (cs)

[Submitted on 21 Mar 2020 (this version), latest version 18 Jul 2020 (v2)]

Abstract: This work addresses video co-localization, the problem of jointly localizing the objects in videos. Numerous cues are available for this purpose, such as saliency, motion, and joint cues, but fusing them robustly can be challenging because they are often spatially inconsistent. To overcome this, we propose a novel appearance fusion method that fuses the appearance models derived from these cues rather than spatially fusing their maps. The method evaluates the cues in terms of their reliability and consensus, and these evaluations guide the appearance fusion process. We also develop a novel joint cue that relies on topological hierarchy. The final fusion results are used to produce a few candidate bounding boxes and then to select the optimal one among them under spatiotemporal constraints. The proposed method achieves promising results on the YouTube Objects dataset.
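To make the distinction between fusing appearance models and fusing cue maps concrete, here is a minimal sketch. It is not the paper's formulation: the abstract does not specify the appearance model or how reliability and consensus are scored, so the cue-weighted intensity histogram, the histogram-intersection consensus score, and all function names below are assumptions chosen purely for illustration.

```python
import numpy as np

def appearance_model(features, cue_map, n_bins=8):
    """Assumed appearance model: a cue-weighted histogram of pixel
    features in [0, 1], so pixels a cue marks strongly count more."""
    bins = np.minimum((features * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=cue_map.ravel(), minlength=n_bins)
    return hist / max(hist.sum(), 1e-12)

def consensus_weights(models):
    """Hypothetical consensus score: mean histogram intersection of each
    cue's model with the models of the other cues (needs >= 2 cues)."""
    m = np.stack(models)
    inter = np.minimum(m[:, None, :], m[None, :, :]).sum(axis=2)
    return (inter.sum(axis=1) - np.diag(inter)) / (len(models) - 1)

def fuse_models(models, weights):
    """Fuse in appearance space: a weighted average of the histograms,
    so the cue maps never have to agree pixel by pixel."""
    w = np.asarray(weights, dtype=float)
    if w.sum() == 0:
        w = np.ones_like(w)  # no cue agrees with any other: fall back to uniform
    w = w / w.sum()
    fused = (w[:, None] * np.stack(models)).sum(axis=0)
    return fused / max(fused.sum(), 1e-12)

if __name__ == "__main__":
    # Toy 2x2 frame: dark top row, bright bottom row.
    frame = np.array([[0.1, 0.1], [0.9, 0.9]])
    saliency = np.array([[1.0, 1.0], [0.0, 0.0]])  # marks the top row
    motion = np.array([[1.0, 1.0], [0.0, 0.0]])    # agrees with saliency
    outlier = np.array([[0.0, 0.0], [1.0, 1.0]])   # marks the bottom row
    models = [appearance_model(frame, c, n_bins=2)
              for c in (saliency, motion, outlier)]
    weights = consensus_weights(models)
    print(weights, fuse_models(models, weights))
```

In this toy case the two agreeing cues share the weight and the outlier cue is suppressed, so the fused model describes only the dark appearance. A reliability term, which the abstract also mentions, could be multiplied into the weights in the same way; how the paper actually defines it is not stated in the abstract.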

Comments:	9 pages, 9 figures. Submitted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Cite as:	arXiv:2003.09556 [cs.CV]
	(or arXiv:2003.09556v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2003.09556

Submission history

From: Koteswar Rao Jerripothula
[v1] Sat, 21 Mar 2020 02:26:36 UTC (1,137 KB)
[v2] Sat, 18 Jul 2020 04:53:03 UTC (2,645 KB)
