Human Gaze Boosts Object-Centered Representation Learning

Schaumlöffel, Timothy; Aubret, Arthur; Roig, Gemma; Triesch, Jochen

arXiv:2501.02966 (cs)

[Submitted on 6 Jan 2025]

Abstract: Recent self-supervised learning (SSL) models trained on human-like egocentric visual inputs substantially underperform on image recognition tasks compared to humans. These models train on raw, uniform visual inputs collected from head-mounted cameras. Human vision differs: the anatomical structure of the retina and visual cortex amplifies central visual information, i.e., the region around the gaze location, relative to the periphery. This selective amplification likely helps humans form object-centered visual representations. Here, we investigate whether focusing on central visual information boosts egocentric visual object learning. We simulate five months of egocentric visual experience using the large-scale Ego4D dataset and generate gaze locations with a human gaze prediction model. To account for the importance of central vision in humans, we crop the visual area around the gaze location. Finally, we train a time-based SSL model on these modified inputs. Our experiments demonstrate that focusing on central vision leads to better object-centered representations. Our analysis shows that the SSL model leverages the temporal dynamics of the gaze movements to build stronger visual representations. Overall, our work marks a significant step toward bio-inspired learning of visual representations.
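
The cropping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `crop_around_gaze`, the 224-pixel crop size, the 480 x 640 frame, and the choice to shift the window inward near frame borders are all assumptions made for illustration, and the gaze coordinates here are hand-picked rather than produced by a gaze prediction model.

```python
import numpy as np


def crop_around_gaze(frame, gaze_xy, crop_size):
    """Return a crop_size x crop_size window of `frame` centered on the gaze point.

    frame:   H x W x C image array.
    gaze_xy: (x, y) gaze location in pixel coordinates.
    Assumption: when the gaze is close to a border, the window is shifted
    inward so the crop always has the full requested size.
    """
    height, width = frame.shape[:2]
    if crop_size > min(height, width):
        raise ValueError("crop_size must fit inside the frame")
    half = crop_size // 2
    x, y = int(round(gaze_xy[0])), int(round(gaze_xy[1]))
    # Clamp the top-left corner so the window stays inside the frame.
    left = min(max(x - half, 0), width - crop_size)
    top = min(max(y - half, 0), height - crop_size)
    return frame[top:top + crop_size, left:left + crop_size]


# Hypothetical frame standing in for one egocentric video frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

center_crop = crop_around_gaze(frame, (320, 240), 224)  # gaze at frame center
corner_crop = crop_around_gaze(frame, (5, 5), 224)      # gaze near top-left corner
```

In a time-based SSL setup, crops taken from temporally close frames would presumably serve as related views of the same scene, so the sequence of gaze locations determines which image regions get associated over time. How the authors pair frames is not stated in the abstract.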

Comments:	13 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2501.02966 [cs.CV]
	(or arXiv:2501.02966v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.02966

Submission history

From: Arthur Aubret
[v1] Mon, 6 Jan 2025 12:21:40 UTC (7,059 KB)
