Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces

Payandeh, Amirreza; Song, Daeun; Nazeri, Mohammad; Liang, Jing; Mukherjee, Praneel; Raj, Amir Hossain; Kong, Yangzhe; Manocha, Dinesh; Xiao, Xuesu

Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.09024 (cs)

[Submitted on 30 Dec 2024]

Abstract: Most existing social robot navigation techniques leverage either hand-crafted rules or human demonstrations to connect robot perception to socially compliant actions. However, a significant gap remains in translating perception into socially compliant actions the way human reasoning naturally does in dynamic environments. Considering the recent success of Vision-Language Models (VLMs), we propose using language to bridge this gap, providing human-like reasoning between perception and socially aware robot actions. We create a vision-language dataset, Social robot Navigation via Explainable Interactions (SNEI), featuring 40K human-annotated Visual Question Answers (VQAs) based on 2K human-robot social interactions in unstructured, crowded public spaces, spanning perception, prediction, chain-of-thought reasoning, action, and explanation. To demonstrate the practical application of our dataset, we fine-tune a VLM, Social-LLaVA, on SNEI. Social-LLaVA outperforms state-of-the-art models such as GPT-4V and Gemini, based on the average of fifteen human judges' scores across 50 VQAs. Deployed onboard a mobile robot, Social-LLaVA enables human-like reasoning, marking a promising step toward socially compliant robot navigation in dynamic public spaces through language reasoning.
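The dataset figures above imply an average of 40,000 / 2,000 = 20 VQAs per recorded interaction. The sketch below is a hypothetical illustration, not the authors' released data format or evaluation code. It assumes an SNEI sample can be modeled as a question-answer pair tagged with one of the five stages the abstract lists, and it shows how a single model score could be averaged over a judges-by-questions matrix (15 judges by 50 VQAs in the reported evaluation). The names `VQA`, `CATEGORIES`, `vqas_per_interaction`, and `mean_judge_score`, as well as the numeric scoring scale, are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical labels mirroring the five stages named in the abstract;
# the real SNEI schema may differ.
CATEGORIES = ("perception", "prediction", "reasoning", "action", "explanation")


@dataclass
class VQA:
    """Hypothetical record for one annotated visual question answer."""
    interaction_id: int  # which recorded human-robot interaction it belongs to
    category: str        # one of CATEGORIES (assumed tagging scheme)
    question: str
    answer: str

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def vqas_per_interaction(num_vqas: int, num_interactions: int) -> float:
    """Average number of VQAs annotated per interaction."""
    return num_vqas / num_interactions


def mean_judge_score(scores: list[list[float]]) -> float:
    """Average over a judges-by-questions score matrix.

    scores[j][q] is judge j's score for question q. The scoring scale
    is an assumption; the function only requires numeric values.
    """
    if not scores or not all(scores):
        raise ValueError("need at least one judge and one question")
    n_questions = len(scores[0])
    if any(len(row) != n_questions for row in scores):
        raise ValueError("every judge must score every question")
    # Flatten the matrix and take the arithmetic mean of all scores.
    return mean(s for row in scores for s in row)


if __name__ == "__main__":
    sample = VQA(0, "action", "What should the robot do next?", "Yield to the pedestrian on the left.")
    print(sample.category)
    print(vqas_per_interaction(40_000, 2_000))
    print(mean_judge_score([[1, 2], [3, 4]]))
```

Flattening the matrix weights every (judge, question) pair equally; because every judge scores every question, this equals both the mean of per-judge averages and the mean of per-question averages.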

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Robotics (cs.RO)
Cite as:	arXiv:2501.09024 [cs.CV]
	(or arXiv:2501.09024v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.09024

Submission history

From: Amirreza Payandeh
[v1] Mon, 30 Dec 2024 23:59:30 UTC (2,692 KB)