Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving

Gao, Haoxiang; Zhao, Yu

arXiv:2501.06680 (cs)

[Submitted on 12 Jan 2025]

Abstract: Autonomous driving (AD) has improved significantly in recent years and achieved promising results in 3D detection, classification, and localization. However, many challenges remain, e.g., semantic understanding of pedestrians' behaviors and downstream handling of pedestrian interactions. Recent studies applying Large Language Models (LLMs) and Vision-Language Models (VLMs) have achieved promising results in scene understanding and high-level maneuver planning across diverse traffic scenarios. However, deploying billion-parameter LLMs on vehicles requires significant computation and memory resources. In this paper, we analyzed effective knowledge distillation of semantic labels into smaller vision networks, which can provide semantic representations of complex scenes for downstream planning and control decisions.
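
The abstract does not spell out the distillation setup. As a rough illustration only, the Python sketch below assumes the VLM acts as a teacher that emits per-label probabilities over a fixed set of semantic labels, and fits a small student head on precomputed vision features using a multi-label binary cross-entropy loss against those soft targets. The label names, `distillation_loss`, `train_linear_student`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical label vocabulary; the paper's actual label set is not given here.
LABELS = ["pedestrian_crossing", "pedestrian_waiting", "looking_at_vehicle"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def distillation_loss(student_logits, teacher_probs, eps=1e-7):
    """Mean binary cross-entropy between student predictions and teacher soft labels.

    student_logits: (batch, num_labels) raw outputs of the small vision network.
    teacher_probs:  (batch, num_labels) per-label probabilities, assumed to be
                    derived from VLM output (hypothetical format).
    """
    p = np.clip(sigmoid(student_logits), eps, 1.0 - eps)
    bce = -(teacher_probs * np.log(p) + (1.0 - teacher_probs) * np.log(1.0 - p))
    return float(bce.mean())


def train_linear_student(features, teacher_probs, lr=0.5, steps=200):
    """Fit a hypothetical linear student head W (features -> label logits) by gradient descent."""
    d = features.shape[1]
    W = np.zeros((d, teacher_probs.shape[1]))
    history = []
    for _ in range(steps):
        logits = features @ W
        history.append(distillation_loss(logits, teacher_probs))
        # Gradient of the mean BCE w.r.t. the logits is (sigmoid(logits) - targets) / element count.
        grad_logits = (sigmoid(logits) - teacher_probs) / teacher_probs.size
        W -= lr * features.T @ grad_logits
    return W, history


# Synthetic stand-ins: random "backbone embeddings" and teacher soft labels.
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 16))
true_W = rng.normal(size=(16, len(LABELS)))
teacher_probs = sigmoid(features @ true_W)

W, history = train_linear_student(features, teacher_probs)
print(f"loss: start={history[0]:.4f}, end={history[-1]:.4f}")
```

Binary cross-entropy against soft teacher targets is one common choice when the distilled outputs are independent multi-label tags; the paper may use a different loss, label format, or student architecture.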

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2501.06680 [cs.CV]
	(or arXiv:2501.06680v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.06680

Submission history

From: Haoxiang Gao
[v1] Sun, 12 Jan 2025 01:31:07 UTC (2,731 KB)
