LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

Wang, Dongkai; Xuan, Shiyu; Zhang, Shiliang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2406.04659 (cs)

[Submitted on 7 Jun 2024]

Title:LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

Authors:Dongkai Wang, Shiyu Xuan, Shiliang Zhang

View PDF HTML (experimental)

Abstract:The capacity of existing human keypoint localization models is limited by keypoint priors provided by the training data. To alleviate this restriction and pursue more general model, this work studies keypoint localization from a different perspective by reasoning locations based on keypiont clues in text descriptions. We propose LocLLM, the first Large-Language Model (LLM) based keypoint localization model that takes images and text instructions as inputs and outputs the desired keypoint coordinates. LocLLM leverages the strong reasoning capability of LLM and clues of keypoint type, location, and relationship in textual descriptions for keypoint localization. To effectively tune LocLLM, we construct localization-based instruction conversations to connect keypoint description with corresponding coordinates in input image, and fine-tune the whole model in a parameter-efficient training pipeline. LocLLM shows remarkable performance on standard 2D/3D keypoint localization benchmarks. Moreover, incorporating language clues into the localization makes LocLLM show superior flexibility and generalizable capability in cross dataset keypoint localization, and even detecting novel type of keypoints unseen during training.

Comments:	CVPR2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.04659 [cs.CV]
	(or arXiv:2406.04659v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.04659

Submission history

From: Dongkai Wang [view email]
[v1] Fri, 7 Jun 2024 05:58:35 UTC (640 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators