GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Qi, Zhangyang; Fang, Ye; Sun, Zeyi; Wu, Xiaoyang; Wu, Tong; Wang, Jiaqi; Lin, Dahua; Zhao, Hengshuang

arXiv:2312.02980 (cs)

[Submitted on 5 Dec 2023]

Abstract: Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation, but their understanding of the 3D world is notably deficient, limiting progress in 3D language understanding and generation. To solve this problem, we introduce GPT4Point, an innovative point-language multimodal model designed specifically for unified 3D object understanding and generation within the MLLM framework. As a powerful 3D MLLM, GPT4Point can seamlessly execute a variety of point-text reference tasks such as point-cloud captioning and Q&A. Additionally, GPT4Point is equipped with advanced capabilities for controllable 3D generation: it can obtain high-quality results from low-quality point-text features while maintaining geometric shapes and colors. To meet the need for large numbers of 3D object-text pairs, we develop Pyramid-XL, a point-language dataset annotation engine. It constructs a large-scale database of over 1M objects with varied text granularity levels from the Objaverse-XL dataset, which is essential for training GPT4Point. We also propose a comprehensive benchmark to evaluate 3D point-language understanding capabilities. In extensive evaluations, GPT4Point demonstrates superior performance in both understanding and generation.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.02980 [cs.CV]
	(or arXiv:2312.02980v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.02980

Submission history

From: Zhangyang Qi
[v1] Tue, 5 Dec 2023 18:59:55 UTC (7,889 KB)
