Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding

Torimi, Kohei; Yamada, Ryosuke; Otsuka, Daichi; Hara, Kensho; Asano, Yuki M.; Kataoka, Hirokatsu; Aoki, Yoshimitsu

arXiv:2501.09278 (cs)

[Submitted on 16 Jan 2025]

Abstract: Zero-shot recognition models require extensive training data for generalization. However, in zero-shot 3D classification, collecting 3D data and captions is costly and labor-intensive, posing a significant barrier compared to 2D vision. Recent advances in generative models have achieved unprecedented realism in synthetic data production, and recent research shows the potential of using generated data as training data. This naturally raises the question: can synthetic 3D data generated by generative models be used to expand limited 3D datasets? In response, we present a synthetic 3D dataset expansion method, Text-guided Geometric Augmentation (TeGA). TeGA is tailored for language-image-3D pretraining, which achieves SoTA in zero-shot 3D classification, and uses a generative text-to-3D model to enhance and extend limited 3D datasets. Specifically, we automatically generate text-guided synthetic 3D data and introduce a consistency filtering strategy to discard noisy samples whose semantics and geometric shapes do not match the text. In an experiment that doubles the original dataset size using TeGA, our approach improves over the baselines, achieving zero-shot performance gains of 3.0% on Objaverse-LVIS, 4.6% on ScanObjectNN, and 8.7% on ModelNet40. These results demonstrate that TeGA effectively bridges the 3D data gap, enabling robust zero-shot 3D classification even with limited real training data and paving the way for zero-shot 3D vision applications.
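
The consistency filtering idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how consistency is measured, so everything here is an assumption made for illustration: the function names `cosine_similarity` and `consistency_filter`, the use of cosine similarity between a text embedding and a shape embedding, the threshold of 0.5, and the toy 2D embeddings. It is not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_filter(samples, threshold=0.5):
    """Keep captions whose generated shape embedding agrees with the text embedding.

    `samples` is a list of (caption, text_embedding, shape_embedding) tuples.
    The similarity measure and threshold are hypothetical choices for this sketch.
    """
    kept = []
    for caption, text_emb, shape_emb in samples:
        if cosine_similarity(text_emb, shape_emb) >= threshold:
            kept.append(caption)
    return kept


# Toy embeddings: the first generated shape matches its caption, the second does not.
samples = [
    ("a wooden chair", [1.0, 0.0], [0.9, 0.1]),
    ("a coffee mug", [0.0, 1.0], [1.0, 0.0]),
]
kept = consistency_filter(samples)
print(kept)
```

In a real pipeline the embeddings would presumably come from pretrained text and 3D (or rendered-image) encoders that share an embedding space, and the threshold would be tuned on held-out data.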

Comments:	14 pages, 8 figures, this paper is submitted to CVPR
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.09278 [cs.CV]
	(or arXiv:2501.09278v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.09278

Submission history

From: Kohei Torimi
[v1] Thu, 16 Jan 2025 03:54:06 UTC (21,998 KB)
