Learn from Foundation Model: Fruit Detection Model without Manual Annotation

Wang, Yanan; Fei, Zhenghao; Li, Ruichen; Ying, Yibin

Computer Science > Computer Vision and Pattern Recognition

arXiv:2411.16196 (cs)

[Submitted on 25 Nov 2024]

Abstract: Recent breakthroughs in large foundation models have made it possible to transfer knowledge pre-trained on vast datasets to domains with limited data. Agriculture is one such data-scarce domain. This study proposes a framework for training effective, domain-specific small models from foundation models without manual annotation. Our approach begins with SDM (Segmentation-Description-Matching), a stage that leverages two foundation models: SAM2 (Segment Anything in Images and Videos) for segmentation and OpenCLIP (Open Contrastive Language-Image Pretraining) for zero-shot open-vocabulary classification. In the second stage, a novel knowledge distillation mechanism distills compact, edge-deployable models from SDM, improving both inference speed and perception accuracy. The complete method, termed SDM-D (Segmentation-Description-Matching-Distilling), performs strongly across various fruit detection tasks (object detection, semantic segmentation, and instance segmentation) without manual annotation, nearly matching the performance of models trained with abundant labels. Notably, SDM-D outperforms open-set detection methods such as Grounding SAM and YOLO-World on all tested fruit detection datasets. Additionally, we introduce MegaFruits, a comprehensive fruit segmentation dataset of over 25,000 images; all code and datasets are publicly available at this https URL.
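
The matching idea behind SDM can be illustrated with a minimal sketch of a description-matching and pseudo-labelling step. It assumes each SAM2 mask and each text description have already been encoded into embedding vectors (for example by OpenCLIP); the helper names, the "background" prompt, the logit scale of 100 and the 0.5 confidence threshold are illustrative assumptions, not the authors' implementation and not the OpenCLIP or SAM2 API. Each mask is assigned the description with the highest softmax probability over scaled cosine similarities, and confident non-background matches become box pseudo-labels of the kind a compact student model could be trained on; the paper's own distillation mechanism is not reproduced here.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Mask = Sequence[Sequence[int]]  # binary mask: 1 = object pixel, 0 = elsewhere


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tight (x_min, y_min, x_max, y_max) box around the nonzero pixels, or None if empty."""
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def match_and_label(
    masks: List[Mask],
    mask_embeddings: List[Vector],
    descriptions: Dict[str, Vector],
    background: str = "background",  # assumed catch-all prompt, hypothetical
    logit_scale: float = 100.0,      # assumed CLIP-style temperature
    min_prob: float = 0.5,           # assumed confidence threshold
) -> List[dict]:
    """Hypothetical helper (not from the SDM-D code base): match each mask to its
    most probable description and keep confident, non-background matches."""
    labels = list(descriptions)
    pseudo_labels = []
    for mask, emb in zip(masks, mask_embeddings):
        # Scaled cosine similarity to every description, turned into probabilities.
        logits = [logit_scale * cosine(emb, descriptions[name]) for name in labels]
        probs = softmax(logits)
        best = max(range(len(labels)), key=probs.__getitem__)
        box = mask_to_box(mask)
        # Drop background matches, low-confidence matches and empty masks.
        if labels[best] == background or probs[best] < min_prob or box is None:
            continue
        pseudo_labels.append({"label": labels[best], "score": probs[best], "box": box})
    return pseudo_labels
```

With toy 2-D embeddings, a mask whose embedding lies close to the "ripe strawberry" text vector yields one box pseudo-label, while a mask closest to the "background" prompt is discarded. Keeping a background prompt in the description set is one simple way to reject non-fruit segments that a class-agnostic segmenter will inevitably produce.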

Comments:	17 pages, 12 figures, conference or other essential info
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2411.16196 [cs.CV]
	(or arXiv:2411.16196v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.16196

Submission history

From: Yanan Wang
[v1] Mon, 25 Nov 2024 08:52:46 UTC (22,536 KB)