Onboard Satellite Image Classification for Earth Observation: A Comparative Study of ViT Models

Le, Thanh-Dung; Ha, Vu Nguyen; Nguyen, Ti Ti; Eappen, Geoffrey; Thiruvasagam, Prabhu; Garces-Socarras, Luis M.; Chou, Hong-fu; Gonzalez-Rios, Jorge L.; Merlano-Duncan, Juan Carlos; Chatzinotas, Symeon

arXiv:2409.03901 (cs)

[Submitted on 5 Sep 2024 (v1), last revised 21 Oct 2024 (this version, v2)]

Abstract: This study focuses on identifying the most effective pre-trained model for land use classification in onboard satellite processing, with an emphasis on high accuracy, computational efficiency, and robustness against the noisy data conditions commonly encountered during satellite-based inference. Through extensive experimentation, we compare the performance of traditional CNN-based, ResNet-based, and various pre-trained vision Transformer models. Our findings demonstrate that pre-trained Vision Transformer (ViT) models, particularly MobileViTV2 and EfficientViT-M2, outperform models trained from scratch in both accuracy and efficiency. These models achieve high performance with reduced computational requirements and are more resilient during inference under noisy conditions. While MobileViTV2 excels on clean validation data, EfficientViT-M2 is more robust to noise, making it the most suitable model for onboard satellite Earth observation (EO) tasks. Our experimental results show that EfficientViT-M2 is the optimal choice for reliable and efficient remote sensing image classification (RS-IC) in satellite operations, achieving 98.76% accuracy, precision, and recall. Specifically, EfficientViT-M2 delivers the highest performance across all metrics, excels in training efficiency (1,000 s) and inference time (10 s), and demonstrates greater robustness (overall robustness score of 0.79). In addition, EfficientViT-M2 consumes 63.93% less power than MobileViTV2 (79.23 W) and 73.26% less power than SwinTransformer (108.90 W), highlighting its significant advantage in energy efficiency.
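The power comparison above can be turned into an absolute figure with a short calculation. This is a minimal sketch, assuming that "X% less power" is measured relative to the comparison model, i.e. X = 100 · (P_ref − P_EfficientViT) / P_ref. The abstract does not state EfficientViT-M2's own power draw, so the values computed here are derived from the quoted percentages, not reported measurements, and the helper `implied_power` is hypothetical.

```python
def implied_power(reference_watts: float, percent_less: float) -> float:
    """Hypothetical helper: power of a model said to use `percent_less` % less
    than a reference model drawing `reference_watts`.

    Assumes the reduction is relative to the reference:
    percent_less = 100 * (reference - model) / reference.
    """
    return reference_watts * (1.0 - percent_less / 100.0)


# Figures quoted in the abstract (reference power in W, percent reduction).
vs_mobilevitv2 = implied_power(79.23, 63.93)
vs_swin = implied_power(108.90, 73.26)

print(f"Implied EfficientViT-M2 power vs MobileViTV2: {vs_mobilevitv2:.2f} W")
print(f"Implied EfficientViT-M2 power vs SwinTransformer: {vs_swin:.2f} W")
```

Under this assumption both comparisons place EfficientViT-M2 at roughly 29 W (about 28.58 W and 29.12 W respectively).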

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Cite as:	arXiv:2409.03901 [cs.CV]
	(or arXiv:2409.03901v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.03901

Submission history

From: Thanh-Dung Le
[v1] Thu, 5 Sep 2024 20:21:49 UTC (7,612 KB)
[v2] Mon, 21 Oct 2024 23:15:51 UTC (7,396 KB)
