Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications

Ravirathinam, Praveen; Khandelwal, Ankush; Ghosh, Rahul; Kumar, Vipin

arXiv:2407.19660 (cs)

[Submitted on 29 Jul 2024]

Abstract: In recent years, there has been increased interest in foundation models for geoscience due to the vast amount of Earth-observing satellite imagery. Existing remote sensing foundation models use various sources of spectral imagery to create large models pretrained on a masked reconstruction task. The embeddings from these foundation models are then used for various downstream remote sensing applications. In this paper, we propose a foundational modeling framework for remote sensing geoscience applications that goes beyond this traditional family of single-modality masked autoencoder foundation models. This framework leverages the knowledge-guided principles that spectral imagery captures the impact of physical drivers on the environmental system, and that the relationship between them is governed by the characteristics of the system. Specifically, our method, called MultiModal Variable Step Forecasting (MM-VSF), uses multimodal data (spectral imagery and weather) as its input and a variable step forecasting task as its pretraining objective. In our evaluation, we show that forecasting satellite imagery using weather can serve as an effective pretraining task for foundation models. We further show the effectiveness of the embeddings from MM-VSF on the downstream task of pixel-wise crop mapping, compared with a model trained in the traditional setting of single-modality input and masked reconstruction-based pretraining.
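
The abstract names variable step forecasting as the pretraining objective but does not say how training samples are built. As a rough illustration only, the sketch below shows one way such a sample might be drawn: a context window of past images, a randomly chosen forecast step, and the weather records from the start of the context through the target date. The function name `make_vsf_sample`, its parameters, and the choice of weather window are assumptions made for this example, not details taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

Frame = List[float]    # stand-in for one flattened satellite image (assumption)
Weather = List[float]  # stand-in for one weather record (assumption)


def make_vsf_sample(
    images: Sequence[Frame],
    weather: Sequence[Weather],
    context_len: int,
    max_step: int,
    rng: random.Random,
) -> Tuple[List[Frame], List[Weather], int, Frame]:
    """Draw one hypothetical variable-step forecasting sample.

    Returns (context_images, weather_window, step, target_image), where the
    target lies `step` time steps after the last context image.
    """
    if len(images) != len(weather):
        raise ValueError("images and weather must have the same length")
    if context_len < 1 or max_step < 1:
        raise ValueError("context_len and max_step must be positive")
    n = len(images)
    if n < context_len + 1:
        raise ValueError("series too short for the requested context")

    # Pick the context start so that at least a 1-step forecast fits.
    start = rng.randrange(0, n - context_len)
    last_ctx = start + context_len - 1

    # The forecast step varies per sample, capped by the series length.
    largest_step = min(max_step, n - 1 - last_ctx)
    step = rng.randint(1, largest_step)
    target_idx = last_ctx + step

    context_images = list(images[start:last_ctx + 1])
    # Assumption: weather spans the context window through the target date.
    weather_window = list(weather[start:target_idx + 1])
    return context_images, weather_window, step, images[target_idx]
```

In this sketch the model would receive the context images, the weather window, and the step, and would be trained to predict the target image. A masked reconstruction setup would instead hide parts of the input imagery and ask the model to fill them in.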

Comments:	9 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2407.19660 [cs.CV]
	(or arXiv:2407.19660v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.19660

Submission history

From: Praveen Ravirathinam
[v1] Mon, 29 Jul 2024 02:49:55 UTC (1,399 KB)
