MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection

Cao, Xu; Ye, Wenqian; Moise, Kenny; Coffee, Megan

arXiv:2411.10888 (eess)

[Submitted on 16 Nov 2024]

Abstract: In the aftermath of the COVID-19 pandemic and amid accelerating climate change, emerging infectious diseases, particularly those arising from zoonotic spillover, remain a global threat. Mpox (caused by the monkeypox virus) is a notable example of a zoonotic infection that often goes undiagnosed, especially as its rash progresses through stages, complicating detection across diverse populations with different presentations. In August 2024, the WHO Director-General declared the mpox outbreak a public health emergency of international concern for a second time. Despite the deployment of deep learning techniques for detecting diseases from skin lesion images, a robust and publicly accessible foundation model for mpox diagnosis is still lacking due to the unavailability of open-source mpox skin lesion images, multimodal clinical data, and specialized training pipelines. To address this gap, we propose MpoxVLM, a vision-language model (VLM) designed to detect mpox by analyzing both skin lesion images and patient clinical information. MpoxVLM integrates the CLIP visual encoder, an enhanced Vision Transformer (ViT) classifier for skin lesions, and LLaMA-2-7B models, pre-trained and fine-tuned on visual instruction-following question-answer pairs from our newly released mpox skin lesion dataset. Our work achieves 90.38% accuracy for mpox detection, offering a promising pathway to improve early diagnostic accuracy in combating mpox.

Comments:	Accepted by ML4H 2024
Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.10888 [eess.IV]
	(or arXiv:2411.10888v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2411.10888

Submission history

From: Iroh (Xu) Cao
[v1] Sat, 16 Nov 2024 21:09:04 UTC (1,109 KB)
