RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models

Zhu, Wenhui; Li, Xin; Chen, Xiwen; Qiu, Peijie; Vasa, Vamsi Krishna; Dong, Xuanzhao; Chen, Yanxi; Lepore, Natasha; Dumitrascu, Oana; Su, Yi; Wang, Yalin

arXiv:2503.03987 (cs)

[Submitted on 6 Mar 2025]

Abstract: Recently, Multimodal Large Language Models (MLLMs) have gained significant attention for their remarkable ability to process and analyze non-textual data, such as images, videos, and audio. Notably, several adaptations of general-domain MLLMs to the medical field have been explored, including LLaVA-Med. However, these medical adaptations remain insufficiently advanced in understanding and interpreting retinal images. In contrast, medical experts emphasize the importance of quantitative analyses for disease detection and interpretation. This underscores a gap between general-domain and medical-domain MLLMs: while general-domain MLLMs excel in broad applications, they lack the specialized knowledge necessary for precise diagnostic and interpretative tasks in the medical field. To address these challenges, we introduce RetinalGPT, a multimodal conversational assistant for clinically preferred quantitative analysis of retinal images. Specifically, we achieve this by compiling a large retinal image dataset, developing a novel data pipeline, and employing customized visual instruction tuning to both enhance retinal analysis and enrich medical knowledge. In particular, RetinalGPT outperforms general-domain MLLMs by a large margin in diagnosing retinal diseases on 8 benchmark retinal datasets. Beyond disease diagnosis, RetinalGPT features quantitative analyses and lesion localization, representing a pioneering step in leveraging LLMs for an interpretable and end-to-end clinical research framework. The code is available at this https URL

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2503.03987 [cs.CV]
	(or arXiv:2503.03987v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.03987

Submission history

From: Wenhui Zhu
[v1] Thu, 6 Mar 2025 00:19:54 UTC (27,148 KB)
