D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions

Nisar, Hareem; Anwar, Syed Muhammad; Jiang, Zhifan; Parida, Abhijeet; Sanchez-Jacob, Ramon; Nath, Vishwesh; Roth, Holger R.; Linguraru, Marius George

arXiv:2407.02604 (cs)

[Submitted on 2 Jul 2024 (v1), last revised 2 Aug 2024 (this version, v2)]

Abstract: Large vision language models (VLMs) have progressed incredibly from research to applicability for general-purpose use cases. LLaVA-Med, a pioneering large language and vision assistant for biomedicine, can perform multi-modal biomedical image and data analysis to provide a natural language interface for radiologists. While it is highly generalizable and works with multi-modal data, it is currently limited by well-known challenges that exist in the large language model space. Hallucinations and imprecision in responses can lead to misdiagnosis, and they currently hinder the clinical adaptability of VLMs. To create precise, user-friendly models in healthcare, we propose D-Rax -- a domain-specific, conversational, radiologic assistance tool that can be used to gain insights about a particular radiologic image. In this study, we enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting, offering comprehensive insights from medical imaging and aiding in the formulation of accurate diagnoses. D-Rax is achieved by fine-tuning the LLaVA-Med architecture on our curated enhanced instruction-following data, comprising images, instructions, and disease diagnosis and demographic predictions derived from MIMIC-CXR imaging data, CXR-related visual question answer (VQA) pairs, and predictive outcomes from multiple expert AI models. We observe a statistically significant improvement in responses when evaluated for both open- and closed-ended conversations. Leveraging the power of state-of-the-art diagnostic models combined with VLMs, D-Rax empowers clinicians to interact with medical images using natural language, which could potentially streamline their decision-making process, enhance diagnostic accuracy, and conserve their time.

Comments:	Accepted to the MICCAI 2024 Second International Workshop on Foundation Models for General Medical AI
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Cite as:	arXiv:2407.02604 [cs.AI]
	(or arXiv:2407.02604v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2407.02604

Submission history

From: Abhijeet Parida
[v1] Tue, 2 Jul 2024 18:43:10 UTC (12,392 KB)
[v2] Fri, 2 Aug 2024 13:45:53 UTC (12,392 KB)
