RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment

Gu, Difei; Gao, Yunhe; Zhou, Yang; Zhou, Mu; Metaxas, Dimitris

Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.07525 (cs)

[Submitted on 13 Jan 2025]

Abstract: Automated chest radiograph interpretation requires both accurate disease classification and detailed radiology report generation, presenting a significant challenge in the clinical workflow. Current approaches either focus on classification accuracy at the expense of interpretability or generate detailed but potentially unreliable reports through image captioning techniques. In this study, we present RadAlign, a novel framework that combines the predictive accuracy of vision-language models (VLMs) with the reasoning capabilities of large language models (LLMs). Inspired by the radiologist's workflow, RadAlign first employs a specialized VLM to align visual features with key medical concepts, achieving superior disease classification with an average AUC of 0.885 across multiple diseases. These recognized medical conditions, represented as text-based concepts in the aligned vision-language space, are then used to prompt LLM-based report generation. Enhanced by a retrieval-augmented generation mechanism that grounds outputs in similar historical cases, RadAlign delivers superior report quality with a GREEN score of 0.678, compared with 0.634 for state-of-the-art methods. Our framework maintains strong clinical interpretability while reducing hallucinations, advancing automated medical imaging and report analysis through integrated predictive and generative AI. Code is available at this https URL.
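The two-stage flow described in the abstract (recognize concepts in a shared embedding space, then prompt a language model with those concepts plus retrieved similar cases) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the 3-dimensional vectors, the 0.5 threshold, the cosine-similarity scoring, and the prompt wording are all assumptions made for illustration, and the call to an actual LLM is left out.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize_concepts(image_vec, concept_vecs, threshold=0.5):
    """Return (sorted) concept names whose similarity to the image exceeds the threshold."""
    scores = {name: cosine(image_vec, vec) for name, vec in concept_vecs.items()}
    return sorted(name for name, score in scores.items() if score > threshold)

def retrieve_similar(image_vec, case_bank, k=1):
    """Return the reports of the k historical cases most similar to the image."""
    ranked = sorted(case_bank, key=lambda c: cosine(image_vec, c["vec"]), reverse=True)
    return [c["report"] for c in ranked[:k]]

def build_prompt(concepts, examples):
    """Assemble a text prompt from recognized concepts and retrieved reports."""
    findings = ", ".join(concepts) if concepts else "no listed findings"
    lines = ["Recognized findings: " + findings, "Similar prior reports:"]
    lines += ["- " + report for report in examples]
    lines.append("Write a chest radiograph report consistent with the findings above.")
    return "\n".join(lines)

# Hypothetical toy embeddings; a real system would obtain these from a trained VLM.
concept_vecs = {
    "cardiomegaly": [1.0, 0.0, 0.0],
    "pleural effusion": [0.0, 1.0, 0.0],
    "pneumothorax": [0.0, 0.0, 1.0],
}
case_bank = [
    {"vec": [0.9, 0.1, 0.0], "report": "The heart is enlarged. No effusion."},
    {"vec": [0.0, 0.1, 0.9], "report": "Small apical pneumothorax."},
]
image_vec = [0.8, 0.6, 0.0]

concepts = recognize_concepts(image_vec, concept_vecs)
examples = retrieve_similar(image_vec, case_bank, k=1)
prompt = build_prompt(concepts, examples)
print(prompt)
```

Because the concepts are passed to the language model as plain text, the intermediate classification step stays inspectable, which is the interpretability property the abstract emphasizes.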

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2501.07525 [cs.CV]
	(or arXiv:2501.07525v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.07525

Submission history

From: Difei Gu
[v1] Mon, 13 Jan 2025 17:55:32 UTC (11,217 KB)
