R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation

Wang, Xiao; Li, Yuehang; Wang, Fuling; Wang, Shiao; Li, Chuanfu; Jiang, Bo

Computer Science > Computer Vision and Pattern Recognition

arXiv:2408.09743 (cs)

[Submitted on 19 Aug 2024]

Title: R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation

Authors: Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang

Abstract: Inspired by the tremendous success of Large Language Models (LLMs), existing X-ray medical report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image and then feed these features into the LLM for text generation. How to extract more effective information that helps the LLM improve its final results remains an urgent open problem. In addition, visual Transformer models bring high computational complexity. To address these issues, this paper proposes a novel context-guided, efficient X-ray medical report generation framework. Specifically, we introduce Mamba as the vision backbone; its complexity is linear in sequence length, and the performance it obtains is comparable to that of a strong Transformer model. More importantly, during training we retrieve context samples from the training set for the samples within each mini-batch, using both positively and negatively related samples to enhance feature representation and discriminative learning. We then feed the vision tokens, the context information, and prompt statements to the LLM to generate high-quality medical reports. Extensive experiments on three X-ray report generation datasets (i.e., IU-Xray, MIMIC-CXR, and CheXpert Plus) validate the effectiveness of the proposed model. The source code of this work will be released at this https URL.
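The abstract describes the context-retrieval step only at a high level. Below is a minimal, hypothetical sketch of one way per-sample retrieval for a training mini-batch could work. It assumes each training image has already been pooled into a single feature vector and that "positively" and "negatively" related samples are simply the most and least cosine-similar training samples. The names `cosine`, `retrieve_context`, `retrieve_for_batch`, `build_prompt`, and the `<image>` placeholder are illustrative assumptions, not the authors' released code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_context(
    query: Sequence[float],
    bank: List[Sequence[float]],
    k: int,
    exclude: int = -1,
) -> Tuple[List[int], List[int]]:
    """Return (positive_ids, negative_ids) into `bank` for one query feature.

    ASSUMPTION (hypothetical criterion): positives are the k most similar
    bank entries and negatives the k least similar, by cosine similarity.
    `exclude` skips the query's own index so a sample never retrieves itself.
    """
    scored = [(cosine(query, feat), i) for i, feat in enumerate(bank) if i != exclude]
    scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first; ties by index
    positives = [i for _, i in scored[:k]]
    negatives = [i for _, i in scored[::-1][:k]]  # least similar first
    return positives, negatives


def retrieve_for_batch(
    batch_ids: List[int], bank: List[Sequence[float]], k: int
) -> Dict[int, Tuple[List[int], List[int]]]:
    """Retrieve context for every sample of a training mini-batch."""
    return {i: retrieve_context(bank[i], bank, k, exclude=i) for i in batch_ids}


def build_prompt(context_reports: List[str], instruction: str) -> str:
    """Assemble a text prompt. "<image>" is a HYPOTHETICAL placeholder marking
    where vision-token embeddings would be spliced in by a real model."""
    lines = ["<image>"]
    lines += [f"Context {n}: {r}" for n, r in enumerate(context_reports, 1)]
    lines.append(instruction)
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy 2-D "features" standing in for pooled vision-backbone outputs.
    bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    ctx = retrieve_for_batch([0], bank, k=1)
    print(ctx[0])  # ([1], [3])
    reports = ["Report of training sample 1", "Report of training sample 3"]
    print(build_prompt(reports, "Generate a report for this X-ray."))
```

Brute-force scoring costs O(N) per query over the feature bank; for a training set the size of MIMIC-CXR, precomputed features and an approximate nearest-neighbour index would be a more plausible choice. The abstract does not state whether negatives are chosen by low similarity, by diagnostic labels, or by some other rule.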

Comments:	In Peer Review
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2408.09743 [cs.CV]
	(or arXiv:2408.09743v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.09743

Submission history

From: Xiao Wang
[v1] Mon, 19 Aug 2024 07:15:11 UTC (4,669 KB)