Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

Wang, Xiao; Wang, Fuling; Wang, Haowen; Jiang, Bo; Li, Chuanfu; Wang, Yaowei; Tian, Yonghong; Tang, Jin

Electrical Engineering and Systems Science > Image and Video Processing

arXiv:2501.03458 (eess)

[Submitted on 7 Jan 2025]

Abstract: X-ray image-based medical report generation has made significant progress in recent years with the help of large language models. However, these models have not fully exploited the effective information in visual image regions, resulting in reports that are linguistically sound but insufficient in describing key diseases. In this paper, we propose a novel associative memory-enhanced X-ray report generation model that mimics the process by which professional doctors write medical reports. It mines both global and local visual information and associates historical report information to better complete the writing of the current report. Specifically, given an X-ray image, we first use a classification model along with its activation maps to mine visual regions highly associated with diseases and to learn disease query tokens. Then, we employ a visual Hopfield network to establish memory associations for disease-related tokens, and a report Hopfield network to retrieve report memory information. This process facilitates the generation of high-quality reports based on a large language model and achieves state-of-the-art performance on multiple benchmark datasets, including IU X-ray, MIMIC-CXR, and CheXpert Plus. The source code of this work is released at this https URL.
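
The abstract does not spell out how the Hopfield networks perform retrieval. As a rough illustration only, the sketch below shows one update step of a continuous (modern) Hopfield memory, in which a query token is replaced by a softmax-weighted combination of stored patterns. This is an assumption about the general mechanism, not the authors' implementation; the names `hopfield_retrieve`, `memory`, `query`, and `beta` are hypothetical, and the toy vectors stand in for disease-related tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(query, memory, beta=1.0):
    """One retrieval step of a continuous Hopfield memory (illustrative).

    query:  list of floats, the token used to probe the memory
    memory: list of stored patterns, each the same length as query
    beta:   inverse temperature; larger values give sharper retrieval
    Returns the retrieved vector and the association weights.
    """
    # Similarity of the query to every stored pattern (dot product).
    scores = [beta * sum(q * m for q, m in zip(query, row)) for row in memory]
    weights = softmax(scores)
    # Retrieved pattern: weighted sum of the stored patterns.
    dim = len(query)
    retrieved = [
        sum(w * row[i] for w, row in zip(weights, memory)) for i in range(dim)
    ]
    return retrieved, weights


# Toy memory of three stored patterns (hypothetical stand-ins for tokens).
memory = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [0.9, 0.1, 0.0]  # a noisy version of the first pattern
retrieved, weights = hopfield_retrieve(query, memory, beta=8.0)
```

With a large `beta`, the weights concentrate on the stored pattern closest to the query, so a noisy query is pulled toward its nearest memory entry; a small `beta` blends several patterns instead.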

Comments:	In Peer Review
Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.03458 [eess.IV]
	(or arXiv:2501.03458v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2501.03458

Submission history

From: Xiao Wang
[v1] Tue, 7 Jan 2025 01:19:48 UTC (7,486 KB)
