Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT

Sengupta, Saurav; Brown, Donald E.

arXiv:2312.01435 (cs)

[Submitted on 3 Dec 2023 (v1), last revised 15 Mar 2024 (this version, v2)]

Abstract: Deep learning has been used successfully in histopathology for disease classification, image segmentation, and more. However, combining image and text modalities using current state-of-the-art (SOTA) methods has been a challenge due to the high resolution of histopathology images. Automatic report generation for histopathology images is one such challenge. In this work, we show that by using an existing pre-trained Vision Transformer (ViT) to encode 4096x4096-sized patches of the Whole Slide Image (WSI) and a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model as a language-modeling-based decoder for report generation, we can build a performant and portable report generation mechanism that takes into account the whole high-resolution image. Our method allows us not only to generate and evaluate captions that describe the image, but also to classify the image by tissue type and by the gender of the patient. Our best-performing model achieves an 89.52% accuracy in tissue type classification with a BLEU-4 score of 0.12 in our caption generation task.
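
To make the patch-based encoding concrete, the following is a minimal sketch of how a slide might be tiled into 4096x4096 regions before each region is passed to an image encoder. It is not the authors' code: the function name `patch_grid`, the assumption that the patch size is measured in pixels, and the choice to clip edge patches to the slide bounds are all illustrative, since the abstract does not say how partial patches are handled.

```python
from typing import Iterator, Tuple

PATCH = 4096  # patch side length from the abstract; pixel units are assumed


def patch_grid(
    width: int, height: int, patch: int = PATCH
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) boxes that tile a width x height slide.

    Edge boxes are clipped to the slide bounds. This is an illustrative
    choice; padding or dropping partial patches would be equally plausible.
    """
    for y in range(0, height, patch):
        for x in range(0, width, patch):
            yield x, y, min(patch, width - x), min(patch, height - y)


# Hypothetical slide dimensions, chosen only to show the edge-clipping case.
boxes = list(patch_grid(10000, 9000))
print(len(boxes))   # number of patches covering the slide
print(boxes[-1])    # bottom-right patch, clipped to the slide bounds
```

Each box would then be read from the slide and encoded separately, with the per-patch embeddings passed on to the text decoder; how those embeddings are aggregated is not described in the abstract and is left out here.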

Comments:	Accepted at IEEE ISBI 2024. arXiv admin note: substantial text overlap with arXiv:2311.06176
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.01435 [cs.CV]
	(or arXiv:2312.01435v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.01435

Submission history

From: Saurav Sengupta
[v1] Sun, 3 Dec 2023 15:56:09 UTC (12,937 KB)
[v2] Fri, 15 Mar 2024 12:39:24 UTC (12,938 KB)
