Answering real-world clinical questions using large language model based systems

Low, Yen Sia; Jackson, Michael L.; Hyde, Rebecca J.; Brown, Robert E.; Sanghavi, Neil M.; Baldwin, Julian D.; Pike, C. William; Muralidharan, Jananee; Hui, Gavin; Alexander, Natasha; Hassan, Hadeel; Nene, Rahul V.; Pike, Morgan; Pokrzywa, Courtney J.; Vedak, Shivam; Yan, Adam Paul; Yao, Dong-han; Zipursky, Amy R.; Dinh, Christina; Ballentine, Philip; Derieg, Dan C.; Polony, Vladimir; Chawdry, Rehan N.; Davies, Jordan; Hyde, Brigham B.; Shah, Nigam H.; Gombar, Saurabh

arXiv:2407.00541 (cs)

[Submitted on 29 Jun 2024]

Authors: Yen Sia Low (1), Michael L. Jackson (1), Rebecca J. Hyde (1), Robert E. Brown (1), Neil M. Sanghavi (1), Julian D. Baldwin (1), C. William Pike (1), Jananee Muralidharan (1), Gavin Hui (1 and 2), Natasha Alexander (3), Hadeel Hassan (3), Rahul V. Nene (4), Morgan Pike (5), Courtney J. Pokrzywa (6), Shivam Vedak (7), Adam Paul Yan (3), Dong-han Yao (7), Amy R. Zipursky (3), Christina Dinh (1), Philip Ballentine (1), Dan C. Derieg (1), Vladimir Polony (1), Rehan N. Chawdry (1), Jordan Davies (1), Brigham B. Hyde (1), Nigam H. Shah (1 and 7), Saurabh Gombar (1 and 8) ((1) Atropos Health, New York NY, USA, (2) Department of Medicine, University of California, Los Angeles CA, USA, (3) Department of Pediatrics, The Hospital for Sick Children, Toronto ON, Canada, (4) Department of Emergency Medicine, University of California, San Diego CA, USA, (5) Department of Emergency Medicine, University of Michigan, Ann Arbor MI, USA, (6) Department of Surgery, Columbia University, New York NY, USA, (7) Center for Biomedical Informatics Research, Stanford University, Stanford CA, USA, (8) Department of Pathology, Stanford University, Stanford CA, USA)

Abstract: Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems to answer 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2% to 10% of questions). In contrast, retrieval augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions, compared with the other LLMs (65% vs. 0-9%). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG and one for generating novel evidence, working synergistically, would improve the availability of pertinent evidence for patient care.

Comments:	28 pages (2 figures, 3 tables) inclusive of 8 pages of supplemental materials (4 supplemental figures and 4 supplemental tables)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2407.00541 [cs.CL]
	(or arXiv:2407.00541v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.00541

Submission history

From: Yen Low
[v1] Sat, 29 Jun 2024 22:39:20 UTC (792 KB)
