High-Throughput Phenotyping of Clinical Text Using Large Language Models

Hier, Daniel B.; Munzir, S. Ilyas; Stahlfeld, Anne; Obafemi-Ajayi, Tayo; Carrithers, Michael D.

arXiv:2408.01214 (cs)

[Submitted on 2 Aug 2024]

Abstract: High-throughput phenotyping automates the mapping of patient signs to standardized ontology concepts and is essential for precision medicine. This study evaluates the automation of phenotyping of clinical summaries from the Online Mendelian Inheritance in Man (OMIM) database using large language models. Due to their rich phenotype data, these summaries can be surrogates for physician notes. We conduct a performance comparison of GPT-4 and GPT-3.5-Turbo. Our results indicate that GPT-4 surpasses GPT-3.5-Turbo in identifying, categorizing, and normalizing signs, achieving concordance with manual annotators comparable to inter-rater agreement. Despite some limitations in sign normalization, the extensive pre-training of GPT-4 results in high performance and generalizability across several phenotyping tasks while obviating the need for manually annotated training data. Large language models are expected to be the dominant method for automating high-throughput phenotyping of clinical text.

Comments:	Submitted to IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), Houston TX
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
ACM classes:	I.7; I.2
Cite as:	arXiv:2408.01214 [cs.CL]
	(or arXiv:2408.01214v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2408.01214

Submission history

From: Daniel Hier
[v1] Fri, 2 Aug 2024 12:00:00 UTC (6,402 KB)
