Predicting Anti-microbial Resistance using Large Language Models

Yoo, Hyunwoo; Sokhansanj, Bahrad; Brown, James R.; Rosen, Gail

arXiv:2401.00642 (cs)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 1 Jan 2024]

Abstract: As antibiotic resistance rises and infectious diseases such as COVID-19 spread, it is important to classify genes related to antibiotic resistance. As natural language processing has advanced with transformer-based language models, many language models that learn the characteristics of nucleotide sequences have also emerged, and these models perform well at classifying various features of nucleotide sequences. Classifying nucleotide sequences draws not only on the sequence itself but also on various kinds of background knowledge. In this study, we use both a nucleotide sequence-based language model and a text language model based on PubMed articles, so that the model reflects more biological background knowledge. We propose a method to fine-tune the nucleotide sequence language model and the text language model on various databases of antibiotic resistance genes. We also propose an LLM-based augmentation technique to supplement the data, an ensemble method to effectively combine the two models, and a benchmark for evaluating the model. Our method achieved better performance than the nucleotide sequence language model in drug resistance class prediction.
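
The abstract does not say how the two models are combined, so the following is only an illustrative sketch of one common ensembling approach: a weighted average of the class probabilities produced by two classifiers. The function names, the `seq_weight` parameter, and the drug class labels are all hypothetical and are not taken from the paper; the logits stand in for the outputs of a fine-tuned nucleotide sequence model and a fine-tuned text model.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(seq_logits, text_logits, seq_weight=0.5):
    """Weighted average of the two models' class probabilities.

    seq_weight is a hypothetical mixing parameter in [0, 1]; the paper's
    actual combination rule may differ.
    """
    if len(seq_logits) != len(text_logits):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= seq_weight <= 1.0:
        raise ValueError("seq_weight must lie in [0, 1]")
    p_seq = softmax(seq_logits)
    p_text = softmax(text_logits)
    return [seq_weight * a + (1.0 - seq_weight) * b for a, b in zip(p_seq, p_text)]


def predict_class(seq_logits, text_logits, class_names, seq_weight=0.5):
    """Return the label with the highest ensembled probability, and that probability."""
    probs = ensemble_probs(seq_logits, text_logits, seq_weight)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


if __name__ == "__main__":
    # Hypothetical drug class labels and made-up logits, for illustration only.
    classes = ["beta-lactam", "tetracycline", "aminoglycoside"]
    label, prob = predict_class([2.0, 0.0, 0.0], [0.0, 0.0, 3.0], classes)
    print(label, round(prob, 3))
```

In a sketch like this, the mixing weight would normally be chosen on a held-out validation set rather than fixed in advance.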

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2401.00642 [cs.CL]
	(or arXiv:2401.00642v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2401.00642

Submission history

From: Hyunwoo Yoo
[v1] Mon, 1 Jan 2024 03:04:14 UTC (8,006 KB)
