MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction

Liu, Yuyan; Ding, Sirui; Zhou, Sheng; Fan, Wenqi; Tan, Qiaoyu

arXiv:2406.12950 (q-bio)

[Submitted on 18 Jun 2024 (v1), last revised 18 Oct 2024 (this version, v2)]

Abstract: Molecular property prediction (MPP) is a fundamental task in drug discovery. However, prior methods require large numbers of labeled molecules and generalize poorly to unseen tasks, whereas real-world applications demand both label efficiency and generalization to new tasks. To address these challenges, we present MolecularGPT for few-shot MPP. Taking an instruction-tuning perspective, we fine-tune large language models (LLMs) on curated molecular instructions spanning over 1000 property prediction tasks. This yields a versatile and specialized LLM that can be adapted to novel MPP tasks without any fine-tuning, through zero- and few-shot in-context learning (ICL). MolecularGPT exhibits competitive in-context reasoning capabilities across 10 downstream evaluation datasets, setting new benchmarks for few-shot molecular prediction tasks. More importantly, with just two-shot examples, MolecularGPT can outperform standard supervised graph neural network methods on 4 out of 7 datasets. It also outperforms state-of-the-art LLM baselines under zero-shot settings, with up to a 15.7% increase in classification accuracy and a 17.9 decrease in regression metrics (e.g., RMSE). This study demonstrates the potential of LLMs as effective few-shot molecular property predictors. The code is available at this https URL.
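
To make the zero- and few-shot in-context learning setting concrete, the following is a minimal sketch of how a k-shot prompt could be assembled from labeled demonstration molecules and a query molecule. It is an illustration only, not the paper's actual instruction template: the function name `build_few_shot_prompt`, the `Instruction:` / `SMILES:` / `Answer:` layout, the use of SMILES strings as the molecule representation, and the toy task (whether a molecule contains an oxygen atom) are all assumptions introduced here.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    demos: List[Tuple[str, str]],
    query_smiles: str,
) -> str:
    """Assemble a k-shot prompt, where k = len(demos).

    Hypothetical layout (not taken from the paper): an instruction,
    then k solved demonstrations, then the unsolved query.
    An empty `demos` list gives a zero-shot prompt.
    """
    parts = [f"Instruction: {instruction}"]
    for smiles, label in demos:
        # Each demonstration pairs a molecule with its known label.
        parts.append(f"SMILES: {smiles}\nAnswer: {label}")
    # The query is left unanswered for the model to complete.
    parts.append(f"SMILES: {query_smiles}\nAnswer:")
    return "\n\n".join(parts)


# Toy stand-in task: ethanol (CCO) contains oxygen, benzene does not.
demos = [("CCO", "Yes"), ("c1ccccc1", "No")]
prompt = build_few_shot_prompt(
    "Does this molecule contain an oxygen atom? Answer Yes or No.",
    demos,
    "CC(=O)O",  # acetic acid, the query molecule
)
print(prompt)
```

Under this sketch, adapting to a new task means changing only the instruction and the demonstrations passed in; no model weights are updated, which is the sense in which the abstract describes adaptation "without any fine-tuning".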

Subjects:	Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2406.12950 [q-bio.QM]
	(or arXiv:2406.12950v2 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2406.12950

Submission history

From: Yuyan Liu
[v1] Tue, 18 Jun 2024 12:54:47 UTC (1,860 KB)
[v2] Fri, 18 Oct 2024 12:19:41 UTC (1,872 KB)
