$\mathtt{GeLLM^3O}$: Generalizing Large Language Models for Multi-property Molecule Optimization

Dey, Vishal; Hu, Xiao; Ning, Xia

Computer Science > Machine Learning

arXiv:2502.13398 (cs)

[Submitted on 19 Feb 2025]

Abstract: Despite recent advancements, most computational methods for molecule optimization are constrained to single- or double-property optimization tasks and suffer from poor scalability and generalizability to novel optimization tasks. Meanwhile, Large Language Models (LLMs) demonstrate remarkable out-of-domain generalizability to novel tasks. To demonstrate LLMs' potential for molecule optimization, we introduce $\mathtt{MoMUInstruct}$, the first high-quality instruction-tuning dataset specifically focused on complex multi-property molecule optimization tasks. Leveraging $\mathtt{MoMUInstruct}$, we develop $\mathtt{GeLLM^3O}$s, a series of instruction-tuned LLMs for molecule optimization. Extensive evaluations across 5 in-domain and 5 out-of-domain tasks demonstrate that $\mathtt{GeLLM^3O}$s consistently outperform state-of-the-art baselines. $\mathtt{GeLLM^3O}$s also exhibit outstanding zero-shot generalization to unseen tasks, significantly outperforming powerful closed-source LLMs. Such strong generalizability demonstrates the tremendous potential of $\mathtt{GeLLM^3O}$s as foundational models for molecule optimization, thereby tackling novel optimization tasks without resource-intensive retraining. $\mathtt{MoMUInstruct}$, models, and code are accessible through this https URL.

Comments:	Vishal Dey and Xiao Hu contributed equally to this paper
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Chemical Physics (physics.chem-ph); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2502.13398 [cs.LG]
	(or arXiv:2502.13398v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.13398

Submission history

From: Vishal Dey
[v1] Wed, 19 Feb 2025 03:14:11 UTC (3,050 KB)