Small Molecule Optimization with Large Language Models

Guevorguian, Philipp; Bedrosian, Menua; Fahradyan, Tigran; Chilingaryan, Gayane; Khachatrian, Hrant; Aghajanyan, Armen

arXiv:2407.18897 (cs)

[Submitted on 26 Jul 2024]

Abstract: Recent advancements in large language models have opened new possibilities for generative molecular drug design. We present Chemlactica and Chemma, two language models fine-tuned on a novel corpus of 110M molecules with computed properties, totaling 40B tokens. These models demonstrate strong performance in generating molecules with specified properties and predicting new molecular characteristics from limited samples. We introduce a novel optimization algorithm that leverages our language models to optimize molecules for arbitrary properties given limited access to a black box oracle. Our approach combines ideas from genetic algorithms, rejection sampling, and prompt optimization. It achieves state-of-the-art performance on multiple molecular optimization benchmarks, including an 8% improvement on Practical Molecular Optimization compared to previous methods. We publicly release the training corpus, the language models, and the optimization algorithm.

Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2407.18897 [cs.LG]
	(or arXiv:2407.18897v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.18897

Submission history

From: Hrant Khachatrian
[v1] Fri, 26 Jul 2024 17:51:33 UTC (6,455 KB)
