DrugGen: Advancing Drug Discovery with Large Language Models and Reinforcement Learning Feedback

Sheikholeslami, Mahsa; Mazrouei, Navid; Gheisari, Yousof; Fasihi, Afshin; Irajpour, Matin; Motahharynia, Ali

Abstract: Traditional drug design faces significant challenges due to inherent chemical and biological complexities, often resulting in high failure rates in clinical trials. Deep learning advancements, particularly generative models, offer potential solutions to these challenges. One promising algorithm is DrugGPT, a transformer-based model that generates small molecules for input protein sequences. Although promising, it generates both chemically valid and invalid structures and does not incorporate the features of approved drugs, which makes drug discovery time-consuming and inefficient. To address these issues, we introduce DrugGen, an enhanced model based on the DrugGPT architecture. DrugGen is fine-tuned on approved drug-target interactions and optimized with proximal policy optimization (PPO). By receiving reward feedback from protein-ligand binding affinity prediction using pre-trained transformers (PLAPT) and from a customized invalid structure assessor, DrugGen significantly improves performance. Evaluation across multiple targets demonstrated that DrugGen achieves 100% valid structure generation, compared with 95.5% for DrugGPT, and produces molecules with higher predicted binding affinities (7.22 [6.30-8.07]) than DrugGPT (5.81 [4.97-6.63]) while maintaining diversity and novelty. Docking simulations further validate its ability to generate molecules that effectively target binding sites. For example, in the case of fatty acid-binding protein 5 (FABP5), DrugGen generated molecules with more favorable docking scores (FABP5/11, -9.537 and FABP5/5, -8.399) than the reference molecule (palmitic acid, -6.177). Beyond lead compound generation, DrugGen also shows potential for drug repositioning and for creating novel pharmacophores for existing targets. By producing high-quality small molecules, DrugGen provides a high-performance platform for advancing pharmaceutical research and drug discovery.
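The abstract describes reinforcement-learning fine-tuning. Generated molecules are scored by an invalid structure assessor and a binding-affinity predictor (PLAPT), and the policy is updated with PPO. The following minimal sketch illustrates the scoring and update objective of such a loop under stated assumptions. `is_valid` and `predict_affinity` are hypothetical stand-ins: a toy parenthesis-balance check and a length-based score, not RDKit or PLAPT. `INVALID_PENALTY = 0.0` and the mean-reward baseline are illustrative choices, not the authors' settings. `ppo_clipped_objective` is the standard PPO clipped surrogate, not DrugGen's actual training code.

```python
import math
from typing import Callable, List

# Hypothetical penalty for invalid structures; the abstract does not give the value used.
INVALID_PENALTY = 0.0


def reward(smiles: str,
           is_valid: Callable[[str], bool],
           predict_affinity: Callable[[str], float]) -> float:
    """Score one generated molecule: invalid structures receive a fixed penalty,
    valid ones are scored by a (stand-in) binding-affinity predictor."""
    if not is_valid(smiles):
        return INVALID_PENALTY
    return predict_affinity(smiles)


def advantages_from_rewards(rewards: List[float]) -> List[float]:
    """Assumed simple baseline: advantage = reward minus the batch-mean reward."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def ppo_clipped_objective(log_probs_new: List[float],
                          log_probs_old: List[float],
                          advantages: List[float],
                          clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate: E[min(ratio * A, clip(ratio, 1-eps, 1+eps) * A)]."""
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic (lower) bound
    return total / len(advantages)


# Hypothetical toy helpers for illustration only; they are not real chemistry tools.
def toy_is_valid(smiles: str) -> bool:
    """Toy check: treat a SMILES string as valid if its parentheses balance."""
    return smiles.count("(") == smiles.count(")")


def toy_predict_affinity(smiles: str) -> float:
    """Toy score that grows with string length; a placeholder for PLAPT."""
    return 5.0 + 0.1 * len(smiles)


if __name__ == "__main__":
    batch = ["CC(=O)O", "CC(=O", "c1ccccc1"]
    rewards = [reward(s, toy_is_valid, toy_predict_affinity) for s in batch]
    print(rewards)
    print(advantages_from_rewards(rewards))
```

Taking the minimum of the clipped and unclipped terms keeps each policy update conservative. Large probability ratios cannot inflate the objective when the advantage is positive, and the penalty is not softened when it is negative.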

Comments:	20 pages, 5 figures, 3 tables, and 7 supplementary files. To use the model, see this https URL
Subjects:	Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.14157 [q-bio.QM]
	(or arXiv:2411.14157v1 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2411.14157
