Improving Small Molecule Generation using Mutual Information Machine

Reidenbach, Danny; Livne, Micha; Ilango, Rajesh K.; Gill, Michelle; Israeli, Johnny

arXiv:2208.09016 (cs)

[Submitted on 18 Aug 2022 (v1), last revised 29 Mar 2023 (this version, v2)]

Abstract: We address the task of controlled generation of small molecules, which entails finding novel molecules with desired properties under certain constraints (e.g., similarity to a reference molecule). Here we introduce MolMIM, a probabilistic auto-encoder for small molecule drug discovery that learns an informative and clustered latent space. MolMIM is trained with Mutual Information Machine (MIM) learning and provides a fixed-length representation of variable-length SMILES strings. Since encoder-decoder models can learn representations with "holes" of invalid samples, we propose a novel extension to the training procedure that promotes a dense latent space and allows the model to sample valid molecules from random perturbations of latent codes. We provide a thorough comparison of MolMIM to several variable-size and fixed-size encoder-decoder models, demonstrating MolMIM's superior generation in terms of validity, uniqueness, and novelty. We then apply CMA-ES, a naive, black-box, gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization. We achieve state-of-the-art results in several constrained single-property optimization tasks as well as in the challenging task of multi-objective optimization, improving on the previous state-of-the-art success rate by more than 5%. We attribute these strong results to MolMIM's latent representation, which clusters similar molecules in the latent space, rather than to the optimizer, since CMA-ES is often used only as a baseline optimization method. We also show that MolMIM is favourable in a compute-limited regime, making it an attractive model for such cases.
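The abstract describes property-guided optimization as a black-box search over MolMIM's latent space. The block below is a minimal sketch of that idea, not the authors' implementation. It assumes a hypothetical `score` callable that would decode a latent vector and return a property score (with any similarity constraint folded in). It also swaps in a simplified isotropic evolution strategy rather than full CMA-ES: the strategy keeps log-rank recombination weights of the form used in CMA-ES, but it learns no covariance matrix, and its step-size decay is a hypothetical fixed schedule. The function name, parameters, and the toy objective are all illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def evolution_strategy_search(
    score: Callable[[Sequence[float]], float],
    z0: Sequence[float],
    sigma: float = 0.5,
    population: int = 16,
    parents: int = 4,
    iterations: int = 100,
    seed: int = 0,
) -> Vector:
    """Maximize a black-box `score` over latent vectors (hypothetical helper).

    Simplified (mu/mu_w, lambda) evolution strategy: isotropic Gaussian
    sampling around a moving mean. Unlike full CMA-ES, no covariance matrix
    is learned and sigma follows a fixed decay schedule.
    """
    rng = random.Random(seed)
    mean = list(z0)
    # Log-rank recombination weights (CMA-ES-style), normalized to sum to 1.
    raw = [math.log(parents + 0.5) - math.log(i + 1) for i in range(parents)]
    total = sum(raw)
    weights = [w / total for w in raw]
    best, best_score = list(mean), score(mean)
    for _ in range(iterations):
        # Sample `population` candidate latent codes around the current mean.
        candidates = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(population)
        ]
        # Rank candidates by score (highest first); sort on the score only.
        scored = sorted(
            ((score(c), c) for c in candidates), key=lambda p: p[0], reverse=True
        )
        # Move the mean to the weighted average of the top `parents` candidates.
        elite = [c for _, c in scored[:parents]]
        mean = [
            sum(w * c[d] for w, c in zip(weights, elite))
            for d in range(len(mean))
        ]
        # Track the best latent code seen so far.
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        sigma *= 0.97  # hypothetical fixed decay; CMA-ES adapts sigma itself
    return best


if __name__ == "__main__":
    # Hypothetical stand-in for "decode z, then score the molecule":
    # a quadratic bowl peaked at an arbitrary target latent vector.
    target = [1.0, -2.0, 0.5]

    def toy_score(z: Sequence[float]) -> float:
        return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))

    best = evolution_strategy_search(toy_score, [0.0, 0.0, 0.0])
    print(best, toy_score(best))
```

For the full algorithm, the `pycma` package provides `cma.CMAEvolutionStrategy(x0, sigma0)`, driven by an `ask()`/`tell()` loop. Because it minimizes, the property score would be negated before being passed to `tell()`.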

Comments:	Published at the MLDD workshop, ICLR 2023. Version 2. 8 pages, 4 figures, 4 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
MSC classes:	92-08
ACM classes:	J.3; I.2.7
Cite as:	arXiv:2208.09016 [cs.LG]
	(or arXiv:2208.09016v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2208.09016

Submission history

From: Micha Livne
[v1] Thu, 18 Aug 2022 18:32:48 UTC (3,883 KB)
[v2] Wed, 29 Mar 2023 21:20:00 UTC (1,763 KB)