Beam Enumeration: Probabilistic Explainability For Sample Efficient Self-conditioned Molecular Design

Guo, Jeff; Schwaller, Philippe

arXiv:2309.13957 (q-bio)

[Submitted on 25 Sep 2023 (v1), last revised 3 Mar 2024 (this version, v2)]

Abstract: Generative molecular design has moved from proof-of-concept to real-world applicability, as marked by the recent surge of papers reporting experimental validation. Key challenges in explainability and sample efficiency present opportunities to enhance generative design so that it can directly optimize expensive high-fidelity oracles and provide actionable insights to domain experts. Here, we propose Beam Enumeration to exhaustively enumerate the most probable sub-sequences from language-based molecular generative models and show that molecular substructures can be extracted from them. When coupled with reinforcement learning, the extracted substructures become meaningful, providing a source of explainability and improving sample efficiency through self-conditioned generation. Beam Enumeration is generally applicable to any language-based molecular generative model and notably further improves the performance of the recently reported Augmented Memory algorithm, which set a new state of the art for sample efficiency on the Practical Molecular Optimization benchmark. Given a fixed oracle budget, the combined algorithm generates more high-reward molecules and generates them faster. Beam Enumeration shows that improvements to explainability and sample efficiency for molecular design can be made synergistic.
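As a rough illustration of what "exhaustively enumerate the most probable sub-sequences" could look like, the sketch below keeps the top-k next tokens at every step of an autoregressive model and expands every branch, giving k^steps sub-sequences. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy transition table `TOY_MODEL`, the helper names `next_token_probs` and `enumerate_top_k`, and the parameters `k` and `steps` are all hypothetical, and the substructure extraction and reinforcement learning steps described in the abstract are not shown.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "language model": the next-token distribution depends only
# on the last token. A real molecular model would condition on the full prefix.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "^": {"C": 0.6, "N": 0.3, "O": 0.1},  # "^" is an assumed start token
    "C": {"C": 0.5, "N": 0.2, "O": 0.3},
    "N": {"C": 0.7, "N": 0.1, "O": 0.2},
    "O": {"C": 0.8, "N": 0.1, "O": 0.1},
}


def next_token_probs(prefix: str) -> Dict[str, float]:
    """Return the toy next-token distribution for a prefix."""
    return TOY_MODEL[prefix[-1]]


def enumerate_top_k(
    next_probs: Callable[[str], Dict[str, float]],
    start: str = "^",
    k: int = 2,
    steps: int = 3,
) -> List[Tuple[str, float]]:
    """Expand every prefix with its k most probable next tokens for `steps` steps.

    Unlike ordinary beam search, no branch is pruned after expansion, so the
    result holds k**steps sub-sequences, sorted by joint probability.
    """
    beams: List[Tuple[str, float]] = [(start, 0.0)]  # (prefix, log-probability)
    for _ in range(steps):
        expanded: List[Tuple[str, float]] = []
        for prefix, logp in beams:
            probs = next_probs(prefix)
            top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
            for token, p in top:
                expanded.append((prefix + token, logp + math.log(p)))
        beams = expanded
    # Drop the start token and convert log-probabilities back to probabilities.
    results = [(seq[len(start):], math.exp(lp)) for seq, lp in beams]
    return sorted(results, key=lambda item: item[1], reverse=True)


subsequences = enumerate_top_k(next_token_probs, k=2, steps=3)
for seq, prob in subsequences:
    print(f"{seq}\t{prob:.3f}")
```

The number of enumerated sub-sequences grows exponentially in `steps`, so in practice both `k` and `steps` would need to stay small.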

Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
Cite as: arXiv:2309.13957 [q-bio.BM] (or arXiv:2309.13957v2 [q-bio.BM] for this version)
DOI: https://doi.org/10.48550/arXiv.2309.13957

Submission history

From: Jeff Guo
[v1] Mon, 25 Sep 2023 08:43:13 UTC (18,758 KB)
[v2] Sun, 3 Mar 2024 16:23:00 UTC (26,286 KB)
