Generalization within in silico screening

Loukas, Andreas; Kessel, Pan; Gligorijevic, Vladimir; Bonneau, Richard

arXiv:2307.09379 (stat)

[Submitted on 18 Jul 2023 (v1), last revised 23 Jul 2024 (this version, v2)]

Abstract: In silico screening uses predictive models to select a batch of compounds with favorable properties from a library for experimental validation. Unlike conventional learning paradigms, success in this context is measured by the performance of the predictive model on the selected subset of compounds rather than the entire set of predictions. By extending learning theory, we show that the selectivity of the selection policy can significantly impact generalization, with a higher risk of errors occurring when exclusively selecting predicted positives and when targeting rare properties. Our analysis suggests a way to mitigate these challenges. We show that generalization can be markedly enhanced when considering a model's ability to predict the fraction of desired outcomes in a batch. This is promising, as the primary aim of screening is not necessarily to pinpoint the label of each compound individually, but rather to assemble a batch enriched for desirable compounds. Our theoretical insights are empirically validated across diverse tasks, architectures, and screening scenarios, underscoring their applicability.
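
As an informal illustration of the setting the abstract describes (this is not code from the paper), the sketch below builds a synthetic compound library with a rare positive class, applies a top-k selection policy, and compares per-compound accuracy on the whole library with accuracy on the selected batch. It also compares the predicted and realized fraction of positives in the batch. Everything here is an assumption made for the example: the Beta-distributed scores, the calibrated-by-construction toy model, the batch size, and the helper names `simulate_library`, `select_top_k`, `accuracy`, and `batch_summary`.

```python
import random


def simulate_library(n, seed=0):
    """Return (scores, labels) for a synthetic library with a rare positive class."""
    rng = random.Random(seed)
    # Hypothetical score distribution: most compounds receive low probabilities.
    scores = [rng.betavariate(0.5, 5.0) for _ in range(n)]
    # Labels are drawn from the scores, so this toy model is calibrated by construction.
    labels = [1 if rng.random() < s else 0 for s in scores]
    return scores, labels


def select_top_k(scores, k):
    """Selection policy: indices of the k highest-scoring compounds."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def accuracy(scores, labels, indices, threshold=0.5):
    """Per-compound accuracy of thresholded predictions on the given indices."""
    correct = sum((scores[i] >= threshold) == bool(labels[i]) for i in indices)
    return correct / len(indices)


def batch_summary(scores, labels, indices):
    """Return (predicted, realized) fraction of positives among the given indices."""
    predicted = sum(scores[i] for i in indices) / len(indices)
    realized = sum(labels[i] for i in indices) / len(indices)
    return predicted, realized


scores, labels = simulate_library(20000)
batch = select_top_k(scores, 100)
whole_library = range(len(scores))

print("accuracy on whole library:", accuracy(scores, labels, whole_library))
print("accuracy on selected batch:", accuracy(scores, labels, batch))
predicted, realized = batch_summary(scores, labels, batch)
print("predicted fraction of positives in batch:", predicted)
print("realized fraction of positives in batch:", realized)
```

The two accuracy figures measure different things: the first is dominated by the many easy negatives, while the second depends only on the compounds the policy actually selects. The last two lines look at a batch-level quantity, the fraction of positives, rather than at individual labels.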

Comments:	9 pages, 3 figures
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2307.09379 [stat.ML]
	(or arXiv:2307.09379v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2307.09379

Submission history

From: Pan Kessel
[v1] Tue, 18 Jul 2023 16:01:01 UTC (875 KB)
[v2] Tue, 23 Jul 2024 16:37:22 UTC (1,045 KB)
