Why risk matters for protein binder design

Cotet, Tudor-Stefan; Krawczuk, Igor

arXiv:2504.00146 (cs)

[Submitted on 31 Mar 2025 (v1), last revised 2 Apr 2025 (this version, v2)]

Abstract: Bayesian optimization (BO) has recently become more prevalent in protein engineering applications and hence has become a fruitful target of benchmarks. However, current BO comparisons often overlook real-world considerations like risk and cost constraints. In this work, we compare 72 model combinations of encodings, surrogate models, and acquisition functions on 11 protein binder fitness landscapes, specifically from this perspective. Drawing from the portfolio optimization literature, we adopt metrics to quantify the cold-start performance relative to a random baseline, to assess the risk of an optimization campaign, and to calculate the overall budget required to reach a fitness threshold. Our results suggest the existence of Pareto-optimal models on the risk-performance axis, a shift in this preference depending on the landscape explored, and a robust correlation of landscape properties such as epistasis with the average and worst-case model performance. They also highlight that rigorous model selection requires substantial computational and statistical efforts.

Comments:	10 pages, 5 figures, 1 table, to be presented at ICLR 2025 GEM Workshop
Subjects:	Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2504.00146 [cs.LG]
	(or arXiv:2504.00146v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2504.00146

Submission history

From: Igor Krawczuk
[v1] Mon, 31 Mar 2025 18:54:38 UTC (17,020 KB)
[v2] Wed, 2 Apr 2025 11:43:24 UTC (15,447 KB)
