Attribution Methods Reveal Flaws in Fingerprint-Based Virtual Screening

Sundar, Vikram; Colwell, Lucy

arXiv:2007.01436 (q-bio)

[Submitted on 2 Jul 2020 (v1), last revised 8 Jul 2020 (this version, v2)]

Authors: Vikram Sundar (1), Lucy Colwell (1 and 2) ((1) Google Research, (2) Department of Chemistry, University of Cambridge)

Abstract: Fingerprint-based models for protein-ligand binding have demonstrated outstanding success on benchmark datasets; however, these models may not learn the correct binding rules. To assess this concern, we use in silico datasets with known binding rules to develop a general framework for evaluating model attribution. This framework identifies fragments that a model considers necessary to achieve a particular score, sidestepping the need for a model to be differentiable. Our results confirm that high-performing models may not learn the correct binding rule, and suggest concrete steps that can remedy this situation. We show that adding fragment-matched inactive molecules (decoys) to the data reduces attribution false negatives, while attribution false positives largely arise from the background correlation structure of molecular data. Normalizing for these background correlations helps to reveal the true binding logic. Our work highlights the danger of trusting attributions from high-performing models and suggests that a closer examination of fingerprint correlation structure and better decoy selection may help reduce misattributions.
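
The idea of finding fragments a model considers necessary for a score, without needing gradients, can be illustrated with a simple perturbation test. The sketch below is not the authors' framework; it is a minimal illustration under stated assumptions: a molecule is encoded as a set of fragment indices, the model is any black-box scoring function, and a fragment counts as "necessary" if removing it alone drops the score below a threshold. The names `necessary_fragments` and `toy_model`, and the synthetic binding rule, are hypothetical.

```python
from typing import Callable, FrozenSet, List

# Hypothetical encoding: a fingerprint is the set of fragment indices that are "on".
Fingerprint = FrozenSet[int]


def necessary_fragments(
    score: Callable[[Fingerprint], float],
    fingerprint: Fingerprint,
    threshold: float,
) -> List[int]:
    """Return fragments whose individual removal drops the score below threshold.

    Only calls to `score` are needed, so the model does not have to be
    differentiable.
    """
    if score(fingerprint) < threshold:
        # The molecule does not reach the score at all, so nothing is attributed.
        return []
    return sorted(
        bit for bit in fingerprint if score(fingerprint - {bit}) < threshold
    )


def toy_model(fp: Fingerprint) -> float:
    # Synthetic binding rule (assumption for illustration):
    # the molecule binds only if fragments 3 AND 7 are both present.
    return 1.0 if {3, 7} <= fp else 0.0


molecule = frozenset({1, 3, 5, 7})
print(necessary_fragments(toy_model, molecule, threshold=0.5))
```

With a known synthetic rule like this one, the fragments returned can be compared against the rule itself, which is the kind of check an in silico dataset makes possible. Single-fragment removal is the simplest variant and would miss redundancy between correlated fragments.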

Comments:	4 pages, 5 figures. In proceedings for the 2020 ICML workshop on Machine Learning Interpretability for Scientific Discovery
Subjects:	Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2007.01436 [q-bio.BM]
	(or arXiv:2007.01436v2 [q-bio.BM] for this version)
	https://doi.org/10.48550/arXiv.2007.01436

Submission history

From: Vikram Sundar
[v1] Thu, 2 Jul 2020 23:23:47 UTC (311 KB)
[v2] Wed, 8 Jul 2020 22:34:00 UTC (311 KB)
