Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models

Ekeberg, Magnus; Lövkvist, Cecilia; Lan, Yueheng; Weigt, Martin; Aurell, Erik

doi:10.1103/PhysRevE.87.012707

arXiv:1211.1281 (q-bio)

[Submitted on 6 Nov 2012 (v1), last revised 12 Jan 2013 (this version, v2)]

Abstract: Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statistical inference problem, rooted in the following: correlation between two sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed correlation is not enough to guarantee proximity. Separating direct from indirect interactions is an instance of the general problem of inverse statistical mechanics, where the task is to learn model parameters (fields, couplings) from observables (magnetizations, correlations, samples) in large systems. In the context of protein sequences, the approach has been referred to as direct-coupling analysis. Here we show that the pseudolikelihood method, applied to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins, significantly outperforms existing approaches to the direct-coupling analysis, the latter being based on standard mean-field techniques. This improved performance also relies on a modified score for the coupling strength. The results are verified using known crystal structures of specific sequence instances of various protein families. Code implementing the new method can be found at this http URL.
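To make the central idea of the abstract concrete, the following is a minimal sketch of the pseudolikelihood objective for a q-state Potts model with fields and pairwise couplings. Instead of the full likelihood, which requires an intractable partition function, it sums the log-probabilities of each site conditioned on all other sites. The names `neg_log_pseudolikelihood`, `zero_params`, `h`, and `J` are illustrative assumptions and are not taken from the authors' released code; any regularization or sequence reweighting that a practical implementation might use is omitted, and the minimization over parameters is not shown.

```python
import math


def zero_params(n_sites, q):
    """Return all-zero fields h[i][a] and couplings J[i][j][a][b] (hypothetical layout)."""
    h = [[0.0] * q for _ in range(n_sites)]
    J = [[[[0.0] * q for _ in range(q)] for _ in range(n_sites)]
         for _ in range(n_sites)]
    return h, J


def neg_log_pseudolikelihood(seqs, h, J, q):
    """Average negative log pseudolikelihood of a Potts model over a set of sequences.

    seqs: list of sequences, each a list of ints in range(q)
    h[i][a]: field acting on state a at site i
    J[i][j][a][b]: coupling between state a at site i and state b at site j
                   (assumed symmetric: J[i][j][a][b] == J[j][i][b][a])
    """
    n_sites = len(h)
    total = 0.0
    for s in seqs:
        for i in range(n_sites):
            # Score of each candidate state a at site i with all other sites held fixed.
            logits = [
                h[i][a] + sum(J[i][j][a][s[j]] for j in range(n_sites) if j != i)
                for a in range(q)
            ]
            # Log of the local normalization, via log-sum-exp for numerical stability.
            m = max(logits)
            log_z = m + math.log(sum(math.exp(x - m) for x in logits))
            # Subtract log P(s_i | all other sites).
            total -= logits[s[i]] - log_z
    return total / len(seqs)


if __name__ == "__main__":
    q = 21  # 20 amino acids plus a gap state, matching the 21-state model in the abstract
    toy_alignment = [[0, 1, 2], [0, 1, 3]]  # made-up toy data, not real sequences
    h, J = zero_params(n_sites=3, q=q)
    print(neg_log_pseudolikelihood(toy_alignment, h, J, q))
```

With all parameters set to zero, every conditional distribution is uniform, so the objective equals the number of sites times log q. Each normalization runs over only q states at a single site, which is what makes this objective tractable where the full likelihood is not.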

Comments:	19 pages, 16 figures, published version
Subjects:	Quantitative Methods (q-bio.QM); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Data Analysis, Statistics and Probability (physics.data-an)
Cite as:	arXiv:1211.1281 [q-bio.QM]
	(or arXiv:1211.1281v2 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.1211.1281
Journal reference:	M. Ekeberg, C. Lövkvist, Y. Lan, M. Weigt, E. Aurell, Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models, Phys. Rev. E 87, 012707 (2013)
Related DOI:	https://doi.org/10.1103/PhysRevE.87.012707

Submission history

From: Magnus Ekeberg [view email]
[v1] Tue, 6 Nov 2012 16:02:27 UTC (10,662 KB)
[v2] Sat, 12 Jan 2013 11:17:29 UTC (1,049 KB)
