BERTology Meets Biology: Interpreting Attention in Protein Language Models

Vig, Jesse; Madani, Ali; Varshney, Lav R.; Xiong, Caiming; Socher, Richard; Rajani, Nazneen Fatema

arXiv:2006.15222 (cs)

[Submitted on 26 Jun 2020 (v1), last revised 28 Mar 2021 (this version, v3)]

Abstract: Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. In this work, we demonstrate a set of methods for analyzing protein Transformer models through the lens of attention. We show that attention: (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We find this behavior to be consistent across three Transformer architectures (BERT, ALBERT, XLNet) and two distinct protein datasets. We also present a three-dimensional visualization of the interaction between attention and protein structure. Code for visualization and analysis is available at this https URL.
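
As an illustration of claim (1), the following is a minimal sketch of one way to measure how much of an attention head's weight falls on residue pairs that are in contact in the folded structure. It is not taken from the authors' released code: the function name `attention_contact_agreement`, the `min_weight` cutoff, and the four-residue toy data are all assumptions made for this example.

```python
from typing import Sequence


def attention_contact_agreement(
    attention: Sequence[Sequence[float]],
    contacts: Sequence[Sequence[bool]],
    min_weight: float = 0.0,
) -> float:
    """Fraction of attention mass that lands on contacting residue pairs.

    attention[i][j] is the weight residue i places on residue j (hypothetical
    input; in practice it would come from one head of a protein Transformer).
    contacts[i][j] is True when residues i and j are spatially close.
    Only weights strictly greater than min_weight are counted.
    """
    total = 0.0
    on_contact = 0.0
    for i, row in enumerate(attention):
        for j, weight in enumerate(row):
            if weight <= min_weight:
                continue  # ignore low-confidence attention
            total += weight
            if contacts[i][j]:
                on_contact += weight
    return on_contact / total if total > 0 else 0.0


# Toy example (made-up numbers): residues 0 and 3 are far apart in the
# sequence but in contact in the 3D structure.
toy_attention = [
    [0.1, 0.1, 0.0, 0.8],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.7, 0.0, 0.1, 0.2],
]
toy_contacts = [[False] * 4 for _ in range(4)]
toy_contacts[0][3] = True
toy_contacts[3][0] = True

agreement_all = attention_contact_agreement(toy_attention, toy_contacts)
agreement_confident = attention_contact_agreement(
    toy_attention, toy_contacts, min_weight=0.3
)
print(agreement_all, agreement_confident)
```

Raising `min_weight` restricts the score to high-confidence attention, so a head that specializes in contacts should score higher under the cutoff than without it. The same function could be reused for claim (2) by passing a binding-site indicator in place of the contact map.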

Comments:	To appear in ICLR 2021
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
ACM classes:	I.2
Cite as:	arXiv:2006.15222 [cs.CL]
	(or arXiv:2006.15222v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2006.15222

Submission history

From: Jesse Vig
[v1] Fri, 26 Jun 2020 21:50:17 UTC (3,040 KB)
[v2] Mon, 13 Jul 2020 23:44:32 UTC (3,041 KB)
[v3] Sun, 28 Mar 2021 21:56:26 UTC (1,405 KB)
