Analyzing the Structure of Attention in a Transformer Language Model

Vig, Jesse; Belinkov, Yonatan

arXiv:1906.04284 (cs)

[Submitted on 7 Jun 2019 (v1), last revised 18 Jun 2019 (this version, v2)]

Abstract: The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the structure of attention in a Transformer language model, the GPT-2 small pretrained model. We visualize attention for individual instances and analyze the interaction between attention and syntax over a large corpus. We find that attention targets different parts of speech at different layer depths within the model, and that attention aligns with dependency relations most strongly in the middle layers. We also find that the deepest layers of the model capture the most distant relationships. Finally, we extract exemplar sentences that reveal highly specific patterns targeted by particular attention heads.
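As an illustration of the claim that some layers capture more distant relationships than others, the following is a minimal sketch of one way to quantify how far back a single attention head looks. The function name `mean_attention_distance`, the attention-weighted averaging, and the toy matrices are assumptions made for this example; they are not taken from the abstract and may differ from the authors' exact metric.

```python
def mean_attention_distance(attn):
    """Attention-weighted mean distance, in tokens, between query i and key j.

    `attn` is a square list of lists for one head on one sentence, where
    attn[i][j] is the weight token i places on token j. A causal (left-to-right)
    model is assumed, so only positions j <= i are counted.
    """
    weighted = 0.0
    total = 0.0
    for i, row in enumerate(attn):
        for j, w in enumerate(row[: i + 1]):  # causal mask: ignore j > i
            weighted += w * (i - j)
            total += w
    return weighted / total if total else 0.0


# Hypothetical toy heads over a 3-token input (not real GPT-2 weights).
local_head = [
    [1.0, 0.0, 0.0],  # every token attends only to itself
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
distant_head = [
    [1.0, 0.0, 0.0],  # every token attends only to the first token
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]

print(mean_attention_distance(local_head))
print(mean_attention_distance(distant_head))
```

Averaging such a score over the heads in each layer, and over many sentences, would give a per-layer number that can be compared across layer depths.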

Comments:	To appear in ACL BlackboxNLP workshop
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1906.04284 [cs.CL]
	(or arXiv:1906.04284v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1906.04284

Submission history

From: Jesse Vig
[v1] Fri, 7 Jun 2019 13:58:49 UTC (1,789 KB)
[v2] Tue, 18 Jun 2019 19:42:31 UTC (1,789 KB)
