Clustering in pure-attention hardmax transformers and its role in sentiment analysis

Alcalde, Albert; Fantuzzi, Giovanni; Zuazua, Enrique

arXiv:2407.01602 (cs)

[Submitted on 26 Jun 2024]

Abstract: Transformers are extremely successful machine learning models whose mathematical properties remain poorly understood. Here, we rigorously characterize the behavior of transformers with hardmax self-attention and normalization sublayers as the number of layers tends to infinity. By viewing such transformers as discrete-time dynamical systems describing the evolution of points in a Euclidean space, and thanks to a geometric interpretation of the self-attention mechanism based on hyperplane separation, we show that the transformer inputs asymptotically converge to a clustered equilibrium determined by special points called leaders. We then leverage this theoretical understanding to solve sentiment analysis problems from language processing using a fully interpretable transformer model, which effectively captures 'context' by clustering meaningless words around leader words carrying the most meaning. Finally, we outline remaining challenges to bridge the gap between the mathematical analysis of transformers and their real-life implementation.
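
The abstract does not state the layer update explicitly, so the following is only a minimal sketch of what a hardmax self-attention layer viewed as a discrete-time dynamical system could look like. It assumes a particular form: each point moves toward the average of the points that maximize the score <A z_i, z_j>, and a factor 1/(1 + alpha) stands in for the normalization sublayer. The names `hardmax_attention_step` and `run_layers`, the matrix `A`, the step size `alpha`, and the tie tolerance `tol` are illustrative choices, not taken from the paper or its accompanying code.

```python
import numpy as np


def hardmax_attention_step(Z, A, alpha=1.0, tol=1e-9):
    """Apply one assumed hardmax-attention layer to the rows of Z.

    Z has shape (n, d): one point (token) per row.
    A has shape (d, d): the attention matrix (assumed form).
    """
    # scores[i, j] = <A z_i, z_j>
    scores = Z @ A.T @ Z.T
    # Hardmax: keep only the indices attaining the row-wise maximum (ties allowed).
    mask = scores >= scores.max(axis=1, keepdims=True) - tol
    # Each point attends to the plain average of its maximizers.
    targets = (mask @ Z) / mask.sum(axis=1, keepdims=True)
    # Convex combination of the point and its target (assumed normalization).
    return (Z + alpha * targets) / (1.0 + alpha)


def run_layers(Z, A, n_layers=60, alpha=1.0):
    """Iterate the layer map n_layers times and return the final points."""
    for _ in range(n_layers):
        Z = hardmax_attention_step(Z, A, alpha)
    return Z


if __name__ == "__main__":
    # Three points on a line; A is the identity for simplicity.
    Z0 = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]])
    print(run_layers(Z0, np.eye(2)))
```

In this toy configuration the points (2, 0) and (-1, 0) each maximize their own score and therefore stay fixed, which matches the role the abstract assigns to leaders, while (1, 0) is drawn toward (2, 0) as the number of layers grows.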

Comments:	23 pages, 10 figures, 1 table. Funded by the European Union (Horizon Europe MSCA project ModConFlex, grant number 101073558). Accompanying code available at: this https URL
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
MSC classes:	68T07, 68T50
Cite as:	arXiv:2407.01602 [cs.CL]
	(or arXiv:2407.01602v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.01602

Submission history

From: Albert Alcalde
[v1] Wed, 26 Jun 2024 16:13:35 UTC (394 KB)
