On Separate Normalization in Self-supervised Transformers

Chen, Xiaohui; Wang, Yinkai; Du, Yuanqi; Hassoun, Soha; Liu, Li-Ping

Computer Science > Computation and Language

arXiv:2309.12931v1 (cs)

[Submitted on 22 Sep 2023 (this version), latest version 28 Nov 2023 (v2)]

Abstract: Self-supervised training methods for transformers have demonstrated remarkable performance across various domains. Previous transformer-based models, such as masked autoencoders (MAE), typically utilize a single normalization layer for both the [CLS] symbol and the tokens. In this paper, we propose a simple modification that employs separate normalization layers for the tokens and the [CLS] symbol to better capture their distinct characteristics and enhance downstream task performance. Our method aims to alleviate the potential negative effects of using the same normalization statistics for both token types, which may not be optimally aligned with their individual roles. We empirically show that by utilizing a separate normalization layer, the [CLS] embeddings can better encode the global contextual information and are distributed more uniformly in their anisotropic space. When replacing the conventional normalization layer with the two separate layers, we observe an average 2.7% performance improvement across the image, natural language, and graph domains.
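The core idea in the abstract, one normalization layer for the [CLS] symbol and another for the tokens, can be illustrated with a minimal sketch. The abstract does not say which normalization is used, so this assumes layer normalization, in which case the two layers differ only in their learned scale and shift. The names `layer_norm` and `SeparateNorm`, and the convention that position 0 holds the [CLS] embedding, are hypothetical and are not taken from the paper.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one embedding vector to zero mean and unit variance,
    then apply an element-wise scale (gamma) and shift (beta)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


class SeparateNorm:
    """Hypothetical module: one set of affine parameters for [CLS],
    a second, independent set shared by all other tokens."""

    def __init__(self, dim):
        self.cls_gamma, self.cls_beta = [1.0] * dim, [0.0] * dim
        self.tok_gamma, self.tok_beta = [1.0] * dim, [0.0] * dim

    def __call__(self, sequence):
        # Assumption: the first position of the sequence is the [CLS] embedding.
        cls, tokens = sequence[0], sequence[1:]
        out = [layer_norm(cls, self.cls_gamma, self.cls_beta)]
        out += [layer_norm(t, self.tok_gamma, self.tok_beta) for t in tokens]
        return out


norm = SeparateNorm(dim=3)
norm.cls_gamma = [2.0, 2.0, 2.0]  # stands in for a learned, [CLS]-specific scale
normalized = norm([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
```

In this toy run the [CLS] vector and the token vector start out identical, yet they come out scaled differently because each passes through its own parameters. A single shared layer would force both through the same scale and shift.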

Comments:	NIPS 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2309.12931 [cs.CL]
	(or arXiv:2309.12931v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.12931

Submission history

From: Xiaohui Chen
[v1] Fri, 22 Sep 2023 15:30:53 UTC (4,824 KB)
[v2] Tue, 28 Nov 2023 19:06:49 UTC (4,815 KB)
