An Isotropy Analysis in the Multilingual BERT Embedding Space

Rajaee, Sara; Pilehvar, Mohammad Taher

arXiv:2110.04504v1 (cs)

[Submitted on 9 Oct 2021 (this version), latest version 16 Mar 2022 (v2)]

Abstract: Several studies have explored the advantages of multilingual pre-trained models (e.g., multilingual BERT) in capturing shared linguistic knowledge. Their limitations, however, have received far less attention. In this paper, we investigate the representation degeneration problem in the multilingual contextual word representations (CWRs) of BERT and show that the embedding spaces of the selected languages suffer from an anisotropy problem. Our experimental results demonstrate that, as with their monolingual counterparts, increasing the isotropy of the multilingual embedding space can significantly improve its representation power and performance. Our analysis indicates that although the degenerated directions vary across languages, they encode similar linguistic knowledge, suggesting a shared linguistic space among languages.
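
To make the notion of anisotropy concrete, the sketch below computes the average pairwise cosine similarity of a handful of toy vectors, before and after mean-centering. Both the average-cosine proxy and the centering step are illustrative assumptions chosen for brevity; the abstract does not state which isotropy measure or enhancement method the paper uses, and the names `average_cosine`, `center`, and `toy` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_cosine(vectors):
    """Mean cosine similarity over all unordered pairs of vectors.

    Values near 1 mean the vectors crowd into a narrow cone (anisotropic);
    values near 0 mean they are spread more evenly over directions.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def center(vectors):
    """Subtract the mean vector from every vector (a simple, assumed fix)."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]


# Toy "embeddings": small variations around a large shared offset (5, 5),
# so every vector points in roughly the same direction.
toy = [[6.0, 5.0], [4.0, 5.0], [5.0, 6.0], [5.0, 4.0]]

before = average_cosine(toy)         # dominated by the shared offset
after = average_cosine(center(toy))  # offset removed
print(f"average cosine before centering: {before:.3f}")
print(f"average cosine after centering:  {after:.3f}")
```

On this toy data the average cosine is close to 1 before centering and much closer to 0 afterwards, which is the qualitative effect that "increasing isotropy" refers to. Real contextual embeddings would need a far larger sample and a non-zero check before normalizing.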

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2110.04504 [cs.CL]
(or arXiv:2110.04504v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2110.04504

Submission history

From: Sara Rajaee
[v1] Sat, 9 Oct 2021 08:29:49 UTC (5,960 KB)
[v2] Wed, 16 Mar 2022 18:26:23 UTC (1,315 KB)
