Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective

Liu, Zhexuan; Ma, Rong; Zhong, Yiqiao

arXiv:2410.16608 (stat)

[Submitted on 22 Oct 2024 (v1), last revised 1 Apr 2025 (this version, v2)]

Abstract: Visualizing high-dimensional data is essential for understanding biomedical data and deep learning models. Neighbor embedding methods, such as t-SNE and UMAP, are widely used but can introduce misleading visual artifacts. We find that the manifold learning interpretations from many prior works are inaccurate and that the misuse stems from a lack of data-independent notions of embedding maps, which project high-dimensional data into a lower-dimensional space. Leveraging the leave-one-out principle, we introduce LOO-map, a framework that extends embedding maps beyond discrete points to the entire input space. We identify two forms of map discontinuity that distort visualizations: one exaggerates cluster separation and the other creates spurious local structures. As a remedy, we develop two types of point-wise diagnostic scores to detect unreliable embedding points and improve hyperparameter selection, which are validated on datasets from computer vision and single-cell omics.

Comments:	49 pages, 20 figures
Subjects:	Methodology (stat.ME); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
MSC classes:	62-08
Cite as:	arXiv:2410.16608 [stat.ME]
	(or arXiv:2410.16608v2 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2410.16608

Submission history

From: Zhexuan Liu
[v1] Tue, 22 Oct 2024 01:40:43 UTC (11,555 KB)
[v2] Tue, 1 Apr 2025 02:20:44 UTC (16,135 KB)
