Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later

Ye, Han-Jia; Yin, Huai-Hong; Zhan, De-Chuan

Abstract: The growing success of deep learning in various domains has prompted investigations into its application to tabular data, where deep models have shown promising results compared to traditional tree-based methods. In this paper, we revisit Neighborhood Components Analysis (NCA), a classic tabular prediction method introduced in 2004 that learns a linear projection capturing semantic similarities between instances. We find that minor modifications, such as adjustments to the learning objectives and the integration of deep learning architectures, significantly enhance NCA's performance, enabling it to surpass most modern deep tabular models. Additionally, we introduce a stochastic neighbor sampling strategy that improves both the efficiency and predictive accuracy of our proposed ModernNCA: only a subset of neighbors is sampled during training, while the entire neighborhood is used during inference. Extensive experiments demonstrate that ModernNCA achieves state-of-the-art results in both classification and regression tasks across various tabular datasets, outperforming both tree-based methods and other deep tabular models, while also reducing training time and model size.
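As a rough illustration of the idea described in the abstract, the following NumPy sketch shows a distance-weighted (soft nearest-neighbor) class prediction together with random neighbor subsampling. It is not the authors' implementation: the function names `soft_neighbor_predict` and `sample_neighbors`, the use of Euclidean distance, the fixed random matrix standing in for a learned projection, and the sampling fraction are all assumptions made for the example. It also omits details a real training loop would need, such as excluding each training point from its own neighbor set.

```python
import numpy as np


def soft_neighbor_predict(query_emb, neighbor_emb, neighbor_labels, n_classes):
    """Return class probabilities for each query as a distance-weighted
    vote over the given neighbors (hypothetical helper, for illustration)."""
    # Pairwise Euclidean distances, shape (n_query, n_neighbors).
    diff = query_emb[:, None, :] - neighbor_emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Softmax over negative distances: closer neighbors get larger weights.
    logits = -dist
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted average of one-hot neighbor labels.
    one_hot = np.eye(n_classes)[neighbor_labels]
    return weights @ one_hot


def sample_neighbors(n_train, fraction, rng):
    """Pick a random subset of training indices to act as neighbors
    (assumed uniform sampling without replacement)."""
    size = max(1, int(round(fraction * n_train)))
    return rng.choice(n_train, size=size, replace=False)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))        # toy tabular features
y = rng.integers(0, 3, size=20)     # toy labels for 3 classes
W = rng.normal(size=(4, 2))         # stand-in for a learned projection
emb = X @ W

# Training-time view: only a sampled subset of neighbors is used.
idx = sample_neighbors(len(X), fraction=0.5, rng=rng)
probs_train = soft_neighbor_predict(emb[:5], emb[idx], y[idx], n_classes=3)

# Inference-time view: the whole training set serves as the neighborhood.
probs_infer = soft_neighbor_predict(emb[:5], emb, y, n_classes=3)
```

In this sketch the only difference between the training and inference calls is which neighbors are passed in, which mirrors the abstract's description of sampling a subset during training and using the full neighborhood at inference.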

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2407.03257 [cs.LG]
	(or arXiv:2407.03257v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.03257
