Automatic Outlier Rectification via Optimal Transport

Blanchet, Jose; Li, Jiajin; Pelger, Markus; Zanotti, Greg

arXiv:2403.14067v1 (stat)

[Submitted on 21 Mar 2024 (this version), latest version 11 Jul 2024 (v2)]

Abstract: In this paper, we propose a novel conceptual framework for detecting outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically follow a two-stage procedure: first, outliers are detected and removed; then, estimation is performed on the cleaned data. However, this approach does not use the estimation task to inform outlier removal, leaving room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework. First, we use an optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions. Then, we select the best distribution within the rectification set to perform the estimation task. Notably, the concave cost function introduced in this paper is the key to enabling our estimator to identify outliers effectively during the optimization process. We discuss the fundamental differences between our estimator and the optimal transport-based distributionally robust optimization (DRO) estimator. Finally, we demonstrate the effectiveness and superiority of our approach over conventional approaches in extensive simulation and empirical analyses for mean estimation, least absolute deviation regression, and the fitting of option implied volatility surfaces.
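The abstract's central claim is that a concave transport cost is what lets the estimator single out outliers. The sketch below illustrates that intuition only, under simplifying assumptions that are ours and not the paper's: (i) we estimate a location theta under absolute loss, (ii) we take c(x, y) = |x - y|^r with r < 1 as the concave cost, and (iii) rectification may only move whole points onto theta within an average transport budget delta, with the resulting knapsack-type choice solved greedily. Under these assumptions, moving a point at distance d reduces the loss by d at a cost of d^r, so the benefit per unit cost, d^(1 - r), grows with d when r < 1 and shrinks with d when r > 1. The function names rectify_greedy and rectified_loss are hypothetical.

```python
# Hypothetical illustration (not the authors' algorithm): compare which points a
# fixed transport budget "rectifies" under a concave versus a convex cost.


def rectify_greedy(data, theta, delta, r):
    """Greedily choose points to move onto theta.

    Moving x_i onto theta lowers the average absolute loss by |x_i - theta| / n
    and consumes |x_i - theta| ** r / n of the budget delta. Points are ranked
    by benefit per unit cost, |x_i - theta| ** (1 - r), and added while they fit.
    """
    n = len(data)
    dists = [abs(x - theta) for x in data]
    # Points already sitting at theta give no loss reduction, so skip them.
    candidates = [i for i, d in enumerate(dists) if d > 0]
    candidates.sort(key=lambda i: dists[i] ** (1 - r), reverse=True)
    chosen, spent = [], 0.0
    for i in candidates:
        cost = dists[i] ** r / n
        if spent + cost <= delta:
            chosen.append(i)
            spent += cost
    return sorted(chosen)


def rectified_loss(data, theta, chosen):
    """Average absolute loss after moving the chosen points onto theta."""
    return sum(0.0 if i in chosen else abs(x - theta)
               for i, x in enumerate(data)) / len(data)


data = [0.1, -0.2, 0.3, 0.0, -0.1, 10.0]  # one gross outlier at index 5
theta, delta = 0.0, 0.55

concave = rectify_greedy(data, theta, delta, r=0.5)  # concave cost, r < 1
convex = rectify_greedy(data, theta, delta, r=2.0)   # convex cost, r > 1
print(concave, round(rectified_loss(data, theta, concave), 3))  # [5] 0.117
print(convex, round(rectified_loss(data, theta, convex), 3))    # [0, 1, 2, 4] 1.667
```

With the same budget, the concave cost spends it on the single gross outlier, while the convex cost spreads it over the inliers closest to theta and leaves the outlier untouched, so the rectified loss stays large. The greedy rule is only a heuristic for this simplified inner problem and is not the joint estimator proposed in the paper.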

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Methodology (stat.ME)
Cite as:	arXiv:2403.14067 [stat.ML]
	(or arXiv:2403.14067v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2403.14067

Submission history

From: Jiajin Li
[v1] Thu, 21 Mar 2024 01:30:24 UTC (3,977 KB)
[v2] Thu, 11 Jul 2024 05:22:42 UTC (3,980 KB)