Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison

Mitsuzawa, Kensuke; Kanagawa, Motonobu; Bortoli, Stefano; Grossi, Margherita; Papotti, Paolo

Statistics > Machine Learning

arXiv:2311.01537 (stat)

[Submitted on 2 Nov 2023]

Abstract: Two-sample testing decides whether two datasets are generated from the same distribution. This paper studies variable selection for two-sample testing, the task being to identify the variables (or dimensions) responsible for the discrepancies between the two distributions. This task is relevant to many problems of pattern analysis and machine learning, such as dataset shift adaptation, causal inference and model validation. Our approach builds on a two-sample test that uses the Maximum Mean Discrepancy (MMD). We optimise the Automatic Relevance Detection (ARD) weights defined for individual variables to maximise the power of the MMD-based test. For this optimisation, we introduce sparse regularisation and propose two methods for dealing with the issue of selecting an appropriate regularisation parameter. One method determines the regularisation parameter in a data-driven way, and the other aggregates the results of different regularisation parameters. We confirm the validity of the proposed methods by systematic comparisons with baseline methods, and demonstrate their usefulness in exploratory analysis of high-dimensional traffic simulation data. Preliminary theoretical analyses are also provided, including a rigorous definition of variable selection for two-sample testing.
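To make the role of per-variable weights concrete, the following is a minimal sketch of an MMD estimate computed with an ARD-weighted Gaussian kernel. It is an illustration under stated assumptions, not the authors' implementation: the kernel form exp(-sum_d w_d^2 (a_d - b_d)^2), the function names `ard_gaussian_kernel` and `mmd2_unbiased`, and the synthetic data are all hypothetical choices, and the sketch omits the weight optimisation and sparse regularisation described in the abstract. It only shows that a weight placed on a variable whose distribution differs yields a larger MMD estimate than a weight placed on a variable that does not differ.

```python
import numpy as np


def ard_gaussian_kernel(a, b, weights):
    """Gaussian kernel with per-variable weights (assumed form):
    k(a, b) = exp(-sum_d weights[d]**2 * (a[d] - b[d])**2)."""
    a_scaled = a * weights
    b_scaled = b * weights
    # Pairwise squared Euclidean distances between the rescaled samples.
    sq_dists = (
        np.sum(a_scaled**2, axis=1)[:, None]
        + np.sum(b_scaled**2, axis=1)[None, :]
        - 2.0 * a_scaled @ b_scaled.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0))


def mmd2_unbiased(x, y, weights):
    """Unbiased estimate of the squared MMD between samples x and y."""
    n, m = len(x), len(y)
    k_xx = ard_gaussian_kernel(x, x, weights)
    k_yy = ard_gaussian_kernel(y, y, weights)
    k_xy = ard_gaussian_kernel(x, y, weights)
    # Exclude diagonal terms from the within-sample averages.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


# Synthetic data (hypothetical): five variables, only variable 0 differs.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 5))
y = rng.normal(size=(500, 5))
y[:, 0] += 1.0  # mean shift in variable 0 only

weights_on_shifted = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
weights_on_unshifted = np.array([0.0, 1.0, 0.0, 0.0, 0.0])

mmd_shifted = mmd2_unbiased(x, y, weights_on_shifted)
mmd_unshifted = mmd2_unbiased(x, y, weights_on_unshifted)
print("weight on shifted variable:  ", mmd_shifted)
print("weight on unshifted variable:", mmd_unshifted)
```

In this toy setting the estimate is clearly larger when the weight sits on the shifted variable, which is the intuition behind reading the optimised weights as a variable selection.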

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2311.01537 [stat.ML]
	(or arXiv:2311.01537v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2311.01537

Submission history

From: Kensuke Mitsuzawa
[v1] Thu, 2 Nov 2023 18:38:39 UTC (3,594 KB)
