Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations

Quintero, Manuel; Stephenson, William T.; Shreekumar, Advik; Broderick, Tamara

arXiv:2504.16864 (stat)

[Submitted on 23 Apr 2025]

Abstract: In science and social science, we often wish to explain why an outcome is different in two populations. For instance, if a jobs program benefits members of one city more than another, is that due to differences in program participants (their covariates) or in the local labor markets (outcomes given covariates)? The Kitagawa-Oaxaca-Blinder (KOB) decomposition is a standard tool in econometrics that explains the difference in the mean outcome across two populations. However, the KOB decomposition assumes a linear relationship between covariates and outcomes, while the true relationship may be meaningfully nonlinear. Modern machine learning offers a variety of nonlinear functional decompositions for the relationship between outcomes and covariates in one population. It seems natural to extend the KOB decomposition using these functional decompositions. We observe that a successful extension should not attribute the differences to covariates -- or, respectively, to outcomes given covariates -- if those are the same in the two populations. Unfortunately, we demonstrate that, even in simple examples, two common decompositions -- functional ANOVA and Accumulated Local Effects -- can attribute differences to outcomes given covariates, even when these are identical in the two populations. We provide a characterization of when functional ANOVA misattributes, as well as a general property that any discrete decomposition must satisfy to avoid misattribution. We show that if the decomposition is independent of its input distribution, it does not misattribute. We further conjecture that misattribution arises in any reasonable additive decomposition that depends on the distribution of the covariates.
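For readers less familiar with KOB, the Python sketch below (NumPy assumed; not the authors' code) implements the standard twofold decomposition with OLS, using population B's coefficients as the reference, which is one common convention among several. The names kob_twofold and fanova_2x2 are hypothetical helpers. The first example uses noise-free synthetic data in which both populations share E[y | x] = x^2 but cover different ranges of x, so the linear fits differ and KOB assigns a nonzero share to outcomes given covariates. The second computes the classical functional ANOVA for two independent binary covariates and shows that the main effect of a fixed function shifts when only the covariate distribution changes, the kind of distribution dependence the abstract connects to misattribution. Neither example reproduces the paper's own extension of KOB or its examples.

```python
import numpy as np

# Hypothetical helper: twofold Kitagawa-Oaxaca-Blinder decomposition with OLS,
# using population B's coefficients as the reference (one common convention).
def kob_twofold(X_a, y_a, X_b, y_b):
    # Add an intercept so each OLS fit reproduces its group's mean outcome exactly.
    Z_a = np.column_stack([np.ones(len(X_a)), X_a])
    Z_b = np.column_stack([np.ones(len(X_b)), X_b])
    beta_a, *_ = np.linalg.lstsq(Z_a, y_a, rcond=None)
    beta_b, *_ = np.linalg.lstsq(Z_b, y_b, rcond=None)
    zbar_a, zbar_b = Z_a.mean(axis=0), Z_b.mean(axis=0)
    explained = (zbar_a - zbar_b) @ beta_b    # attributed to covariates
    unexplained = zbar_a @ (beta_a - beta_b)  # attributed to outcomes given covariates
    return explained, unexplained

# Hypothetical helper: classical functional ANOVA of f on {0,1}^2 under
# independent Bernoulli(p1) x Bernoulli(p2) inputs; f[a, b] = f(x1=a, x2=b).
def fanova_2x2(f, p1, p2):
    w1 = np.array([1 - p1, p1])  # marginal weights of x1
    w2 = np.array([1 - p2, p2])  # marginal weights of x2
    f0 = w1 @ f @ w2             # constant term: overall mean
    f1 = f @ w2 - f0             # main effect of x1: average over x2, centered
    f2 = w1 @ f - f0             # main effect of x2: average over x1, centered
    f12 = f - f0 - f1[:, None] - f2[None, :]  # interaction: the remainder
    return f0, f1, f2, f12

# Synthetic example: both populations share E[y | x] = x**2 (no noise),
# but their covariate ranges differ.
x_a = np.linspace(0.0, 1.0, 101)
x_b = np.linspace(1.0, 2.0, 101)
explained, unexplained = kob_twofold(x_a, x_a**2, x_b, x_b**2)
print(explained, unexplained)  # unexplained is nonzero (about 1.0) despite identical E[y | x]

# The same function f(x1, x2) = x1 * x2 under two input distributions
# that differ only in P(x2 = 1).
f = np.array([[0.0, 0.0], [0.0, 1.0]])
_, f1_low, _, _ = fanova_2x2(f, p1=0.5, p2=0.2)
_, f1_high, _, _ = fanova_2x2(f, p1=0.5, p2=0.8)
print(f1_low, f1_high)  # the x1 main effect changes although f did not
```

Which population supplies the reference coefficients is a modeling choice: swapping it generally changes the split between the two terms, but not their sum, which always equals the difference in mean outcomes.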

Comments:	30 pages, appearing in 2nd Workshop on Navigating and Addressing Data Problems for Foundation Models (DATA-FM @ ICLR 2025)
Subjects:	Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)
Cite as:	arXiv:2504.16864 [stat.ME]
	(or arXiv:2504.16864v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2504.16864

Submission history

From: Manuel Quintero Coronel
[v1] Wed, 23 Apr 2025 16:36:55 UTC (49 KB)