Model-agnostic Feature Importance and Effects with Dependent Features -- A Conditional Subgroup Approach

Molnar, Christoph; König, Gunnar; Bischl, Bernd; Casalicchio, Giuseppe

doi:10.1007/s10618-022-00901-9

arXiv:2006.04628 (stat)

[Submitted on 8 Jun 2020 (v1), last revised 21 Jun 2021 (this version, v2)]

Abstract: The interpretation of feature importance in machine learning models is challenging when features are dependent. Permutation feature importance (PFI) ignores such dependencies, which can cause misleading interpretations due to extrapolation. A possible remedy is more advanced conditional PFI approaches that enable the assessment of feature importance conditional on all other features. Because of this shift in perspective, and to enable correct interpretations, it is important that the conditioning is transparent and comprehensible to humans. In this paper, we propose a new sampling mechanism for the conditional distribution based on permutations in conditional subgroups. As these subgroups are constructed using decision trees (transformation trees), the conditioning becomes inherently interpretable. This provides not only a simple and effective estimator of conditional PFI, but also local PFI estimates within the subgroups. In addition, we apply the conditional subgroups approach to partial dependence plots (PDP), a popular method for describing feature effects that can also suffer from extrapolation when features are dependent and interactions are present in the model. We show that PFI and PDP based on conditional subgroups often outperform methods such as conditional PFI based on knockoffs, or accumulated local effect plots. Furthermore, our approach allows for a more fine-grained interpretation of feature effects and importance within the conditional subgroups.
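
The sampling mechanism described in the abstract (permute a feature only within subgroups defined by the other features) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper builds subgroups with transformation trees, whereas the sketch below swaps in a plain variance-reducing split search written in NumPy. The function names (`best_split`, `conditional_subgroups`, `subgroup_pfi`), the mean squared error loss, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def best_split(x_j, X_rest, idx, min_leaf):
    """Find the (feature, threshold) split of rows `idx` on the other features
    that minimizes the within-child sum of squares of x_j.
    Assumption: a simple CART-style criterion, not a transformation tree."""
    best = None  # (sse, feature index, threshold)
    for k in range(X_rest.shape[1]):
        col = X_rest[idx, k]
        for t in np.quantile(col, np.linspace(0.1, 0.9, 9)):
            left = col <= t
            n_left = int(left.sum())
            n_right = len(idx) - n_left
            if n_left < min_leaf or n_right < min_leaf:
                continue
            sse = x_j[idx][left].var() * n_left + x_j[idx][~left].var() * n_right
            if best is None or sse < best[0]:
                best = (sse, k, t)
    return best


def conditional_subgroups(x_j, X_rest, max_depth=2, min_leaf=30):
    """Partition row indices into subgroups in which x_j is more homogeneous."""
    groups = [np.arange(len(x_j))]
    for _ in range(max_depth):
        new_groups = []
        for idx in groups:
            split = best_split(x_j, X_rest, idx, min_leaf)
            if split is None:
                new_groups.append(idx)
            else:
                _, k, t = split
                left = X_rest[idx, k] <= t
                new_groups += [idx[left], idx[~left]]
        groups = new_groups
    return groups


def subgroup_pfi(predict, X, y, j, groups, n_repeats=5, seed=0):
    """Mean increase in MSE when feature j is permuted within each subgroup.
    Passing a single group containing all rows gives ordinary (marginal) PFI."""
    rng = np.random.default_rng(seed)
    base = np.mean((y - predict(X)) ** 2)
    increases = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        for idx in groups:
            X_perm[idx, j] = rng.permutation(X[idx, j])
        increases.append(np.mean((y - predict(X_perm)) ** 2) - base)
    return float(np.mean(increases))


# Toy data (hypothetical): x1 is almost a copy of x0, and the "model" uses x0 only.
rng = np.random.default_rng(0)
n = 1000
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
X = np.column_stack([x0, x1])
y = x0 + 0.1 * rng.normal(size=n)


def predict(X):
    return X[:, 0]


groups = conditional_subgroups(X[:, 0], X[:, [1]])
pfi = subgroup_pfi(predict, X, y, 0, [np.arange(n)])  # marginal PFI
cpfi = subgroup_pfi(predict, X, y, 0, groups)         # conditional subgroup PFI
print(f"marginal PFI: {pfi:.3f}, conditional subgroup PFI: {cpfi:.3f}")
```

In this toy setting the within-subgroup permutation keeps x0 close to values that are plausible given x1, so the conditional estimate should come out smaller than the marginal one. The per-group loss increases could also be reported separately as local importance estimates, which the sketch omits for brevity.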

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2006.04628 [stat.ML]
	(or arXiv:2006.04628v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2006.04628
Journal reference:	Data Mining and Knowledge Discovery (2023)
Related DOI:	https://doi.org/10.1007/s10618-022-00901-9

Submission history

From: Christoph Molnar
[v1] Mon, 8 Jun 2020 14:26:45 UTC (179 KB)
[v2] Mon, 21 Jun 2021 07:59:39 UTC (1,064 KB)
