$\texttt{causalAssembly}$: Generating Realistic Production Data for Benchmarking Causal Discovery

Göbler, Konstantin; Windisch, Tobias; Drton, Mathias; Pychynski, Tim; Sonntag, Steffen; Roth, Martin

arXiv:2306.10816 (stat)

[Submitted on 19 Jun 2023 (v1), last revised 14 Feb 2024 (this version, v2)]

Abstract: Algorithms for causal discovery have recently undergone rapid advances and increasingly draw on flexible nonparametric methods to process complex data. With these advances comes a need for adequate empirical validation of the causal relationships learned by different algorithms. However, for most real data sources true causal relations remain unknown. This issue is further compounded by privacy concerns surrounding the release of suitable high-quality data. To help address these challenges, we gather a complex dataset comprising measurements from an assembly line in a manufacturing context. This line consists of numerous physical processes for which we are able to provide ground truth causal relationships on the basis of a detailed study of the underlying physics. We use the assembly line data and associated ground truth information to build a system for generation of semisynthetic manufacturing data that supports benchmarking of causal discovery methods. To accomplish this, we employ distributional random forests in order to flexibly estimate and represent conditional distributions that may be combined into joint distributions that strictly adhere to a causal model over the observed variables. The estimated conditionals and tools for data generation are made available in our Python library $\texttt{causalAssembly}$. Using the library, we showcase how to benchmark several well-known causal discovery algorithms.
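The idea of combining per-node conditional distributions into a joint distribution that respects a causal graph, and then scoring a discovered graph against the ground truth, can be illustrated with a minimal sketch. It does not use the $\texttt{causalAssembly}$ API. The three-node graph, the variable names, and the linear-Gaussian samplers are hypothetical stand-ins for the conditionals that the paper estimates with distributional random forests, and the structural Hamming distance shown is one common convention (a reversed edge counts once).

```python
import random
from graphlib import TopologicalSorter

# Hypothetical ground-truth DAG, written as node -> list of parents.
parents = {"force": [], "torque": ["force"], "angle": ["force", "torque"]}

# Stand-in conditional samplers (assumption: simple linear-Gaussian models).
# Each takes the already-sampled parent values and a random generator.
samplers = {
    "force": lambda pa, rng: rng.gauss(0.0, 1.0),
    "torque": lambda pa, rng: 2.0 * pa["force"] + rng.gauss(0.0, 0.1),
    "angle": lambda pa, rng: pa["force"] - 0.5 * pa["torque"] + rng.gauss(0.0, 0.1),
}

def ancestral_sample(parents, samplers, n, seed=0):
    """Draw n joint samples by visiting nodes in topological order."""
    rng = random.Random(seed)
    # TopologicalSorter expects node -> predecessors, which is what `parents` is.
    order = list(TopologicalSorter(parents).static_order())
    rows = []
    for _ in range(n):
        row = {}
        for node in order:
            parent_values = {p: row[p] for p in parents[node]}
            row[node] = samplers[node](parent_values, rng)
        rows.append(row)
    return rows

def structural_hamming_distance(true_edges, est_edges):
    """Count missing, extra, and reversed edges (a reversal counts once)."""
    true_edges, est_edges = set(true_edges), set(est_edges)
    true_skeleton = {frozenset(e) for e in true_edges}
    est_skeleton = {frozenset(e) for e in est_edges}
    missing_or_extra = len(true_skeleton ^ est_skeleton)
    shared = true_skeleton & est_skeleton
    reversed_edges = sum(
        1 for e in true_edges if frozenset(e) in shared and e not in est_edges
    )
    return missing_or_extra + reversed_edges

# Ground-truth edge set derived from the parent lists.
true_edges = {(p, child) for child, pas in parents.items() for p in pas}

# Hypothetical output of a causal discovery algorithm:
# one edge reversed, one edge missing.
estimated_edges = {("torque", "force"), ("force", "angle")}

samples = ancestral_sample(parents, samplers, n=200)
shd = structural_hamming_distance(true_edges, estimated_edges)
```

Because every variable is drawn only from its parents' values, the samples follow the factorization implied by the graph by construction, which is what makes the known graph usable as a benchmark reference.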

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Cite as:	arXiv:2306.10816 [stat.ML]
	(or arXiv:2306.10816v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2306.10816

Submission history

From: Konstantin Göbler
[v1] Mon, 19 Jun 2023 10:05:54 UTC (499 KB)
[v2] Wed, 14 Feb 2024 17:45:54 UTC (876 KB)
