Balancing Approach for Causal Inference at Scale

Lin, Sicheng; Xu, Meng; Zhang, Xi; Chao, Shih-Kang; Huang, Ying-Kai; Shi, Xiaolin

arXiv:2302.05549 (stat)

[Submitted on 10 Feb 2023 (v1), last revised 3 Aug 2023 (this version, v2)]

Abstract: As modern software and online platforms collect massive amounts of data, there is increasing demand to apply causal inference methods at large scale when randomized experimentation is not viable. Weighting methods that directly incorporate covariate balancing have recently gained popularity for estimating causal effects in observational studies. These methods reduce the manual effort researchers spend iterating between propensity score modeling and balance checking until covariate balance is satisfactory. However, conventional solvers for determining the weights do not scale to the large datasets found at companies such as Snap Inc. To address this limitation and improve computational efficiency, we present scalable algorithms, DistEB and DistMS, for two balancing approaches: entropy balancing and MicroSynth. The solvers have linear time complexity and can be conveniently implemented in distributed computing frameworks such as Spark and Hive. We study the properties of balancing approaches at different scales, up to 1 million treated units and 487 covariates. We find that with larger sample sizes, both the bias and the variance of the causal effect estimate are significantly reduced. The results emphasize the importance of applying balancing approaches to large-scale datasets. We combine the balancing approach with a synthetic control framework and deploy an end-to-end system for causal impact estimation at Snap Inc.
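To make the idea of entropy balancing concrete, the following is a minimal single-machine sketch, not the DistEB algorithm from the paper. It assumes uniform base weights and solves the standard log-sum-exp dual of entropy balancing by plain gradient descent, so that the weighted covariate means of the control units match a target (for example, the treated-group means). The names `entropy_balance`, `controls`, and `target` and the toy data are hypothetical and chosen only for illustration.

```python
import math


def entropy_balance(controls, target, max_iter=5000, tol=1e-10):
    """Return weights over control rows whose weighted covariate means match target.

    Illustrative sketch only: gradient descent on the entropy balancing dual,
    assuming uniform base weights and a target inside the controls' convex hull.
    """
    n, d = len(controls), len(target)
    # Center each row on the target, so balance means "weighted mean of z is zero".
    z = [[row[j] - target[j] for j in range(d)] for row in controls]
    # Step size 1/L, where L = max squared norm bounds the curvature of the dual.
    step = 1.0 / max(sum(v * v for v in row) for row in z)
    lam = [0.0] * d
    weights = [1.0 / n] * n
    for _ in range(max_iter):
        scores = [sum(l * v for l, v in zip(lam, row)) for row in z]
        top = max(scores)  # subtract the max before exponentiating for stability
        expo = [math.exp(s - top) for s in scores]
        total = sum(expo)
        weights = [e / total for e in expo]
        # Gradient of the dual is the remaining imbalance in each covariate.
        grad = [sum(weights[i] * z[i][j] for i in range(n)) for j in range(d)]
        if max(abs(g) for g in grad) < tol:
            break
        lam = [l - step * g for l, g in zip(lam, grad)]
    return weights


# Toy data: five control units with two covariates each (made up for illustration).
controls = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
target = [1.2, 0.7]  # stand-in for the treated-group covariate means
weights = entropy_balance(controls, target)
balanced_means = [
    sum(w * row[j] for w, row in zip(weights, controls)) for j in range(2)
]
```

On this toy input the weights converge to approximately 0.1, 0.2, 0.1, 0.2, 0.4, and `balanced_means` matches `target`. Each iteration is a single pass over the control units, so the per-iteration cost grows linearly with their number. How DistEB itself distributes and solves the problem is described in the paper, not here.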

Comments:	KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Subjects:	Methodology (stat.ME); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2302.05549 [stat.ME]
	(or arXiv:2302.05549v2 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2302.05549

Submission history

From: Sicheng Lin
[v1] Fri, 10 Feb 2023 23:32:47 UTC (2,021 KB)
[v2] Thu, 3 Aug 2023 04:19:50 UTC (2,022 KB)
