Doubly Distributed Supervised Learning and Inference with High-Dimensional Correlated Outcomes

Hector, Emily C.; Song, Peter X.-K.

Mathematics > Statistics Theory

arXiv:2007.08588 (math)

[Submitted on 16 Jul 2020]

Abstract: This paper presents a unified framework for supervised learning and inference with high-dimensional correlated outcomes using the divide-and-conquer approach. We propose a general class of estimators that can be implemented in a fully distributed and parallelized computational scheme. We overcome the modelling, computational and theoretical challenges posed by high-dimensional correlated outcomes in three steps: dividing the data at both the outcome and subject levels; estimating the parameter of interest from each block of data using a broad class of supervised learning procedures; and combining the block estimators in a closed-form meta-estimator. This meta-estimator is asymptotically equivalent to the estimator obtained by Hansen's (1982) generalized method of moments (GMM), and computing it does not require the entire data set to be reloaded on a common server. We provide rigorous theoretical justification for the use of distributed estimators with correlated outcomes by studying the asymptotic behaviour of the combined estimator with both a fixed and a diverging number of data divisions. Simulations illustrate the finite-sample performance of the proposed method, and we provide an R package for ease of implementation.
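
The divide, estimate, combine workflow described in the abstract can be illustrated with a toy sketch. This is not the authors' estimator or their R package: it assumes a scalar, no-intercept linear model, and it combines block estimates with information weights under working independence, so it ignores the between-block correlation that the GMM-equivalent meta-estimator is designed to handle. All function names (`simulate`, `split`, `block_estimate`, `combine`) and all settings are hypothetical.

```python
import random


def simulate(n_subjects, n_outcomes, beta, seed=0):
    """Simulate y[i][j] = beta * x[i] + u[i] + noise (assumed toy model).

    The shared subject effect u[i] makes outcomes within a subject correlated.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n_subjects)]
    y = []
    for i in range(n_subjects):
        u = rng.gauss(0, 1)
        y.append([beta * x[i] + u + rng.gauss(0, 1) for _ in range(n_outcomes)])
    return x, y


def split(indices, k):
    """Split a list of indices into k contiguous, near-equal chunks."""
    size, rem = divmod(len(indices), k)
    chunks, start = [], 0
    for b in range(k):
        end = start + size + (1 if b < rem else 0)
        chunks.append(indices[start:end])
        start = end
    return chunks


def block_estimate(x, y, subj_idx, out_idx):
    """Least-squares slope on one block, plus its information weight sum(x^2)."""
    sxx = sum(x[i] ** 2 for i in subj_idx) * len(out_idx)
    sxy = sum(x[i] * y[i][j] for i in subj_idx for j in out_idx)
    return sxy / sxx, sxx


def combine(block_results):
    """Closed-form information-weighted average of block estimates."""
    total_weight = sum(w for _, w in block_results)
    return sum(b * w for b, w in block_results) / total_weight


n_subjects, n_outcomes = 200, 12
x, y = simulate(n_subjects, n_outcomes, beta=1.5)

# Divide at the subject level (4 groups) and the outcome level (3 blocks).
subject_groups = split(list(range(n_subjects)), 4)
outcome_blocks = split(list(range(n_outcomes)), 3)

# Each block could be processed on a separate worker; only the pair
# (estimate, weight) needs to be sent back for the combination step.
blocks = [block_estimate(x, y, s, o)
          for s in subject_groups for o in outcome_blocks]
beta_combined = combine(blocks)

# Full-data estimate, computed here only to check the combination.
beta_pooled, _ = block_estimate(x, y, list(range(n_subjects)),
                                list(range(n_outcomes)))
print(beta_combined, beta_pooled)
```

In this simplified setting the combined estimate matches the pooled least-squares estimate up to floating-point error, which illustrates how a closed-form combination of block summaries can avoid reloading the full data on one server.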

Comments:	49 pages, 1 figure
Subjects:	Statistics Theory (math.ST)
Cite as:	arXiv:2007.08588 [math.ST]
	(or arXiv:2007.08588v1 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.2007.08588
Journal reference:	Journal of Machine Learning Research, 21(173):1-35, 2020

Submission history

From: Emily C Hector
[v1] Thu, 16 Jul 2020 19:49:01 UTC (41 KB)
