NeRCC: Nested-Regression Coded Computing for Resilient Distributed Prediction Serving Systems

Moradi, Parsa; Maddah-Ali, Mohammad Ali

Computer Science > Machine Learning

arXiv:2402.04377 (cs)

[Submitted on 6 Feb 2024 (v1), last revised 8 Feb 2024 (this version, v2)]

Abstract: Resilience against stragglers is a critical requirement for prediction serving systems, which execute inference on input data using a pre-trained machine-learning model. In this paper, we propose NeRCC, a general straggler-resistant framework for approximate coded computing. NeRCC comprises three layers: (1) encoding regression and sampling, which generates coded data points as combinations of the original data points; (2) computing, in which a cluster of workers runs inference on the coded data points; and (3) decoding regression and sampling, which approximately recovers the predictions for the original data points from the available predictions on the coded data points. We argue that the overall objective of the framework reveals an underlying interconnection between the two regression models in the encoding and decoding layers. We propose a solution to this nested-regression problem by summarizing their dependence in two regularization terms that are jointly optimized. Our extensive experiments on different datasets and various machine learning models, including LeNet5, RepVGG, and Vision Transformer (ViT), demonstrate that NeRCC accurately approximates the original predictions across a wide range of straggler counts, outperforming the state of the art by up to 23%.
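The three layers named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the regression family, so the code below swaps in ridge regression on Chebyshev polynomial features as a stand-in. The function names (`ridge_fit`, `ridge_eval`, `coded_inference`), the choice of Chebyshev nodes for the points `alpha` and `beta`, and the parameters `degree`, `lam_enc`, and `lam_dec` are all hypothetical, and the two regularization weights are fixed by hand rather than jointly optimized.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander


def ridge_fit(t, y, degree, lam):
    """Fit y ~ Phi(t) w by minimizing ||Phi w - y||^2 + lam * ||w||^2."""
    phi = chebvander(t, degree)                      # shape (len(t), degree + 1)
    gram = phi.T @ phi + lam * np.eye(degree + 1)
    return np.linalg.solve(gram, phi.T @ y)


def ridge_eval(t, w):
    """Evaluate the fitted regression function at the points t."""
    return chebvander(t, w.shape[0] - 1) @ w


def coded_inference(x, model, n_workers, stragglers, degree,
                    lam_enc=1e-8, lam_dec=1e-6):
    """Approximate model(x[k]) for every row k despite straggling workers.

    Assumption: ridge regression on Chebyshev features stands in for the
    encoder and decoder regressions; the paper's actual regressors and its
    joint tuning of the two regularization terms are not reproduced here.
    """
    k = x.shape[0]
    # Hypothetical choice: Chebyshev nodes for data points and worker points.
    alpha = np.cos(np.pi * (2 * np.arange(k) + 1) / (2 * k))
    beta = np.cos(np.pi * (2 * np.arange(n_workers) + 1) / (2 * n_workers))

    # (1) Encoding regression and sampling: fit a curve through
    #     (alpha_k, x_k), then sample it at beta_n to get coded points.
    w_enc = ridge_fit(alpha, x, degree, lam_enc)
    coded = ridge_eval(beta, w_enc)                  # shape (n_workers, d)

    # (2) Computing: each worker runs inference on one coded point;
    #     stragglers never return a result.
    outputs = np.stack([model(c) for c in coded])
    alive = np.setdiff1d(np.arange(n_workers), stragglers)

    # (3) Decoding regression and sampling: fit a curve through the
    #     available (beta_n, output_n) pairs, then sample it at alpha_k.
    w_dec = ridge_fit(beta[alive], outputs[alive], degree, lam_dec)
    return ridge_eval(alpha, w_dec)


# Usage with a toy linear "model" (an assumption made for illustration).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(2, 3)), rng.normal(size=2)
x = rng.normal(size=(4, 3))
pred = coded_inference(x, lambda v: A @ v + b, n_workers=12,
                       stragglers=[1, 5, 9], degree=3,
                       lam_enc=1e-10, lam_dec=1e-10)
```

With a linear toy model the composition of encoder and model is itself a low-degree polynomial, so the decoder recovers the predictions almost exactly; for a real neural network the recovery is only approximate and depends on how the two regularization weights are chosen.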

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)
Cite as:	arXiv:2402.04377 [cs.LG]
	(or arXiv:2402.04377v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.04377

Submission history

From: Parsa Moradi
[v1] Tue, 6 Feb 2024 20:31:15 UTC (970 KB)
[v2] Thu, 8 Feb 2024 23:15:10 UTC (970 KB)
