rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks

Mohammed, Ali; Cavelan, Aurelien; Ciorba, Florina M.

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:1905.08073 (cs)

[Submitted on 20 May 2019 (v1), last revised 4 Oct 2019 (this version, v3)]

Title:rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks

Authors:Ali Mohammed, Aurelien Cavelan, Florina M. Ciorba

View PDF

Abstract:Scientific applications often contain large and computationally intensive parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a balanced load execution of such applications on high performance computing (HPC) systems. Large HPC systems are vulnerable to processors or node failures and perturbations in the availability of resources. Most self-scheduling approaches do not consider fault-tolerant scheduling or depend on failure or perturbation detection and react by rescheduling failed tasks. In this work, a robust dynamic load balancing (rDLB) approach is proposed for the robust self scheduling of independent tasks. The proposed approach is proactive and does not depend on failure or perturbation detection. The theoretical analysis of the proposed approach shows that it is linearly scalable and its cost decrease quadratically by increasing the system size. rDLB is integrated into an MPI DLS library to evaluate its performance experimentally with two computationally intensive scientific applications. Results show that rDLB enables the tolerance of up to (P minus one) processor failures, where P is the number of processors executing an application. In the presence of perturbations, rDLB boosted the robustness of DLS techniques up to 30 times and decreased application execution time up to 7 times compared to their counterparts without rDLB.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Cite as:	arXiv:1905.08073 [cs.DC]
	(or arXiv:1905.08073v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1905.08073

Submission history

From: Ali Mohammed [view email]
[v1] Mon, 20 May 2019 13:01:09 UTC (579 KB)
[v2] Wed, 22 May 2019 10:37:12 UTC (583 KB)
[v3] Fri, 4 Oct 2019 15:30:44 UTC (583 KB)

Computer Science > Distributed, Parallel, and Cluster Computing

Title:rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Distributed, Parallel, and Cluster Computing

Title:rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators