Recomputation Enabled Efficient Checkpointing

Akturk, Ismail; Karpuzcu, Ulya R.

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:1710.04685 (cs)

[Submitted on 18 Sep 2017 (v1), last revised 21 Mar 2018 (this version, v2)]

Abstract: Systematic checkpointing of the machine state makes it possible to restart execution from a safe state upon detection of an error. The time and energy overhead of checkpointing, however, grows with the checkpointing frequency. Amortizing this overhead becomes especially challenging as expected error rates grow, because checkpointing frequency tends to increase with the error rate. Based on the observation that, due to imbalanced technology scaling, recomputing a data value can be more energy efficient than retrieving (i.e., loading) a stored copy, this paper explores how recomputing data values (which would otherwise be read from a checkpoint in memory or secondary storage) can shrink the machine state to be checkpointed and thereby reduce the checkpointing overhead. Specifically, the resulting amnesic checkpointing framework, AmnesiCHK, can reduce the storage overhead by up to 23.91%, the time overhead by 11.92%, and the energy overhead by 12.53%, even in a relatively small-scale system.
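The core idea above, leaving a value out of the checkpoint when regenerating it is cheaper than storing and reloading it, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: `Recipe`, `take_checkpoint`, `restore`, and the per-value energy figures are hypothetical and do not describe how AmnesiCHK itself identifies recomputable values or models energy. The sketch simply compares an assumed recomputation cost against an assumed store-and-reload cost for each value.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple


@dataclass
class Recipe:
    """Hypothetical recomputation recipe: value = fn(*inputs)."""
    inputs: Tuple[str, ...]
    fn: Callable[..., float]
    energy: float  # assumed energy to recompute (arbitrary units)


def take_checkpoint(state: Dict[str, float], recipes: Dict[str, Recipe],
                    load_energy: float) -> Tuple[Dict[str, float], Set[str]]:
    """Split state into values to store and values to omit (recompute on restore)."""
    # A value is a candidate for omission if recomputing it is assumed cheaper
    # than storing and later reloading it.
    candidates = {n for n, r in recipes.items()
                  if n in state and r.energy < load_energy}
    # Keep recomputation one level deep: omit a value only if all of its
    # inputs are present in the state and will themselves be stored.
    omitted = {n for n in candidates
               if all(i in state and i not in candidates for i in recipes[n].inputs)}
    stored = {n: v for n, v in state.items() if n not in omitted}
    return stored, omitted


def restore(stored: Dict[str, float], omitted: Set[str],
            recipes: Dict[str, Recipe]) -> Dict[str, float]:
    """Rebuild the full state: reload stored values, recompute omitted ones."""
    state = dict(stored)
    for n in omitted:
        r = recipes[n]
        state[n] = r.fn(*(stored[i] for i in r.inputs))
    return state


# Usage with made-up values and costs (not taken from the paper).
state = {"a": 3.0, "b": 4.0, "c": 7.0, "d": 12.0, "e": 14.0}
recipes = {
    "c": Recipe(("a", "b"), operator.add, energy=1.0),
    "d": Recipe(("a", "b"), operator.mul, energy=1.0),
    "e": Recipe(("c",), lambda c: c * 2, energy=1.0),  # depends on another candidate
}
stored, omitted = take_checkpoint(state, recipes, load_energy=10.0)
print(sorted(stored), sorted(omitted))  # ['a', 'b', 'e'] ['c', 'd']
assert restore(stored, omitted, recipes) == state
```

Restricting recomputation to inputs that are themselves stored keeps the restore step a single pass; a fuller scheme could also drop `e` by recomputing `c` first and then `e`, at the cost of ordering recomputations along their dependencies.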

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:1710.04685 [cs.DC]
	(or arXiv:1710.04685v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1710.04685

Submission history

From: Ismail Akturk
[v1] Mon, 18 Sep 2017 21:35:09 UTC (8,072 KB)
[v2] Wed, 21 Mar 2018 22:20:32 UTC (128 KB)