tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads

Chien, Steven W. D.; Podobas, Artur; Peng, Ivy B.; Markidis, Stefano

arXiv:2008.04395v1 (cs)

[Submitted on 10 Aug 2020 (this version), latest version 12 Aug 2020 (v2)]

Abstract: Machine Learning applications on HPC systems have been gaining popularity in recent years. Upcoming large-scale systems will offer tremendous parallelism for training through GPUs. However, Machine Learning is also I/O-heavy, and I/O can become a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface that allows instrumentation of TensorFlow operations. However, the current profiler only enables analysis at the TensorFlow platform level and does not provide system-level information. In this paper, we extend the TensorFlow Profiler and introduce tf-Darshan, both a profiler and a tracer, which performs instrumentation through Darshan. We use the same Darshan shared instrumentation library and implement runtime attachment without using a system preload. We can extract Darshan profiling data structures during TensorFlow execution to enable analysis through the TensorFlow Profiler, and we visualize the performance results through TensorBoard, the web-based TensorFlow visualization tool. We do not alter Darshan's existing implementation. We illustrate tf-Darshan with two case studies, on ImageNet image classification and malware classification. We show that by guiding optimization with data from tf-Darshan, we increase POSIX I/O bandwidth by up to 19% by selecting data for staging on fast-tier storage. We also show that Darshan has the potential to be used as a runtime library for profiling and for providing information for future optimization.
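To make the headline metric concrete, the following is a minimal sketch of how POSIX read bandwidth and a relative improvement can be computed. It is not tf-Darshan's or Darshan's implementation: Darshan collects its counters by instrumenting I/O calls inside the library, whereas this sketch simply times `os.read` calls from Python. The function names `measure_read_bandwidth` and `relative_gain` and the 1 MiB block size are assumptions chosen for illustration.

```python
import os
import time


def measure_read_bandwidth(path, block_size=1 << 20):
    """Read a file with POSIX-style calls.

    Returns (bytes_read, elapsed_seconds, bytes_per_second).
    Hypothetical helper: an illustration only, not how Darshan measures I/O.
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    start = time.perf_counter()
    try:
        while True:
            chunk = os.read(fd, block_size)  # wraps the POSIX read() call
            if not chunk:  # an empty bytes object signals end of file
                break
            total += len(chunk)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    bandwidth = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, bandwidth


def relative_gain(baseline, optimized):
    """Percentage improvement of `optimized` over `baseline` bandwidth."""
    return (optimized - baseline) / baseline * 100.0


if __name__ == "__main__":
    import sys

    # Usage: python bandwidth.py <file>
    n_bytes, seconds, bw = measure_read_bandwidth(sys.argv[1])
    print(f"{n_bytes} bytes in {seconds:.6f} s ({bw / 1e6:.2f} MB/s)")
```

Measuring the same dataset once on the baseline storage and once after staging selected files on a faster tier, then passing both bandwidths to `relative_gain`, gives a figure comparable in kind to the "up to 19%" reported in the abstract. Note that the operating system's page cache can inflate repeated measurements of the same file.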

Comments:	Accepted for publication at the 2020 International Conference on Cluster Computing (CLUSTER)
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2008.04395 [cs.DC]
	(or arXiv:2008.04395v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2008.04395

Submission history

From: Steven W. D. Chien
[v1] Mon, 10 Aug 2020 20:09:09 UTC (1,395 KB)
[v2] Wed, 12 Aug 2020 00:40:35 UTC (1,395 KB)
