High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

Moses, William S.; Ivanov, Ivan R.; Domke, Jens; Endo, Toshio; Doerfert, Johannes; Zinenko, Oleksandr

arXiv:2207.00257 (cs)

[Submitted on 1 Jul 2022]

Abstract: While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model.
We propose an alternative approach, based on Polygeist/MLIR, that automatically translates programs written in one programming model (CUDA) into another (CPU threads). Our approach includes a representation of parallel constructs that allows conventional compiler transformations to apply transparently and without modification, and that enables parallelism-specific optimizations. We evaluate our framework by transpiling and optimizing the CUDA Rodinia benchmark suite for a multi-core CPU, achieving a 76% geomean speedup over handwritten OpenMP code. Further, we show how CUDA kernels from PyTorch can efficiently run and scale on the CPU-only supercomputer Fugaku without user intervention. Our PyTorch compatibility layer, which makes use of transpiled CUDA PyTorch kernels, outperforms the PyTorch CPU native backend by 2.7×.
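
To make the idea of running a CUDA-style kernel on CPU threads concrete, the following is a minimal hand-written sketch, not the paper's Polygeist/MLIR pipeline. It assumes a 1-D launch and a hypothetical SAXPY kernel; the names `saxpy_kernel`, `launch_on_cpu`, and `saxpy` are illustrative only. Blocks are distributed across worker threads, and the threads within a block become a serial loop.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Body of a CUDA-style kernel: one logical GPU thread handles one element."""
    i = block_idx * block_dim + thread_idx  # global index, as in blockIdx.x * blockDim.x + threadIdx.x
    if i < len(x):  # same bounds guard a CUDA kernel would use
        out[i] = a * x[i] + y[i]


def launch_on_cpu(kernel, grid_dim, block_dim, *args, workers=4):
    """Emulate a 1-D kernel launch: blocks run on CPU worker threads,
    and the threads of each block are executed as a serial inner loop."""

    def run_block(block_idx):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from a block.
        list(pool.map(run_block, range(grid_dim)))


def saxpy(a, x, y, block_dim=4):
    """Hypothetical host-side wrapper: computes a * x + y element-wise."""
    out = [0.0] * len(x)
    grid_dim = (len(x) + block_dim - 1) // block_dim  # ceiling division
    launch_on_cpu(saxpy_kernel, grid_dim, block_dim, a, x, y, out)
    return out
```

This sketch only shows the mapping from the grid/block hierarchy to CPU loops. It omits barrier synchronization and shared memory, which any real translation of CUDA code has to handle, and Python threads are used here for illustration rather than performance.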

Subjects:	Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2207.00257 [cs.PL]
	(or arXiv:2207.00257v1 [cs.PL] for this version)
	https://doi.org/10.48550/arXiv.2207.00257

Submission history

From: Oleksandr Zinenko
[v1] Fri, 1 Jul 2022 08:20:50 UTC (3,245 KB)
