Whale: Efficient Giant Model Training over Heterogeneous GPUs

Jia, Xianyan; Jiang, Le; Wang, Ang; Xiao, Wencong; Shi, Ziji; Zhang, Jie; Li, Xinyuan; Chen, Langshi; Li, Yong; Zheng, Zhen; Liu, Xiaoyong; Lin, Wei


arXiv:2011.09208 (cs)

[Submitted on 18 Nov 2020 (v1), last revised 6 Jun 2022 (this version, v3)]



Abstract: Scaling up deep neural networks has been demonstrated to improve model quality, but it also introduces challenges in training efficiency, programmability, and resource adaptability. We present Whale, a general and efficient distributed training framework for giant models. To support various parallel strategies and their hybrids, Whale generalizes the programming interface by defining two new primitives in the form of model annotations, which allow users to supply hints. The Whale runtime uses those annotations and performs graph optimizations to transform a local deep learning DAG for distributed multi-GPU execution. Whale further introduces a novel hardware-aware parallel strategy, which improves the performance of model training on heterogeneous GPUs in a balanced manner. Deployed in a production cluster with 512 GPUs, Whale successfully trains an industry-scale multimodal model with over ten trillion model parameters, named M6, demonstrating great scalability and efficiency.
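
To make the idea of balancing work across heterogeneous GPUs concrete, here is a minimal Python sketch of one common approach: splitting a global batch in proportion to each device's measured throughput. It is an illustration under stated assumptions, not Whale's actual algorithm or API. The function names `split_batch` and `step_time`, the device labels, and the throughput figures are all hypothetical.

```python
from typing import Dict


def split_batch(global_batch: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Split a global batch across devices in proportion to throughput.

    `throughput` maps a (hypothetical) device name to its measured
    samples per second. This is an assumed heuristic, not Whale's method.
    """
    if global_batch < 0:
        raise ValueError("global_batch must be non-negative")
    if not throughput or any(t <= 0 for t in throughput.values()):
        raise ValueError("throughput values must be positive")

    total = sum(throughput.values())
    # Ideal fractional share of the batch for each device.
    ideal = {d: global_batch * t / total for d, t in throughput.items()}
    # Round down first, then hand out the leftover samples one at a time
    # to the devices with the largest fractional remainder.
    sizes = {d: int(share) for d, share in ideal.items()}
    leftover = global_batch - sum(sizes.values())
    by_remainder = sorted(ideal, key=lambda d: ideal[d] - sizes[d], reverse=True)
    for d in by_remainder[:leftover]:
        sizes[d] += 1
    return sizes


def step_time(sizes: Dict[str, int], throughput: Dict[str, float]) -> float:
    """A synchronous step finishes only when the slowest device finishes."""
    return max(sizes[d] / throughput[d] for d in sizes)


if __name__ == "__main__":
    # Assumed relative speeds: one GPU is three times faster than the other.
    speeds = {"fast_gpu": 3.0, "slow_gpu": 1.0}
    balanced = split_batch(64, speeds)
    even = {"fast_gpu": 32, "slow_gpu": 32}
    print(balanced, step_time(balanced, speeds), step_time(even, speeds))
```

With an even split the faster device idles while the slower one finishes, so the step takes as long as the slowest device. The proportional split shortens the step by giving each device work that matches its speed.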

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2011.09208 [cs.DC]
	(or arXiv:2011.09208v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2011.09208

Submission history

From: Xianyan Jia
[v1] Wed, 18 Nov 2020 10:54:31 UTC (487 KB)
[v2] Wed, 18 Aug 2021 12:49:12 UTC (4,597 KB)
[v3] Mon, 6 Jun 2022 13:20:13 UTC (5,680 KB)
