Comparative Study of Large Language Model Architectures on Frontier

Yin, Junqi; Bose, Avishek; Cong, Guojing; Lyngaas, Isaac; Anthony, Quentin

arXiv:2402.00691 (cs)

[Submitted on 1 Feb 2024]

Abstract: Large language models (LLMs) have garnered significant attention in both the AI community and beyond. Among these, the Generative Pre-trained Transformer (GPT) has emerged as the dominant architecture, spawning numerous variants. However, these variants have undergone pre-training under diverse conditions, including variations in input data, data preprocessing, and training methodologies, resulting in a lack of controlled comparative studies. Here we meticulously examine two prominent open-sourced GPT architectures, GPT-NeoX and LLaMA, leveraging the computational power of Frontier, the world's first Exascale supercomputer. Employing the same materials science text corpus and a comprehensive end-to-end pipeline, we conduct a comparative analysis of their training and downstream performance. Our efforts culminate in achieving state-of-the-art performance on a challenging materials science benchmark. Furthermore, we investigate their computational and energy efficiency, and propose a computationally efficient method for architecture design. To our knowledge, these pre-trained models represent the largest available for materials science. Our findings provide practical guidance for building LLMs on HPC platforms.
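The abstract does not spell out how the two architectures differ at the layer level. As a rough, back-of-the-envelope sketch (not the paper's method or code), the snippet below counts the parameters in one decoder block under assumptions taken from the public GPT-NeoX and LLaMA designs: the NeoX-style block uses biased linear layers, two LayerNorms, and a 4x-wide GeLU MLP; the LLaMA-style block drops biases, uses two RMSNorms, and a SwiGLU MLP whose hidden width is 2/3 of 4x, rounded up to a multiple of 256. The function names and the example width of 4096 are hypothetical illustrations, and training compute uses the common 6·N·D approximation (N parameters, D tokens).

```python
"""Back-of-the-envelope parameter and FLOP estimates (illustrative assumptions only)."""


def neox_block_params(d_model: int) -> int:
    """Parameters in one GPT-NeoX-style block (assumed: biases + LayerNorm + 4x GeLU MLP)."""
    # Attention: fused QKV projection and output projection, each with a bias.
    attn = (3 * d_model * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: d_model -> 4*d_model -> d_model, with biases; GeLU has no parameters.
    ffn = 4 * d_model
    mlp = (d_model * ffn + ffn) + (ffn * d_model + d_model)
    # Two LayerNorms, each with a scale and a bias vector.
    norms = 2 * (2 * d_model)
    return attn + mlp + norms


def llama_ffn_dim(d_model: int, multiple_of: int = 256) -> int:
    """SwiGLU hidden width: 2/3 of 4*d_model, rounded up to a multiple of `multiple_of`."""
    hidden = int(2 * 4 * d_model / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def llama_block_params(d_model: int) -> int:
    """Parameters in one LLaMA-style block (assumed: no biases + RMSNorm + SwiGLU MLP)."""
    # Attention: separate Q, K, V and output projections, no biases.
    attn = 4 * d_model * d_model
    # SwiGLU MLP: gate and up projections plus a down projection, no biases.
    mlp = 3 * d_model * llama_ffn_dim(d_model)
    # Two RMSNorms, each with only a scale vector.
    norms = 2 * d_model
    return attn + mlp + norms


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the common 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    d = 4096  # hypothetical hidden width chosen for illustration
    print("NeoX-style block params: ", neox_block_params(d))
    print("LLaMA-style block params:", llama_block_params(d))
```

At this width the two blocks come out within about 0.5% of each other, because the narrower SwiGLU MLP roughly offsets its third projection matrix; this is why such comparisons are usually made at matched parameter counts rather than matched widths.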

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2402.00691 [cs.DC]
	(or arXiv:2402.00691v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2402.00691

Submission history

From: Junqi Yin
[v1] Thu, 1 Feb 2024 15:50:37 UTC (13,693 KB)
