Can Large Language Models Understand Intermediate Representations?

Jiang, Hailong; Zhu, Jianfeng; Wan, Yao; Fang, Bo; Zhang, Hongyu; Jin, Ruoming; Guan, Qiang


arXiv:2502.06854 (cs)

[Submitted on 7 Feb 2025]

Abstract: Intermediate Representations (IRs) are essential in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. This paper presents a pioneering empirical study to investigate the capabilities of LLMs, including GPT-4, GPT-3, Gemma 2, LLaMA 3.1, and Code Llama, in understanding IRs. We analyze their performance across four tasks: Control Flow Graph (CFG) reconstruction, decompilation, code summarization, and execution reasoning. Our results indicate that while LLMs demonstrate competence in parsing IR syntax and recognizing high-level structures, they struggle with control flow reasoning, execution semantics, and loop handling. Specifically, they often misinterpret branching instructions, omit critical IR operations, and rely on heuristic-based reasoning, leading to errors in CFG reconstruction, IR decompilation, and execution reasoning. The study underscores the necessity for IR-specific enhancements in LLMs, recommending fine-tuning on structured IR datasets and integration of explicit control flow models to augment their comprehension and handling of IR-related tasks.
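
As a rough illustration of what the CFG reconstruction task asks for, the sketch below derives basic-block successors from branch instructions in a small IR fragment. Everything in it is an assumption made for illustration: the abstract does not name a specific IR, so the fragment uses LLVM-style textual syntax, and `TOY_IR` and `build_cfg` are hypothetical names that do not come from the paper.

```python
import re

# Hypothetical LLVM-style IR for a simple counting loop (not taken from the paper).
TOY_IR = """
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %next, %loop ]
  %next = add i32 %i, 1
  %done = icmp eq i32 %next, 10
  br i1 %done, label %exit, label %loop
exit:
  ret i32 %next
"""


def build_cfg(ir_text):
    """Map each basic-block label to the labels its branch instruction can jump to."""
    cfg = {}
    current = None
    for raw in ir_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        label = re.fullmatch(r"([\w.]+):", line)
        if label:
            # A line of the form `name:` opens a new basic block.
            current = label.group(1)
            cfg[current] = []
        elif current is not None and line.startswith("br "):
            # Conditional and unconditional branches both name targets as `label %name`.
            cfg[current] = re.findall(r"label %([\w.]+)", line)
    return cfg


cfg = build_cfg(TOY_IR)
for block, successors in cfg.items():
    print(block, "->", successors)
```

The conditional branch in `loop` yields two successors, including the back edge to `loop` itself; blocks ending in `ret` have none. This toy parser handles only `br` and `ret` terminators and is not a substitute for a real compiler front end.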

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2502.06854 [cs.LG]
	(or arXiv:2502.06854v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.06854

Submission history

From: Hailong Jiang
[v1] Fri, 7 Feb 2025 17:23:48 UTC (768 KB)
