SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors

Lyu, Bohan; Huang, Siqiao; Liang, Zichen

Abstract:Large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as code understanding and code generation. However, an equally important yet underexplored question is whether LLMs can serve as general-purpose surrogate code executors, to predict the output and behavior of a program without actually running it. To systematically investigate this capability, we introduce SURGE, a comprehensive benchmark covering eight key aspects: multi-language programming tasks, competition-level programming problems, repository-level code analysis, high-cost scientific computing, time-complexity-intensive algorithms, buggy code analysis, programs dependent on specific compilers or execution environments, and formal mathematical proof verification. We evaluate multiple open-source and proprietary LLMs on SURGE and conduct a scaling study to analyze the impact of model size and training data scale on surrogate execution accuracy. Additionally, we categorize model prediction errors and explore potential areas for improvement. Our findings indicate that while LLMs can predict code execution results in certain cases, they exhibit limitations in general-purpose surrogate execution. This study provides empirical insights into the feasibility of using LLMs as surrogate code executors. Code and dataset are released at this https URL.

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2502.11167 [cs.LG]
	(or arXiv:2502.11167v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.11167

Computer Science > Machine Learning

Title:SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators