Learning Autonomous Code Integration for Math Language Models

Wang, Haozhe; Li, Long; Qu, Chao; Zhu, Fengming; Xu, Weidi; Chu, Wei; Lin, Fangzhen

arXiv:2502.00691 (cs)

[Submitted on 2 Feb 2025 (v1), last revised 16 Feb 2025 (this version, v2)]

Abstract: Recent advances in mathematical problem-solving with language models (LMs) integrate chain-of-thought (CoT) reasoning and code execution to harness their complementary strengths. However, existing hybrid frameworks have a critical limitation: they depend on externally dictated instructions or rigid code-integration templates and lack metacognitive awareness, that is, the capacity to dynamically evaluate their intrinsic capabilities and autonomously determine when and how to integrate tools. This rigidity motivates our study of autonomous code integration, which enables models to adapt their tool-usage strategies as their reasoning abilities evolve during training.
While reinforcement learning (RL) shows promise for boosting LLM reasoning at scale (e.g., DeepSeek-R1), we demonstrate that it is inefficient at learning autonomous code integration because it explores the vast combinatorial space of CoT-code interleaving patterns inadequately. To address this challenge, we propose a novel Expectation-Maximization (EM) framework that combines structured exploration (E-step) with off-policy RL optimization (M-step), creating a self-reinforcing cycle between metacognitive tool-use decisions and evolving capabilities. Experiments show that our method achieves superior results through improved exploration. Notably, our 7B model improves by over 11% on MATH500 and by 9.4% on AIME without o1-like CoT.
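The E-step/M-step cycle described in the abstract can be illustrated with a deliberately simplified toy. This sketch is not the authors' algorithm: it assumes a tabular policy that chooses, per hypothetical problem type, between pure CoT and code execution, with made-up success rates in `SUCCESS_PROB`. The E-step explores every strategy (rather than sampling only the policy's current favourite) and forms a reward-weighted posterior q(z) proportional to pi(z) times the estimated reward; the M-step moves the policy toward that posterior using data that was not drawn from the policy itself, loosely mirroring an off-policy update. All names (`SUCCESS_PROB`, `e_step`, `m_step`, `train`) are illustrative assumptions.

```python
import random

# Hypothetical simulated environment: probability that each strategy solves
# a problem of a given type. These numbers are illustrative only.
SUCCESS_PROB = {
    "arithmetic-heavy": {"cot": 0.3, "code": 0.9},
    "proof-style": {"cot": 0.7, "code": 0.2},
}
STRATEGIES = ("cot", "code")


def solve(problem_type, strategy, rng):
    """Return reward 1.0 if the simulated attempt succeeds, else 0.0."""
    return 1.0 if rng.random() < SUCCESS_PROB[problem_type][strategy] else 0.0


def e_step(policy, problem_type, rng, samples_per_strategy=8):
    """Structured exploration (toy version): try every strategy, then form a
    reward-weighted posterior q(z) proportional to pi(z) * mean reward."""
    weights = {}
    for z in STRATEGIES:
        rewards = [solve(problem_type, z, rng) for _ in range(samples_per_strategy)]
        weights[z] = policy[problem_type][z] * (sum(rewards) / samples_per_strategy)
    total = sum(weights.values())
    if total == 0.0:  # no successful attempt at all: keep the current policy
        return dict(policy[problem_type])
    return {z: w / total for z, w in weights.items()}


def m_step(policy, problem_type, q, lr=0.5):
    """Move pi(z) toward q(z). For a tabular policy, q itself maximises
    sum_z q(z) log pi(z); the step size lr damps the update."""
    for z in STRATEGIES:
        policy[problem_type][z] = (1 - lr) * policy[problem_type][z] + lr * q[z]


def train(iterations=20, seed=0):
    rng = random.Random(seed)
    # Start undecided: equal probability of using CoT or code for every type.
    policy = {t: {z: 0.5 for z in STRATEGIES} for t in SUCCESS_PROB}
    for _ in range(iterations):
        for problem_type in SUCCESS_PROB:
            q = e_step(policy, problem_type, rng)
            m_step(policy, problem_type, q)
    return policy


if __name__ == "__main__":
    for problem_type, probs in train().items():
        print(problem_type, {z: round(p, 3) for z, p in probs.items()})
```

With these assumed success rates, the policy should drift toward code for the arithmetic-heavy type and toward CoT for the proof-style type. In the setting the abstract describes, the latent choice is a full CoT-code interleaving pattern rather than a binary switch, so the exploration space is far larger than this toy suggests.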

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2502.00691 [cs.AI]
	(or arXiv:2502.00691v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2502.00691

Submission history

From: Haozhe Wang
[v1] Sun, 2 Feb 2025 06:32:23 UTC (3,374 KB)
[v2] Sun, 16 Feb 2025 07:18:23 UTC (627 KB)