OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling

Lu, Hongliang; Xie, Zhonglin; Wu, Yaoyu; Ren, Can; Chen, Yuxuan; Wen, Zaiwen


arXiv:2502.11102 (cs)

[Submitted on 16 Feb 2025]


Abstract: Despite the rapid development of large language models (LLMs), a fundamental challenge persists: the lack of high-quality optimization modeling datasets hampers LLMs' robust modeling of practical optimization problems from natural language descriptions (NL). This data scarcity also contributes to the generalization difficulties experienced by learning-based methods. To address these challenges, we propose a scalable framework for synthesizing a high-quality dataset, named OptMATH. Starting from curated seed data with mathematical formulations (MF), this framework automatically generates problem data (PD) with controllable complexity. A back-translation step is then employed to obtain NL. To verify the correspondence between the NL and the PD, a forward modeling step followed by rejection sampling is used. The accepted pairs constitute the training part of OptMATH. A collection of rejected pairs is then identified and further filtered. This collection serves as a new benchmark for optimization modeling, containing difficult instances whose lengths are much longer than those of NL4OPT and MAMO. Through extensive experiments, we demonstrate that models of various sizes (0.5B-32B parameters) trained on OptMATH achieve superior results on multiple modeling benchmarks, thereby validating the effectiveness and scalability of our approach.
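The verification loop described in the abstract (back-translation, forward modeling, rejection sampling) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a pair is accepted when the optimal objective value recovered from the NL description matches the one obtained from the PD, a criterion the abstract does not spell out. The names `Instance`, `rejection_sample`, `toy_back_translate`, and `toy_forward` are hypothetical, and the toy functions stand in for the LLM and solver calls.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Instance:
    """Hypothetical container for one problem-data (PD) instance."""
    pd: dict                # concrete problem data generated from a seed formulation
    reference_value: float  # optimal objective value obtained from the PD directly


def rejection_sample(
    instances: List[Instance],
    back_translate: Callable[[dict], str],            # PD -> NL (stand-in for an LLM)
    forward_model_and_solve: Callable[[str], float],  # NL -> model -> optimal value
    tol: float = 1e-6,
) -> Tuple[List[Tuple[str, dict]], List[Tuple[str, dict]]]:
    """Split (NL, PD) pairs into accepted and rejected lists.

    Assumed criterion: the value recovered from the NL must match the
    reference value within a relative tolerance.
    """
    accepted, rejected = [], []
    for inst in instances:
        nl = back_translate(inst.pd)
        try:
            value = forward_model_and_solve(nl)
            scale = max(1.0, abs(inst.reference_value))
            ok = abs(value - inst.reference_value) <= tol * scale
        except Exception:
            ok = False  # a failed modeling or solve attempt counts as a rejection
        (accepted if ok else rejected).append((nl, inst.pd))
    return accepted, rejected


def toy_back_translate(pd: dict) -> str:
    # Toy stand-in: describes "maximize c*x with 0 <= x <= ub" in words.
    return f"Maximize {pd['c']}*x subject to x <= {pd['ub']}, x >= 0."


def toy_forward(nl: str) -> float:
    # Deliberately naive stand-in: always returns c * ub, which is only
    # correct when c is positive (for negative c the optimum is x = 0).
    c, ub = (float(t) for t in re.findall(r"[-+]?\d+\.?\d*", nl)[:2])
    return c * ub


instances = [
    Instance(pd={"c": 3.0, "ub": 4.0}, reference_value=12.0),  # naive model is right
    Instance(pd={"c": -2.0, "ub": 5.0}, reference_value=0.0),  # naive model is wrong
]
accepted, rejected = rejection_sample(instances, toy_back_translate, toy_forward)
```

In this toy run the first pair is accepted and the second is rejected, because the naive forward model mishandles the negative coefficient. In the framework described above, accepted pairs would feed the training set, while rejected pairs would be candidates for the further filtering that yields the benchmark.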

Comments:	This paper has 36 pages, 18 figures, and two co-first authors: Hongliang Lu and Zhonglin Xie
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2502.11102 [cs.AI]
	(or arXiv:2502.11102v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2502.11102

Submission history

From: Zhonglin Xie
[v1] Sun, 16 Feb 2025 12:38:37 UTC (11,781 KB)
