Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning

Frieder, Simon; Bayer, Jonas; Collins, Katherine M.; Berner, Julius; Loader, Jacob; Juhász, András; Ruehle, Fabian; Welleck, Sean; Poesia, Gabriel; Griffiths, Ryan-Rhys; Weller, Adrian; Goyal, Anirudh; Lukasiewicz, Thomas; Gowers, Timothy

Abstract: The suite of datasets commonly used to train and evaluate the mathematical capabilities of AI-based mathematical copilots (primarily large language models) exhibits several shortcomings. These include a restricted scope of mathematical complexity, typically not exceeding lower undergraduate-level mathematics, binary rating protocols, and other issues, which make it difficult to build comprehensive proof-based evaluation suites. We systematically explore these limitations and contend that enhancing the capabilities of large language models, or any forthcoming advancements in AI-based mathematical assistants (copilots or "thought partners"), necessitates a paradigm shift in the design of mathematical datasets and the evaluation criteria of mathematical ability: it is necessary to move away from result-based datasets (theorem statement to theorem proof) and convert the rich facets of mathematical research practice into data that LLMs can train on. Examples of such facets are mathematical workflows (sequences of atomic, potentially subfield-dependent tasks that are often performed when creating new mathematics), which are an important part of the proof-discovery process. Additionally, we advocate for mathematical dataset developers to consider the concept of "motivated proof", introduced by G. Pólya in 1949, which can serve as a blueprint for datasets that offer a better proof learning signal, alleviating some of the mentioned limitations. Lastly, we introduce math datasheets for datasets, extending the general, dataset-agnostic variants of datasheets: we provide a questionnaire designed specifically for math datasets that we urge dataset creators to include with their datasets. This will make creators aware of potential limitations of their datasets while at the same time making it easy for readers to assess them from the point of view of training and evaluating mathematical copilots.

Comments:	40 pages
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2412.15184 [cs.LG]
	(or arXiv:2412.15184v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.15184
