Assisting Mathematical Formalization with A Learning-based Premise Retriever

Tao, Yicheng; Liu, Haotian; Wang, Shanwen; Xu, Hongteng

arXiv:2501.13959 (cs)

[Submitted on 21 Jan 2025]

Abstract: Premise selection is a crucial yet challenging step in mathematical formalization, especially for users with limited experience. Because few formalization projects are available, existing approaches that leverage language models often suffer from data scarcity. In this work, we introduce a method for training a premise retriever to support the formalization of mathematics. Our approach employs a BERT model to embed proof states and premises into a shared latent space. The retrieval model is trained within a contrastive learning framework and incorporates a domain-specific tokenizer along with a fine-grained similarity computation method. Experimental results show that our model is highly competitive with existing baselines, achieving strong performance while requiring fewer computational resources. Performance is further enhanced through the integration of a re-ranking module. To streamline the formalization process, we will release a search engine that enables users to query Mathlib theorems directly using proof states, significantly improving accessibility and efficiency. Code is available at this https URL.
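
The abstract describes two ideas that a small example may make concrete: retrieving premises by similarity to a proof state in a shared embedding space, and training that space with a contrastive objective. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The toy 2-dimensional vectors stand in for BERT embeddings, plain cosine similarity stands in for the paper's fine-grained similarity (which is not specified in the abstract), and the function names `cosine`, `retrieve`, and `info_nce` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(state_vec, premise_vecs, k=2):
    """Return the names of the k premises most similar to the proof state.

    premise_vecs maps a premise name to its (assumed) embedding.
    """
    ranked = sorted(
        premise_vecs.items(),
        key=lambda item: cosine(state_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def info_nce(state_vecs, premise_vecs, temperature=0.1):
    """In-batch contrastive (InfoNCE-style) loss.

    Assumption: premise_vecs[i] is the positive for state_vecs[i], and all
    other premises in the batch act as negatives.
    """
    total = 0.0
    for i, state in enumerate(state_vecs):
        logits = [cosine(state, p) / temperature for p in premise_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(state_vecs)


# Toy embeddings (made up for illustration; not produced by any model).
toy_premises = {
    "add_comm": [1.0, 0.0],
    "mul_comm": [0.0, 1.0],
    "add_assoc": [0.8, 0.6],
}
toy_state = [0.9, 0.1]
top_premises = retrieve(toy_state, toy_premises, k=2)

# Loss is low when each state is aligned with its own premise,
# and high when the pairing is swapped.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In a real system the vectors would come from an encoder, and the top-ranked candidates could then be passed to a re-ranking module of the kind the abstract mentions.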

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2501.13959 [cs.CL]
	(or arXiv:2501.13959v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.13959

Submission history

From: Yicheng Tao
[v1] Tue, 21 Jan 2025 06:32:25 UTC (1,853 KB)
