Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

Lu, Pan; Qiu, Liang; Chang, Kai-Wei; Wu, Ying Nian; Zhu, Song-Chun; Rajpurohit, Tanmay; Clark, Peter; Kalyan, Ashwin

arXiv:2209.14610 (cs)

[Submitted on 29 Sep 2022 (v1), last revised 2 Mar 2023 (this version, v3)]

Abstract: Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown whether these models can handle more complex problems that involve mathematical reasoning over heterogeneous information, such as tabular data. To fill this gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions, free-text and multi-choice, and each problem is annotated with gold solutions to reveal the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, because few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. This instability is more severe when handling complex problems like those in TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which uses policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% in accuracy and significantly reduces prediction variance compared to random selection, which verifies its effectiveness in selecting in-context examples.
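
As a rough illustration of the policy-gradient idea described in the abstract, the toy sketch below learns which candidate in-context example to pick using REINFORCE, a basic policy gradient method. It is not the authors' PromptPG implementation. It makes several simplifying assumptions: the policy is a bare softmax over per-candidate logits that ignores the test problem, a single example is selected per step, and `toy_reward` with its made-up success rates is a hypothetical stand-in for building a prompt, querying a language model, and scoring the answer. The names `train_selector` and `toy_reward` are illustrative only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_selector(reward_fn, num_candidates, steps=2000, lr=0.1, seed=0):
    """Learn selection probabilities with REINFORCE.

    reward_fn(index, rng) is a hypothetical stand-in for "build a prompt
    with candidate `index`, query the language model, score the answer".
    """
    rng = random.Random(seed)
    logits = [0.0] * num_candidates
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one in-context example from the current policy.
        choice = rng.choices(range(num_candidates), weights=probs, k=1)[0]
        reward = reward_fn(choice, rng)
        # For a softmax policy, d log pi(choice) / d logit_i
        # equals one_hot(choice)_i - probs_i.
        for i in range(num_candidates):
            grad = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


def toy_reward(index, rng):
    """Simulated reward: +1 for a (pretend) correct answer, otherwise -1.

    The per-candidate success rates are invented for illustration.
    """
    success_rate = [0.2, 0.3, 0.9, 0.4][index]
    return 1.0 if rng.random() < success_rate else -1.0


if __name__ == "__main__":
    final_probs = train_selector(toy_reward, num_candidates=4)
    print([round(p, 3) for p in final_probs])
```

In this toy setting the learned probabilities should concentrate on the candidate with the highest simulated success rate. A policy that conditions on the test problem would replace the fixed logits with scores computed from features of the problem and each candidate.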

Comments:	ICLR 2023. 26 pages and 18 figures. The data and code are available at this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2209.14610 [cs.LG]
	(or arXiv:2209.14610v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2209.14610

Submission history

From: Pan Lu
[v1] Thu, 29 Sep 2022 08:01:04 UTC (2,085 KB)
[v2] Wed, 2 Nov 2022 23:42:14 UTC (2,080 KB)
[v3] Thu, 2 Mar 2023 07:41:55 UTC (2,588 KB)
