Learning to Synthesize Programs as Interpretable and Generalizable Policies

Trivedi, Dweep; Zhang, Jesse; Sun, Shao-Hua; Lim, Joseph J.

arXiv:2108.13643 (cs)

[Submitted on 31 Aug 2021 (v1), last revised 31 Jan 2022 (this version, v4)]

Abstract: Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning programmatic policies that are more interpretable and structured for generalization. Yet, these works either employ limited policy representations (e.g., decision trees, state machines, or predefined program templates) or require stronger supervision (e.g., input/output state pairs or expert demonstrations). We present a framework that instead learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner, solely from reward signals. Learning from scratch to compose programs that induce the desired agent behavior is difficult. To alleviate this difficulty, we propose to first learn a program embedding space that continuously parameterizes diverse behaviors in an unsupervised manner, and then to search over the learned program embedding space for a program that maximizes the return for a given task. Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines while producing interpretable and more generalizable policies. We also justify the necessity of the proposed two-stage learning scheme and analyze various methods for learning the program embedding.
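
The second stage described in the abstract, searching a learned embedding space for a vector whose decoded program maximizes return, can be illustrated with a small sketch. The abstract does not say which search procedure is used, so the cross-entropy method below is an assumption chosen for illustration. The names `toy_return` and `cem_search` are hypothetical, and `toy_return` is only a stand-in for "decode the embedding into a program, execute it, and measure the return"; no program decoder or environment is modeled here.

```python
import random
import statistics


def toy_return(z):
    """Hypothetical stand-in for: decode z into a program, run it, return its reward.

    Here the 'return' is simply the negative squared distance to a fixed target.
    """
    target = [0.5, -1.0, 1.0]
    return -sum((a - b) ** 2 for a, b in zip(z, target))


def cem_search(return_fn, dim, iterations=30, population=64,
               elite_frac=0.125, init_std=2.0, seed=0):
    """Cross-entropy method over a continuous latent space (illustrative assumption)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(population * elite_frac))
    best_z, best_ret = None, float("-inf")

    for _ in range(iterations):
        # Sample candidate embeddings from the current diagonal Gaussian.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(population)
        ]
        # Rank candidates by return and keep the top fraction as elites.
        ranked = sorted(candidates, key=return_fn, reverse=True)
        elites = ranked[:n_elite]

        top_ret = return_fn(elites[0])
        if top_ret > best_ret:
            best_z, best_ret = elites[0], top_ret

        # Refit the sampling distribution to the elites; the small constant
        # keeps the standard deviation from collapsing to exactly zero.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elites)]

    return best_z, best_ret


if __name__ == "__main__":
    z, ret = cem_search(toy_return, dim=3)
    print("best embedding:", [round(v, 3) for v in z])
    print("best return:", round(ret, 5))
```

In a real setting, `return_fn` would wrap a learned decoder and an environment rollout, which makes each evaluation far more expensive than this toy objective. A gradient-free search is a natural fit in that case because program execution is generally not differentiable.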

Comments:	NeurIPS 2021. 53 pages, 16 figures, 12 tables. Website at this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Cite as:	arXiv:2108.13643 [cs.LG]
	(or arXiv:2108.13643v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2108.13643

Submission history

From: Jesse Zhang
[v1] Tue, 31 Aug 2021 07:03:06 UTC (7,091 KB)
[v2] Wed, 27 Oct 2021 21:11:58 UTC (7,015 KB)
[v3] Tue, 2 Nov 2021 22:43:57 UTC (6,255 KB)
[v4] Mon, 31 Jan 2022 19:47:53 UTC (6,255 KB)
