Training Agents with Weakly Supervised Feedback from Large Language Models

Gong, Dihong; Lu, Pu; Wang, Zelong; Zhou, Meng; He, Xiuqiang

arXiv:2411.19547 (cs)

[Submitted on 29 Nov 2024]

Abstract: Large Language Models (LLMs) offer a promising basis for creating agents that can tackle complex tasks through iterative environmental interaction. Existing methods either require these agents to mimic expert-provided trajectories or rely on definitive environmental feedback for reinforcement learning, which limits their application to specific scenarios such as gaming or code generation. This paper introduces a novel training method for LLM-based agents that uses weakly supervised signals from a critic LLM, bypassing the need for expert trajectories or definitive feedback. Our agents are trained in an iterative manner: they first generate trajectories through environmental interaction; a critic LLM then selects a subset of good trajectories, which are used to update the agents, enabling them to generate improved trajectories in the next iteration. Extensive tests on the API-bank dataset show consistent improvement in our agents' capabilities and performance comparable to GPT-4, despite using open-source models with far fewer parameters.
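
The loop described in the abstract (generate trajectories, let a critic keep a subset, update the agent, repeat) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the `Trajectory` type, the `Agent`, `Critic`, and `Trainer` callables, the top-fraction selection rule, and the parameters `iterations`, `samples_per_task`, and `keep_ratio` are all assumptions introduced here, since the abstract does not specify how the critic scores trajectories or how the update is performed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """One rollout of an agent on a task (hypothetical structure)."""
    task: str
    steps: List[str]


# Hypothetical interfaces; none of these names come from the paper.
Agent = Callable[[str], Trajectory]                   # rolls out one trajectory for a task
Critic = Callable[[Trajectory], float]                # weak, possibly noisy quality score
Trainer = Callable[[Agent, List[Trajectory]], Agent]  # returns an agent updated on kept data


def select_good(trajs: List[Trajectory], critic: Critic, keep_ratio: float) -> List[Trajectory]:
    """Keep the top `keep_ratio` fraction of trajectories by critic score.

    Assumption: selection is a simple top-k cut; the paper may use another rule.
    """
    ranked = sorted(trajs, key=critic, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))  # always keep at least one
    return ranked[:k]


def train_iteratively(
    agent: Agent,
    critic: Critic,
    trainer: Trainer,
    tasks: List[str],
    iterations: int = 3,
    samples_per_task: int = 4,
    keep_ratio: float = 0.25,
) -> Agent:
    """Alternate between rollout, critic-based selection, and agent update."""
    for _ in range(iterations):
        # 1. The current agent interacts with the environment to produce trajectories.
        trajs = [agent(task) for task in tasks for _ in range(samples_per_task)]
        # 2. The critic selects a subset judged to be good.
        good = select_good(trajs, critic, keep_ratio)
        # 3. The agent is updated on the selected subset only.
        agent = trainer(agent, good)
    return agent
```

In a real setting, `agent` would wrap an open-source LLM interacting with tools, `critic` would wrap a call to a critic LLM, and `trainer` would run a fine-tuning step. Passing them in as plain callables keeps the sketch independent of any particular model or training library.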

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.19547 [cs.CL]
	(or arXiv:2411.19547v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.19547

Submission history

From: Dihong Gong
[v1] Fri, 29 Nov 2024 08:47:04 UTC (514 KB)
