Central Answer Modeling for an Embodied Multi-LLM System

Patel, Bhrij; Dorbala, Vishnu Sashank; Bedi, Amrit Singh; Manocha, Dinesh

arXiv:2406.10918v4 (cs)

[Submitted on 16 Jun 2024 (v1), revised 16 Sep 2024 (this version, v4), latest version 18 Oct 2024 (v5)]

Abstract: Embodied Question Answering (EQA) is an important problem in which an agent explores an environment to answer user queries. In the existing literature, EQA has been studied exclusively in single-agent scenarios, where exploration can be time-consuming and costly. In this work, we consider EQA in a multi-agent framework in which multiple large language model (LLM)-based agents independently answer queries about a household environment. To generate one answer for each query, we use the individual responses to train a Central Answer Model (CAM) that aggregates the responses into a single robust answer. While prior Question Answering (QA) work has used a central module based on answers from multiple LLM-based experts, we specifically apply this framework to embodied LLM-based agents that must first physically explore their environment to become experts on it before answering questions. Our work is the first to use a central answer model framework with embodied agents that must rely on exploring an unknown environment. We set up a variation of EQA in which, instead of exploring the environment after a question is asked, the agents first explore the environment for a set amount of time and then answer a set of queries. Using CAM, we observe a $46\%$ higher EQA accuracy compared with aggregation methods for LLM ensembles, such as voting schemes and debates. CAM does not require any form of agent communication, which spares it the associated costs. We ablate CAM with various nonlinear (neural network, random forest, decision tree, XGBoost) and linear (logistic regression classifier, SVM) algorithms. We experiment in various topological graph environments and examine the case where one of the agents is malicious and purposely contributes responses it believes to be wrong.
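
To make the aggregation idea concrete, the following is a minimal sketch, not the authors' implementation, of how a learned central model can differ from majority voting. It assumes binary yes/no answers, three simulated agents with made-up reliabilities (the third standing in for a malicious agent), and a hand-written logistic regression, which is one of the linear models the abstract lists. All function names and numbers here are hypothetical.

```python
import math
import random


def simulate(n, reliabilities, rng):
    """Simulate n yes/no queries; each agent answers correctly with its own probability.

    Answers are encoded as +1 (yes) / -1 (no). The reliabilities are assumptions
    chosen for illustration, not values from the paper.
    """
    X, y = [], []
    for _ in range(n):
        truth = rng.randint(0, 1)
        row = []
        for p in reliabilities:
            answer = truth if rng.random() < p else 1 - truth
            row.append(1.0 if answer == 1 else -1.0)
        X.append(row)
        y.append(truth)
    return X, y


def majority_vote(row):
    """Baseline: answer yes if more agents said yes than no (odd agent count, no ties)."""
    return 1 if sum(row) > 0 else 0


def train_logistic(X, y, epochs=300, lr=0.5):
    """Fit logistic regression with full-batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, target in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(predict, X, y):
    return sum(predict(row) == target for row, target in zip(X, y)) / len(y)


rng = random.Random(0)
# Hypothetical agents: one reliable, one weak, one malicious (mostly inverts the truth).
reliabilities = [0.9, 0.6, 0.1]
X_train, y_train = simulate(400, reliabilities, rng)
X_test, y_test = simulate(400, reliabilities, rng)

w, b = train_logistic(X_train, y_train)


def central_model(row):
    """Learned aggregator: weighted combination of the agents' answers."""
    return 1 if b + sum(wi * xi for wi, xi in zip(w, row)) > 0 else 0


vote_acc = accuracy(majority_vote, X_test, y_test)
cam_acc = accuracy(central_model, X_test, y_test)
print(f"majority vote: {vote_acc:.3f}  learned central model: {cam_acc:.3f}")
print("learned weights per agent:", [round(wi, 2) for wi in w])
```

In this toy setting the learned model needs no communication between agents; it only sees their final answers and labeled training queries. Because it can assign a negative weight to an agent that is consistently wrong, it can turn a malicious agent's responses into useful signal, which a plain vote cannot do.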

Comments:	15 pages, 11 Figures, 5 Tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2406.10918 [cs.LG]
	(or arXiv:2406.10918v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.10918

Submission history

From: Bhrij Patel
[v1] Sun, 16 Jun 2024 12:46:40 UTC (4,384 KB)
[v2] Tue, 18 Jun 2024 01:18:46 UTC (4,384 KB)
[v3] Tue, 25 Jun 2024 10:50:09 UTC (4,863 KB)
[v4] Mon, 16 Sep 2024 07:12:12 UTC (1,819 KB)
[v5] Fri, 18 Oct 2024 12:27:07 UTC (5,821 KB)
