Embodied Question Answering via Multi-LLM Systems

B Patel, VS Dorbala, AS Bedi - arXiv preprint arXiv:2406.10918, 2024 - arxiv.org
Embodied Question Answering (EQA) is an important problem in which an agent explores the environment to answer user queries. In the existing literature, EQA has been studied exclusively in single-agent scenarios, where exploration can be time-consuming and costly. In this work, we consider EQA in a multi-agent framework involving multiple large language model (LLM)-based agents independently answering queries about a household environment. To generate one answer for each query, we use the individual responses to train a Central Answer Model (CAM) that aggregates them into a robust answer. Using CAM, we observe higher EQA accuracy than with aggregation methods for LLM ensembles, such as voting schemes and debates. CAM does not require any form of agent communication, avoiding the associated costs. We ablate CAM with various nonlinear (neural network, random forest, decision tree, XGBoost) and linear (logistic regression classifier, SVM) algorithms. Finally, we present a feature importance analysis for CAM via permutation feature importance (PFI), quantifying CAM's reliance on each independent agent and on the query context.
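
The abstract only outlines how CAM is trained and probed. As one concrete reading, the sketch below trains a classifier on the independent agents' categorical answers plus a query-context feature, then runs permutation feature importance on a held-out set. The toy data, the feature names (agent_i, query_context), and the random-forest choice are illustrative assumptions, not the paper's actual dataset or implementation.

# Hypothetical sketch of a CAM-style aggregator with PFI; data and model choice
# are stand-ins, not the paper's code.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n_queries, n_agents = 500, 4

# Toy data: each agent independently answers each query; a coarse query-context
# feature (e.g., room type) accompanies the ground-truth label.
answers = rng.choice(["yes", "no", "unknown"], size=(n_queries, n_agents))
context = rng.choice(["kitchen", "bedroom", "bathroom"], size=(n_queries, 1))
labels = rng.choice(["yes", "no"], size=n_queries)

cols = [f"agent_{i}" for i in range(n_agents)] + ["query_context"]
X_raw = pd.DataFrame(np.hstack([answers, context]), columns=cols)

enc = OneHotEncoder(handle_unknown="ignore")
X = enc.fit_transform(X_raw).toarray()
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

# "CAM" here is a random forest over the agents' answers plus query context;
# the paper also ablates other nonlinear and linear classifiers.
cam = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", cam.score(X_te, y_te))

# Permutation feature importance: shuffle one encoded feature at a time and
# measure the accuracy drop, indicating how much CAM relies on each agent.
pfi = permutation_importance(cam, X_te, y_te, n_repeats=10, random_state=0)
for name, importance in zip(enc.get_feature_names_out(), pfi.importances_mean):
    print(f"{name}: {importance:.3f}")

In this framing, no agent-to-agent communication is needed: CAM only consumes the agents' final answers (and optional query context), which is why the paper contrasts it with voting and debate schemes.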