-
Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information
Authors:
Yauwai Yim,
Chunkit Chan,
Tianyu Shi,
Zheye Deng,
Wei Fan,
Tianshi Zheng,
Yangqiu Song
Abstract:
Large language models (LLMs) have shown success in handling simple games with imperfect information and enabling multi-agent coordination, but their ability to facilitate practical collaboration against other agents in complex, imperfect information environments, especially in a non-English environment, still needs to be explored. This study investigates the applicability of knowledge acquired by open-source and API-based LLMs to sophisticated text-based games requiring agent collaboration under imperfect information, comparing their performance to established baselines using other types of agents. We propose a Theory of Mind (ToM) planning technique that allows LLM agents to adapt their strategy against various adversaries using only game rules, current state, and historical context as input. An external tool was incorporated to mitigate the challenge of dynamic and extensive action spaces in this card game. Our results show that although a performance gap exists between current LLMs and state-of-the-art reinforcement learning (RL) models, LLMs demonstrate ToM capabilities in this game setting. It consistently improves their performance against opposing agents, suggesting their ability to understand the actions of allies and adversaries and establish collaboration with allies. To encourage further research and understanding, we have made our codebase openly accessible.
Submitted 5 August, 2024;
originally announced August 2024.
-
CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge
Authors:
Tianshi Zheng,
Jiaxin Bai,
Yicheng Wang,
Tianqing Fang,
Yue Guo,
Yauwai Yim,
Yangqiu Song
Abstract:
While large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks by acquiring rich factual knowledge from their broad training data, their ability to synthesize and logically reason with this knowledge in complex ways remains underexplored. In this work, we present a systematic evaluation of state-of-the-art LLMs' complex logical reasoning abilities through a novel benchmark of automatically generated complex reasoning questions over general domain and biomedical knowledge graphs. Our extensive experiments, employing diverse in-context learning techniques, reveal that LLMs excel at reasoning over general world knowledge but face significant challenges with specialized domain-specific knowledge. We find that prompting with explicit Chain-of-Thought demonstrations can substantially improve LLM performance on complex logical reasoning tasks with diverse logical operations. Interestingly, our controlled evaluations uncover an asymmetry where LLMs display proficiency at set union operations, but struggle considerably with set intersections - a key building block of logical reasoning. To foster further work, we will publicly release our evaluation benchmark and code.
Submitted 30 July, 2024;
originally announced July 2024.
-
Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction
Authors:
Zheye Deng,
Chunkit Chan,
Weiqi Wang,
Yuxi Sun,
Wei Fan,
Tianshi Zheng,
Yauwai Yim,
Yangqiu Song
Abstract:
The task of condensing large chunks of textual information into concise and structured tables has gained attention recently due to the emergence of Large Language Models (LLMs) and their potential benefit for downstream tasks, such as text summarization and text mining. Previous approaches often generate tables that directly replicate information from the text, limiting their applicability in broader contexts, as text-to-table generation in real-life scenarios necessitates information extraction, reasoning, and integration. However, there is a lack of both datasets and methodologies towards this task. In this paper, we introduce LiveSum, a new benchmark dataset created for generating summary tables of competitions based on real-time commentary texts. We evaluate the performances of state-of-the-art LLMs on this task in both fine-tuning and zero-shot settings, and additionally propose a novel pipeline called $T^3$(Text-Tuple-Table) to improve their performances. Extensive experimental results demonstrate that LLMs still struggle with this task even after fine-tuning, while our approach can offer substantial performance gains without explicit training. Further analyses demonstrate that our method exhibits strong generalization abilities, surpassing previous approaches on several other text-to-table datasets. Our code and data can be found at https://github.com/HKUST-KnowComp/LiveSum-TTT.
Submitted 22 April, 2024;
originally announced April 2024.
-
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
Authors:
Chunkit Chan,
Cheng Jiayang,
Yauwai Yim,
Zheye Deng,
Wei Fan,
Haoran Li,
Xin Liu,
Hongming Zhang,
Weiqi Wang,
Yangqiu Song
Abstract:
Large Language Models (LLMs) have sparked substantial interest and debate concerning their potential emergence of Theory of Mind (ToM) ability. Theory of mind evaluations currently focus on testing models using machine-generated data or game settings prone to shortcuts and spurious correlations, and lack evaluation of machine ToM ability in real-world human interaction scenarios. This poses a pressing demand to develop new real-world scenario benchmarks. We introduce NegotiationToM, a new benchmark designed to stress-test machine ToM in real-world negotiation surroundings covering multi-dimensional mental states (i.e., desires, beliefs, and intentions). Our benchmark builds upon the Belief-Desire-Intention (BDI) agent modeling theory and conducts the necessary empirical experiments to evaluate large language models. Our findings demonstrate that NegotiationToM is challenging for state-of-the-art LLMs, as they consistently perform significantly worse than humans, even when employing the chain-of-thought (CoT) method.
Submitted 4 July, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
-
Coverage Control for a Multi-robot Team with Heterogeneous Capabilities using Block Coordinate Descent (BCD) Method
Authors:
Yung Yu Andy Yiu,
Ying Hing Yim,
Yan Ning,
Zikai Wang,
Ling Shi
Abstract:
In this paper, we propose a coverage control system for a multi-robot team with heterogeneous capabilities to patrol or monitor a bounded environment. The capability could be defined as any criterion of robots like remaining power or mobile speed, depending on the purpose. The proposed control system aims to allocate different portions of the environment to the robots according to their capabilities, i.e., the robot with higher capability takes a larger portion of the environment while the robot with lower capability takes a smaller one. We use the block coordinate descent (BCD) method to optimize the location of portions and the partitioning method alternately. A centralized machine is used to synchronize the robots and the gradient of each robot can be computed in a distributed manner. Simulation results are provided to illustrate the performance of the proposed control system.
Submitted 20 April, 2022;
originally announced April 2022.
-
Convolutional Neural Network Quantization using Generalized Gamma Distribution
Authors:
Doyun Kim,
Han Young Yim,
Sanghyuck Ha,
Changgwun Lee,
Inyup Kang
Abstract:
As edge applications using convolutional neural network (CNN) models grow, it is becoming necessary to introduce dedicated hardware accelerators in which network parameters and feature-map data are represented with limited precision. In this paper we propose a novel quantization algorithm for energy-efficient deployment of the hardware accelerators. For weights and biases, the optimal bit length of the fractional part is determined so that the quantization error is minimized over their distribution. For feature-map data, meanwhile, their sample distribution is well approximated with the generalized gamma distribution (GGD), and accordingly the optimal quantization step size can be obtained through the asymptotic closed-form solution of GGD. The proposed quantization algorithm has a higher signal-to-quantization-noise ratio (SQNR) than other quantization schemes previously proposed for CNNs, and can be improved further by tuning the quantization parameters, resulting in efficient implementation of the hardware accelerators for CNNs in terms of power consumption and memory bandwidth.
Submitted 31 October, 2018;
originally announced October 2018.
-
Impact of intrinsic biophysical diversity on the activity of spiking neurons
Authors:
Man Yi Yim,
Ad Aertsen,
Stefan Rotter
Abstract:
We study the effect of intrinsic heterogeneity on the activity of a population of leaky integrate-and-fire neurons. By rescaling the dynamical equation, we derive mathematical relations between multiple neuronal parameters and a fluctuating input noise. To this end, common input to heterogeneous neurons is conceived as an identical noise with neuron-specific mean and variance. As a consequence, the neuronal output rates can differ considerably, and their relative spike timing becomes desynchronized. This theory can quantitatively explain some recent experimental findings.
Submitted 19 February, 2013; v1 submitted 27 August, 2012;
originally announced August 2012.