While Large Language Models (LLMs) based agents have successfully mimicked human behaviors in various scenarios, the realm of complex, multi-character social interactions within extended contexts remains underexplored. The challenge is compounded by privacy concerns, making it difficult to capture and utilize intricate real-life interactions. More importantly, the absence of quantitative evaluation methods hampers the pursuit of high-quality agent interactions, often leading to interactions that are limited in informativeness and expressiveness, characterized by superficial small talk without clear intentions. In this work, we leverage the rules of Tabletop Role-Playing Games (TRPG) to create an environment conducive to complex, context-rich interactions, emphasizing informativeness and expressiveness. This virtual setting alleviates privacy concerns and motivates agents to engage in meaningful, high-quality interactions as part of their in-game objectives. To assess these interactions, we introduce the Agent interaction Evaluation framework (AntEval), targeting the qualitative evaluation of interaction informativeness and expressiveness. Specifically, we propose two novel evaluation metrics: Information Exchanging Precision (IEP) and Interaction Expressiveness Gap (IEG). These metrics are designed to assess interactions in scenarios focused on information exchange and intention expression, respectively. Our experimental results demonstrate the effectiveness of these metrics in evaluating interaction quality. Notably, we identify significant areas for improvement in LLMs regarding social interactions, as highlighted by our metrics. We believe AntEval will guide further exploration in complex agent interactions, bringing them closer to emulating real human behavior and enhancing their integration and utility in real-world applications.
翻译:暂无翻译