Large language models (LLMs) demonstrate strong reasoning abilities across mathematical, strategic, and linguistic tasks, yet little is known about how well they reason in dynamic, real-time, multi-agent scenarios such as cooperative gameplay, where agents must continuously adapt to each other's behavior. In this paper, we bridge this gap by endowing LLM-driven agents with strategic reasoning and real-time adaptation in cooperative multi-agent environments grounded in game-theoretic principles such as belief consistency and Nash equilibrium. The proposed framework applies broadly to dynamic scenarios in which agents coordinate, communicate, and make decisions under continuously changing conditions. In contrast to previous efforts that evaluate LLM capabilities in static or turn-based settings, we provide real-time strategy refinement and adaptive feedback mechanisms that enable agents to adjust their policies on the basis of immediate contextual interactions. Empirical results show that our method achieves up to a 26\% improvement in return over PPO baselines in high-noise environments while maintaining real-time latency below 1.05 milliseconds. Our approach improves collaboration efficiency, task completion rates, and flexibility, demonstrating that game-theoretic guidance integrated with real-time feedback enhances LLM performance and ultimately fosters more resilient and adaptable strategic multi-agent systems.