Currently, in the study of multiagent systems, the intentions of agents are usually ignored. Yet, as Theory of Mind (ToM) points out, people routinely reason about others' mental states, including their beliefs, goals, and intentions, to gain an advantage in competition, cooperation, or coalition formation. However, because of the intrinsic recursion of ToM and the intractability of modeling distributions over beliefs, integrating ToM into multiagent planning and decision making remains a challenge. In this paper, we incorporate ToM into the multiagent partially observable Markov decision process (POMDP) and propose an adaptive training algorithm that develops effective collaboration between agents equipped with ToM. We evaluate our approach on two games, where it surpasses all previous decentralized-execution algorithms that do not model ToM.
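As a rough illustration (not taken from this paper), the recursion the abstract refers to is commonly formalized as nested beliefs, as in interactive POMDPs: a level-0 belief is a distribution over physical states, and a level-$k$ belief of agent $i$ is a distribution over states and the other agent's level-$(k-1)$ beliefs,
$$
b_i^{0} \in \Delta(S), \qquad
b_i^{k} \in \Delta\bigl(S \times B_j^{k-1}\bigr), \quad k \ge 1,
$$
where $B_j^{k-1}$ denotes the space of agent $j$'s level-$(k-1)$ beliefs. Each additional level of nesting enlarges the belief space, which is one standard way to see why modeling distributions over beliefs quickly becomes intractable; the specific formulation used in this paper may differ.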