When robots share a workspace with other intelligent agents (e.g., other robots or humans), they must be able to reason about the behaviors of their neighboring agents while accomplishing their designated tasks. In practice, agents frequently do not exhibit perfectly rational behavior because their computational resources are limited. Predicting optimal agent behaviors is therefore both impractical (it demands prohibitive computational resources) and unreliable (the prediction may simply be wrong). Motivated by this observation, we remove the assumption of perfectly rational agents and propose incorporating the concept of bounded rationality, from an information-theoretic viewpoint, into the game-theoretic framework. This allows the robots to reason about other agents' sub-optimal behaviors and to act accordingly under their own computational constraints. Specifically, bounded rationality directly models an agent's information-processing ability, represented as the KL-divergence between its nominal and optimized stochastic policies, and the bounded-optimal policy can be obtained efficiently with an importance sampling approach. Using both simulated and real-world multi-robot navigation experiments, we demonstrate that the resulting framework allows a robot to reason about different levels of rationality in other agents' behaviors and to compute a reasonable strategy under its own computational constraints.
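As a minimal sketch of the formulation summarized above (the notation $\pi_0$, $\pi$, $Q$, and $\beta$ is ours, not taken verbatim from the paper), the standard information-theoretic model of bounded rationality trades expected utility against the KL-divergence from a nominal policy $\pi_0$:

$$
\pi^{*}(u \mid x)
\;=\; \arg\max_{\pi}\;\Big\{ \mathbb{E}_{u \sim \pi}\!\left[Q(x,u)\right] \;-\; \tfrac{1}{\beta}\, D_{\mathrm{KL}}\!\left(\pi(\cdot \mid x)\,\|\,\pi_0(\cdot \mid x)\right) \Big\}
\;=\; \frac{\pi_0(u \mid x)\, e^{\beta Q(x,u)}}{\sum_{u'} \pi_0(u' \mid x)\, e^{\beta Q(x,u')}} .
$$

Under this assumed setup, the normalizing sum can be estimated by importance sampling: draw $u_i \sim \pi_0(\cdot \mid x)$ and weight each sample by $w_i \propto e^{\beta Q(x,u_i)}$, consistent with the efficient importance sampling approach mentioned above. The inverse temperature $\beta$ then interpolates between the nominal policy ($\beta \to 0$, no information-processing capacity) and the fully rational optimum ($\beta \to \infty$), which is one way different levels of rationality can be represented.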