Information gathering while interacting with other agents under sensing and motion uncertainty is critical in domains such as driving, service robots, racing, or surveillance. The interests of agents may be at odds with those of others, resulting in a stochastic non-cooperative dynamic game. Agents must predict others' future actions without communication, incorporate their own actions into these predictions, account for uncertainty and noise in information gathering, and consider what information their actions reveal. Our solution uses local iterative dynamic programming in Gaussian belief space to solve a game-theoretic continuous POMDP. Solving a quadratic game in the backward pass of a game-theoretic belief-space variant of iLQG yields a runtime polynomial in the number of agents and linear in the planning horizon. Our algorithm produces linear feedback policies for our robot and predicted feedback policies for the other agents. We present three applications: active surveillance, guiding eyes for a blind agent, and autonomous racing. In the racing domain, agents with game-theoretic belief-space planning win 44% more races than agents without game theory and 34% more than agents without belief-space planning.
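To make the backward-pass step concrete, the following is a minimal sketch, not the paper's implementation, of one stage of a two-agent quadratic game. It assumes the standard iLQG-style quadratization of each agent's cost-to-go around a nominal trajectory; all names (`Q11`, `Q1b`, `solve_stage_quadratic_game`, etc.) are illustrative.

```python
import numpy as np

# Illustrative sketch of one backward-pass stage: each agent i quadratizes
# its cost-to-go in the belief deviation db and in all control deviations
# du_j.  Requiring both agents' Q-functions to be simultaneously stationary
# in their own controls gives a coupled linear system; its solution is the
# pair of linear feedback policies du_i = k_i + K_i @ db.

def solve_stage_quadratic_game(Q11, Q12, Q21, Q22, Q1b, Q2b, q1, q2):
    """Solve a two-agent quadratic game at one stage (hypothetical API).

    Qij : Hessian block of agent i's Q-function w.r.t. (u_i, u_j)
    Qib : mixed Hessian block of agent i's Q-function w.r.t. (u_i, belief)
    qi  : gradient of agent i's Q-function w.r.t. u_i
    Returns (K1, k1, K2, k2) such that du_i = k_i + K_i @ db.
    """
    # Stack the two first-order (Nash stationarity) conditions:
    #   Q11 du1 + Q12 du2 = -(Q1b db + q1)
    #   Q21 du1 + Q22 du2 = -(Q2b db + q2)
    M = np.block([[Q11, Q12], [Q21, Q22]])
    Nb = np.vstack([Q1b, Q2b])
    m = np.concatenate([q1, q2])
    K = -np.linalg.solve(M, Nb)   # stacked feedback gains
    k = -np.linalg.solve(M, m)    # stacked feedforward terms
    n1 = Q11.shape[0]
    return K[:n1], k[:n1], K[n1:], k[n1:]

# Example with random, well-conditioned blocks (purely illustrative):
rng = np.random.default_rng(0)
nb, nu = 4, 2
def spd(n):  # random symmetric positive-definite matrix
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)
Q11, Q22 = spd(nu), spd(nu)
Q12 = 0.1 * rng.standard_normal((nu, nu))
Q21 = 0.1 * rng.standard_normal((nu, nu))
Q1b, Q2b = rng.standard_normal((nu, nb)), rng.standard_normal((nu, nb))
q1, q2 = rng.standard_normal(nu), rng.standard_normal(nu)
K1, k1, K2, k2 = solve_stage_quadratic_game(Q11, Q12, Q21, Q22, Q1b, Q2b, q1, q2)
```

For N agents the stacked system has size N times the control dimension, so each stage costs one linear solve that is polynomial in the number of agents, and sweeping the backward pass over the horizon is linear in its length, which is consistent with the runtime claimed above.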