Guided cooperation is a common task in multi-agent teaming applications. Planning the cooperation is difficult when the leader robot has incomplete information about the follower, so the cooperation plan must be learned, customized, and adapted online. To address this challenge, we develop a learning-based Stackelberg game-theoretic framework for optimal trajectory planning for heterogeneous robots. We first formulate the guided trajectory planning problem as a dynamic Stackelberg game and design the cooperation plans using open-loop Stackelberg equilibria. We then leverage meta-learning to handle the unknown follower in the game and propose a Stackelberg meta-learning framework that creates online adaptive trajectory guidance plans: the leader robot learns a meta-best-response model offline from a prescribed set of followers and then quickly adapts it to a specific online trajectory guidance task using limited learning data. We use simulations in three different scenarios to demonstrate the effectiveness of our framework. Comparisons with other learning approaches and with no-guidance cases show that our framework provides a more time- and data-efficient planning method for trajectory guidance tasks.
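For readers unfamiliar with the term, an open-loop Stackelberg equilibrium of the kind the abstract mentions can be sketched as a bilevel optimization problem. The notation below is an assumption made for illustration and is not taken from the paper: $u^L$ and $u^F$ denote the leader's and follower's control sequences over a horizon $T$, $J^L$ and $J^F$ their costs, $x_t$ the joint state, and $f$ the joint dynamics.

```latex
% Illustrative notation only (assumed, not the authors' formulation).
% The leader commits to u^L first; the follower best-responds to it.
\begin{align}
\min_{u^L} \quad & J^L\bigl(u^L, u^{F\ast}\bigr) \\
\text{s.t.} \quad & u^{F\ast} \in \arg\min_{u^F} \; J^F\bigl(u^L, u^F\bigr), \\
& x_{t+1} = f\bigl(x_t, u^L_t, u^F_t\bigr), \quad t = 0, \dots, T-1 .
\end{align}
```

In this reading, the follower's cost $J^F$ is what the leader does not know, which is why the abstract describes learning a best-response model in place of the inner minimization.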