We present the Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm for modeling the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, which enables it to learn decentralized team policies from demonstrations of suboptimal teamwork. Further, to enable sample- and label-efficient policy learning from small datasets, BTIL takes a Bayesian approach and can learn from semi-supervised demonstrations. We demonstrate and benchmark BTIL on synthetic multi-agent tasks and on a novel dataset of human-agent teamwork. Our experiments show that BTIL successfully learns team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.
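To make "infers the time-varying mental states of team members" concrete, here is a minimal sketch of generic Bayesian forward filtering over one agent's latent mental state x, assuming the policy P(a | s, x) and a mental-state transition model P(x_t | x_{t-1}, s_t) are already known. This is not BTIL's learning procedure, which must also learn the team policies from (semi-supervised) demonstrations. The conditioning of the transition model, the mental-state labels, and all function and parameter names here are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type aliases: a state, an action, and a latent mental state
# (e.g., the subgoal an agent currently has in mind).
State = str
Action = str
Mind = str

Policy = Callable[[Action, State, Mind], float]        # P(a | s, x)
Transition = Callable[[Mind, Mind, State], float]      # P(x_new | x_old, s)


def filter_mental_states(
    trajectory: Sequence[Tuple[State, Action]],
    minds: Sequence[Mind],
    prior: Dict[Mind, float],
    policy: Policy,
    transition: Transition,
) -> List[Dict[Mind, float]]:
    """Return the filtered belief P(x_t | s_0:t, a_0:t) for each time step t."""
    beliefs: List[Dict[Mind, float]] = []
    belief = dict(prior)
    for t, (s, a) in enumerate(trajectory):
        if t > 0:
            # Predict: propagate the previous belief through the
            # (assumed known) mental-state transition model.
            belief = {
                x: sum(transition(x, x_prev, s) * belief[x_prev] for x_prev in minds)
                for x in minds
            }
        # Update: weight each mental state by how well it explains the action.
        unnormalized = {x: policy(a, s, x) * belief[x] for x in minds}
        z = sum(unnormalized.values())
        if z == 0.0:
            raise ValueError(f"action {a!r} at step {t} has zero probability")
        belief = {x: p / z for x, p in unnormalized.items()}
        beliefs.append(belief)
    return beliefs


def make_sticky_transition(minds: Sequence[Mind], stay: float = 0.8) -> Transition:
    """Toy transition model: keep the current mental state with prob. `stay`."""
    n = len(minds)

    def transition(x_new: Mind, x_old: Mind, s: State) -> float:
        if x_new == x_old:
            return stay
        return (1.0 - stay) / (n - 1)

    return transition


def example_policy(a: Action, s: State, x: Mind) -> float:
    """Toy policy: an agent with mind "A" mostly moves left, otherwise right."""
    p_left = 0.9 if x == "A" else 0.1
    return p_left if a == "left" else 1.0 - p_left
```

For example, with minds `["A", "B"]`, a uniform prior, and the observed actions `left, left, right`, the belief first concentrates on `"A"` and then shifts toward `"B"` after the `right` action, reflecting a change in the agent's (hypothetical) mental state over time. The sticky transition model is an illustrative choice that encodes the assumption that mental states tend to persist between steps.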