Learning-from-demonstration methods usually leverage near-optimal demonstrations to accelerate training. By contrast, when demonstrating a task, human teachers deviate from optimal demonstrations and pedagogically modify their behavior, giving the demonstrations that best disambiguate the goal they want to convey. Analogously, human learners excel at pragmatically inferring the teacher's intent, which facilitates communication between the two agents. These mechanisms are critical in the few-demonstration regime, where inferring the goal is more difficult. In this paper, we implement pedagogy and pragmatism mechanisms by leveraging a Bayesian model of goal inference from demonstrations. We highlight the benefits of this model in multi-goal teacher-learner setups with two artificial agents that learn with goal-conditioned Reinforcement Learning. We show that combining a pedagogical teacher and a pragmatic learner results in faster learning and lower goal ambiguity than standard learning from demonstrations, especially in the few-demonstration regime.
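As an illustration only, the idea of Bayesian goal inference from a demonstration can be sketched on a toy problem. The snippet below is not the model used in the paper: the goal names, demonstration names, likelihood table, and helper functions are all hypothetical, and the paper's agents are trained with goal-conditioned Reinforcement Learning rather than read from a lookup table. It only shows why a demonstration that is likely under the target goal can still be ambiguous, and how a teacher might instead pick the demonstration that maximizes the posterior on the intended goal.

```python
# Toy sketch of Bayesian goal inference (hypothetical example, not the paper's model).

def goal_posterior(likelihood, prior, demo):
    """Bayes' rule: P(goal | demo) is proportional to P(demo | goal) * P(goal)."""
    unnorm = {g: likelihood[g].get(demo, 0.0) * prior[g] for g in prior}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("demonstration has zero probability under every goal")
    return {g: p / total for g, p in unnorm.items()}


def naive_demo(likelihood, goal):
    """Pick the demonstration that is most likely under the goal (ignores ambiguity)."""
    return max(likelihood[goal], key=likelihood[goal].get)


def pedagogical_demo(likelihood, prior, goal):
    """Pick the demonstration that maximizes the inferred probability of the goal."""
    candidates = [d for d, p in likelihood[goal].items() if p > 0.0]
    return max(candidates, key=lambda d: goal_posterior(likelihood, prior, d)[goal])


# Assumed toy data: both goals usually produce the same "shared_path" demonstration,
# and each goal occasionally produces a path that only it can generate.
LIKELIHOOD = {
    "goal_a": {"shared_path": 0.8, "path_a": 0.2},
    "goal_b": {"shared_path": 0.8, "path_b": 0.2},
}
PRIOR = {"goal_a": 0.5, "goal_b": 0.5}

if __name__ == "__main__":
    for pick in (naive_demo(LIKELIHOOD, "goal_a"),
                 pedagogical_demo(LIKELIHOOD, PRIOR, "goal_a")):
        print(pick, goal_posterior(LIKELIHOOD, PRIOR, pick))
```

In this toy setup the naive choice leaves the learner undecided between the two goals, while the pedagogical choice identifies the intended goal from a single demonstration, which mirrors the few-demonstration setting discussed in the abstract.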