Learning-from-demonstration methods usually leverage near-optimal demonstrations to accelerate training. By contrast, when demonstrating a task, human teachers deviate from optimal demonstrations and pedagogically modify their behavior, giving demonstrations that best disambiguate the goal they want to convey. Analogously, human learners excel at pragmatically inferring the teacher's intent, which facilitates communication between the two agents. These mechanisms are critical in the few-demonstrations regime, where inferring the goal is more difficult. In this paper, we implement pedagogy and pragmatism mechanisms by leveraging a Bayesian model of Goal Inference from demonstrations (BGI). We highlight the benefits of this model in multi-goal teacher-learner setups with two artificial agents that learn with goal-conditioned Reinforcement Learning. We show that combining BGI-agents (a pedagogical teacher and a pragmatic learner) results in faster learning and reduced goal ambiguity compared with standard learning from demonstrations, especially in the few-demonstrations regime. We provide the code for our experiments (https://github.com/Caselles/NeurIPS22-demonstrations-pedagogy-pragmatism), as well as an illustrative video explaining our approach (https://youtu.be/V4n16IjkNyw).
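To make the idea of Bayesian goal inference concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a discrete set of goals and demonstrations with a hand-picked likelihood table; the names `GOALS`, `DEMOS`, `LIKELIHOOD`, `infer_goal_posterior`, `literal_demo`, `pedagogical_demo`, and `pragmatic_goals` are all hypothetical. It illustrates how a teacher that picks the most likely (near-optimal) demonstration can leave the goal ambiguous, while a teacher that picks the demonstration maximizing the posterior of the intended goal disambiguates it.

```python
# Toy illustration only: goals, demonstrations, and probabilities below are
# hypothetical and are not taken from the paper or its released code.

GOALS = ["goal_a", "goal_b"]
DEMOS = ["short_path", "detour"]

# Assumed likelihood P(demo | goal) of a near-optimal demonstrator.
LIKELIHOOD = {
    "goal_a": {"short_path": 0.6, "detour": 0.4},
    "goal_b": {"short_path": 0.9, "detour": 0.1},
}


def infer_goal_posterior(demo, prior=None):
    """Bayes rule: P(goal | demo) is proportional to P(demo | goal) * P(goal)."""
    if prior is None:
        prior = {g: 1.0 / len(GOALS) for g in GOALS}  # uniform prior
    unnormalized = {g: LIKELIHOOD[g][demo] * prior[g] for g in GOALS}
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}


def literal_demo(goal):
    """Teacher that shows the most likely (near-optimal) demonstration."""
    return max(DEMOS, key=lambda d: LIKELIHOOD[goal][d])


def pedagogical_demo(goal):
    """Teacher that shows the demo maximizing the posterior of its goal."""
    return max(DEMOS, key=lambda d: infer_goal_posterior(d)[goal])


def pragmatic_goals(demo):
    """Learner that assumes the teacher is pedagogical: it keeps the goals
    for which a pedagogical teacher would have chosen this demonstration."""
    return [g for g in GOALS if pedagogical_demo(g) == demo]


if __name__ == "__main__":
    for goal in GOALS:
        lit, ped = literal_demo(goal), pedagogical_demo(goal)
        print(goal, "literal:", lit, infer_goal_posterior(lit)[goal])
        print(goal, "pedagogical:", ped, infer_goal_posterior(ped)[goal])
    print("pragmatic reading of 'detour':", pragmatic_goals("detour"))
```

In this toy table the literal teacher shows `short_path` for both goals, so a Bayesian observer cannot recover `goal_a` from it, whereas the pedagogical teacher switches to `detour` for `goal_a`. The deterministic `pragmatic_goals` rule is a simplification chosen for brevity; a probabilistic learner would be an equally reasonable modeling choice.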