We present a novel interactive learning protocol that enables training request-fulfilling agents by verbally describing their activities. Unlike imitation learning (IL), our protocol allows the teaching agent to provide feedback in a language that is most appropriate for them. Compared with reward in reinforcement learning (RL), the description feedback is richer and allows for improved sample complexity. We develop a probabilistic framework and an algorithm that practically implements our protocol. Empirical results in two challenging request-fulfilling problems demonstrate the strengths of our approach: compared with RL baselines, it is more sample-efficient; compared with IL baselines, it achieves competitive success rates without requiring the teaching agent to be able to demonstrate the desired behavior using the learning agent's actions. Apart from empirical evaluation, we also provide theoretical guarantees for our algorithm under certain assumptions about the teacher and the environment.