Methods for learning from demonstration (LfD) have shown success in acquiring behavior policies by imitating a user. However, even for a single task, LfD may require numerous demonstrations. For versatile agents that must learn many tasks via demonstration, this process would substantially burden the user if each task were learned in isolation. To address this challenge, we introduce the novel problem of lifelong learning from demonstration, which allows the agent to continually build upon knowledge learned from previously demonstrated tasks to accelerate the learning of new tasks, reducing the number of demonstrations required. As one solution to this problem, we propose the first lifelong learning approach to inverse reinforcement learning, which learns consecutive tasks via demonstration, continually transferring knowledge between tasks to improve performance.
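To make the lifelong learning-from-demonstration setting concrete, the toy sketch below fits a linear reward for each task from demonstrated state visits and warm-starts every new task from the rewards learned on earlier tasks. This is an illustrative assumption only: the max-ent-style fitting step, the running-mean transfer rule, and the names `irl_fit` and `lifelong_irl` are placeholders, not the algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def irl_fit(features, demo_states, w_init, lr=0.5, iters=200):
    """Toy max-ent-style IRL: fit w so that a Boltzmann distribution over
    states, p(s) ~ exp(w . phi(s)), matches the demonstrated visitation."""
    w = w_init.copy()
    demo_fe = features[demo_states].mean(axis=0)   # empirical feature expectation
    for _ in range(iters):
        p = np.exp(features @ w)
        p /= p.sum()
        model_fe = p @ features                    # model feature expectation
        w += lr * (demo_fe - model_fe)             # likelihood gradient step
    return w

def lifelong_irl(task_stream, n_features):
    """Learn a reward for each demonstrated task in sequence, warm-starting
    from knowledge accumulated over earlier tasks (here: the running mean of
    previously learned reward weights -- an assumed, simplistic transfer rule)."""
    learned = []
    for features, demo_states in task_stream:
        w0 = np.mean(learned, axis=0) if learned else np.zeros(n_features)
        learned.append(irl_fit(features, demo_states, w0))
    return learned

# Tiny synthetic example: 3 consecutive tasks, 20 states, 5 features each.
n_states, n_features = 20, 5
tasks = []
for _ in range(3):
    phi = rng.normal(size=(n_states, n_features))
    demos = rng.choice(n_states, size=10)          # demonstrated state visits
    tasks.append((phi, demos))

weights = lifelong_irl(tasks, n_features)
print([np.round(w, 2) for w in weights])
```

In this sketch, transfer only reduces the optimization effort for later tasks via a better initialization; the point is to show the sequential structure of the problem, in which each newly demonstrated task can draw on knowledge from all tasks seen before it.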