Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are capable of neither fast adaptation to heterogeneous human demonstrations nor large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization; (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks, with an average 57% improvement in policy returns and, when modeling demonstrations with policy mixtures, an average of 78% fewer episodes required. Finally, we demonstrate the success of FLAIR in a table tennis task and find that users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.
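The interplay between points (1) and (3) can be illustrated with a minimal sketch. It is not the FLAIR implementation: the abstract does not specify how mixtures are formed or fitted, so everything here is an assumption made for illustration. Policies are hypothetical scalar functions, a mixture is assumed to be a weighted average of their actions, the fit is a coarse grid search, and `threshold` and `new_policy` (a stand-in for a strategy that would be learned from the demonstration) are invented names.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a policy maps a scalar state to a scalar action,
# and a demonstration is a list of (state, action) pairs.
Policy = Callable[[float], float]
Demo = Sequence[Tuple[float, float]]


def simplex_grid(k: int, steps: int = 10):
    """Yield length-k weight vectors on a grid that sum to 1 (k >= 1)."""
    for combo in product(range(steps + 1), repeat=k - 1):
        if sum(combo) <= steps:
            yield [c / steps for c in combo] + [(steps - sum(combo)) / steps]


def mixture_error(weights: Sequence[float], policies: Sequence[Policy], demo: Demo) -> float:
    """Mean squared error between the mixture's action and the demonstrated action."""
    total = 0.0
    for state, action in demo:
        mixed = sum(w * pi(state) for w, pi in zip(weights, policies))
        total += (mixed - action) ** 2
    return total / len(demo)


def fit_mixture(policies: Sequence[Policy], demo: Demo, steps: int = 10):
    """Return (weights, error) of the best mixture found on the grid."""
    candidates = ((w, mixture_error(w, policies, demo))
                  for w in simplex_grid(len(policies), steps))
    return min(candidates, key=lambda pair: pair[1])


def adapt(policies: List[Policy], demo: Demo, new_policy: Policy,
          threshold: float = 0.05):
    """Reuse existing strategies if a mixture fits; otherwise grow the model.

    Returns (weights, expanded). `threshold` is an assumed tolerance.
    """
    if policies:
        weights, err = fit_mixture(policies, demo)
        if err <= threshold:
            return weights, False  # fast adaptation, no new strategy needed
    policies.append(new_policy)    # expand only when no mixture is close enough
    return [0.0] * (len(policies) - 1) + [1.0], True
```

For example, with two prototype strategies `lambda s: s` and `lambda s: -s`, a demonstration generated by a 0.7/0.3 blend of them is matched by a mixture without growing the model, whereas a demonstration with a constant action of 3.0 cannot be approximated by any mixture and triggers the addition of a new strategy.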