Multi-task learning ideally allows robots to acquire a diverse repertoire of useful skills. However, many multi-task reinforcement learning (RL) efforts assume that the robot can collect data from all tasks at all times. In reality, the tasks that the robot learns arrive sequentially, depending on the user and the robot's current environment. In this work, we study a sequential multi-task RL problem motivated by the practical constraints of physical robotic systems, and derive an approach that effectively leverages the data and policies learned for previous tasks to cumulatively grow the robot's skill set. In a series of simulated robotic manipulation experiments, our approach requires fewer than half the samples needed to learn each task from scratch, while avoiding impractical round-robin data collection. On a Franka Emika Panda robot arm, our approach incrementally learns ten challenging tasks, including bottle capping and block insertion.