We explore methods for multi-task transfer learning that seek to exploit the shared physical structure of robotics tasks. Specifically, we train policies for a base set of pre-training tasks, then experiment with adapting to new, off-distribution tasks using simple architectural approaches that re-use these policies as black-box priors. These approaches include learning an alignment of either the observation space or the action space from a base task to a target task, which exploits rigid-body structure, and learning a time-domain switching policy across base tasks that solves the target task, which exploits temporal coherence. We find that combining low-complexity target policy classes, base policies as black-box priors, and simple optimization algorithms allows us to acquire new tasks outside the base task distribution using small amounts of offline training data.
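The abstract names two ways of re-using frozen base policies as black-box priors: an alignment of the action (or observation) space, and a time-domain switching policy. The Python sketch below illustrates both under assumptions that are ours, not the paper's: the alignment is a per-dimension affine map on actions fit by closed-form least squares, base and target action spaces share the same dimension, and the switching policy picks one base policy per fixed-length time segment by minimizing squared error against offline demonstrations. All function names are hypothetical.

```python
"""Hypothetical sketch: re-using frozen base policies as black-box priors."""
from typing import Callable, List, Sequence

Vector = List[float]
Policy = Callable[[Sequence[float]], Vector]


def fit_action_alignment(base: Policy,
                         observations: Sequence[Sequence[float]],
                         target_actions: Sequence[Sequence[float]]) -> Policy:
    """Fit a per-dimension affine map a_target ~= scale * a_base + offset.

    Assumption: closed-form 1-D least squares on offline (obs, action) pairs.
    The base policy is only queried, never modified (black-box prior).
    """
    base_actions = [base(o) for o in observations]
    n = len(observations)
    scales, offsets = [], []
    for j in range(len(target_actions[0])):
        xs = [a[j] for a in base_actions]
        ys = [a[j] for a in target_actions]
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        # If the base output is constant in this dimension, fall back to the mean target.
        scale = cov / var if var > 0 else 0.0
        scales.append(scale)
        offsets.append(my - scale * mx)

    def aligned(obs: Sequence[float]) -> Vector:
        a = base(obs)
        return [s * a[j] + b for j, (s, b) in enumerate(zip(scales, offsets))]

    return aligned


def fit_switching_schedule(bases: Sequence[Policy],
                           observations: Sequence[Sequence[float]],
                           target_actions: Sequence[Sequence[float]],
                           segment_len: int) -> List[int]:
    """For each time segment, pick the base policy with the lowest squared error."""
    schedule = []
    for start in range(0, len(observations), segment_len):
        steps = range(start, min(start + segment_len, len(observations)))
        errors = [
            sum(sum((u - v) ** 2 for u, v in zip(p(observations[t]), target_actions[t]))
                for t in steps)
            for p in bases
        ]
        schedule.append(min(range(len(bases)), key=lambda i: errors[i]))
    return schedule


def make_switching_policy(bases: Sequence[Policy], schedule: List[int],
                          segment_len: int) -> Callable[[Sequence[float], int], Vector]:
    """Return a time-indexed policy that delegates to the scheduled base policy."""
    def act(obs: Sequence[float], t: int) -> Vector:
        segment = min(t // segment_len, len(schedule) - 1)  # hold the last choice
        return bases[schedule[segment]](obs)
    return act


if __name__ == "__main__":
    base = lambda o: [o[0]]
    obs = [[float(x)] for x in range(5)]
    demo = [[2.0 * o[0] + 1.0] for o in obs]
    print(fit_action_alignment(base, obs, demo)([10.0]))  # → [21.0]
```

Keeping the target policy class this small (a few scalars for the alignment, one index per segment for the switch) mirrors the abstract's emphasis on low-complexity target policies fit with simple optimization from a small offline dataset; richer alignments or learned switching rules would be natural extensions.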