Towards More Generalizable One-shot Visual Imitation Learning

A general-purpose robot should be able to master a wide range of tasks and quickly learn a novel one by leveraging past experiences. One-shot imitation learning (OSIL) approaches this goal by training an agent with (pairs of) expert demonstrations, such that at test time, it can directly execute a new task from just one demonstration. However, so far this framework has been limited to training on many variations of one task, and testing on other unseen but similar variations of the same task. In this work, we push for a higher level of generalization ability by investigating a more ambitious multi-task setup. We introduce a diverse suite of vision-based robot manipulation tasks, consisting of 7 tasks, a total of 61 variations, and a continuum of instances within each variation. For consistency and comparison purposes, we first train and evaluate single-task agents (as done in prior few-shot imitation work). We then study the multi-task setting, where multi-task training is followed by (i) one-shot imitation on variations within the training tasks, (ii) one-shot imitation on new tasks, and (iii) fine-tuning on new tasks. Prior state-of-the-art methods, while performing well on some single tasks, struggle in these harder multi-task settings. To address these limitations, we propose MOSAIC (Multi-task One-Shot Imitation with self-Attention and Contrastive learning), which integrates a self-attention model architecture and a temporal contrastive module to enable better task disambiguation and more robust representation learning. Our experiments show that MOSAIC outperforms the prior state of the art in both learning efficiency and final performance, and that it learns a multi-task policy with promising generalization ability via fine-tuning on novel tasks.
