Recent progress in vision-language foundation models has brought significant advances toward building general-purpose robots. By using pre-trained models to encode the scene and the instruction as inputs for decision making, the instruction-conditioned policy can generalize across different objects and tasks. While this is encouraging, the policy still fails in most cases when given an unseen task or environment. In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF). When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions and record the demonstrations. While the executions may be wrong, we can use the pre-trained foundation models to provide feedback by relabeling the demonstrations. This automatically provides new pairs of demonstration-instruction data for policy fine-tuning. We evaluate our method on a broad range of experiments, focusing on generalization to unseen objects, unseen tasks, unseen environments, and sim-to-real transfer. We show that PAFF improves over baselines by a large margin in all cases. Our project page is available at https://geyuying.github.io/PAFF/
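To make the play-and-relabel loop concrete, here is a minimal sketch of the procedure described above. All interfaces are hypothetical placeholders for illustration, not the authors' actual API: `Policy`, `FoundationModel`, `rollout`, `relabel`, and `finetune` are assumed names, and the environment's `reset`/`step` interface is an assumption.

```python
import random
from dataclasses import dataclass

# --- Hypothetical placeholder interfaces (not the authors' actual API) ---

@dataclass
class Demonstration:
    observations: list
    actions: list

class Policy:
    def act(self, observation, instruction):
        ...  # instruction-conditioned action selection

    def finetune(self, pairs):
        ...  # e.g., behavior cloning on (demonstration, instruction) pairs

class FoundationModel:
    def relabel(self, demonstration) -> str:
        ...  # describe what the rollout actually did, as an instruction

def rollout(policy, env, instruction, horizon=50) -> Demonstration:
    """Execute the instruction-conditioned policy and record the demonstration.
    Assumes env.reset() returns an observation and env.step() returns (obs, done)."""
    obs = env.reset()
    observations, actions = [], []
    for _ in range(horizon):
        action = policy.act(obs, instruction)
        observations.append(obs)
        actions.append(action)
        obs, done = env.step(action)
        if done:
            break
    return Demonstration(observations, actions)

def paff_adapt(policy, foundation_model, env, instruction_pool, num_episodes=100):
    """Play with random instructions, relabel with the foundation model,
    then fine-tune the policy on the automatically labeled pairs."""
    data = []
    for _ in range(num_episodes):
        # 1. Play: condition the policy on a randomly drawn instruction.
        instruction = random.choice(instruction_pool)
        # 2. Record: the execution may not match the instruction.
        demo = rollout(policy, env, instruction)
        # 3. Feedback: relabel the demonstration with what was actually done.
        data.append((demo, foundation_model.relabel(demo)))
    # 4. Fine-tune on the new demonstration-instruction pairs.
    policy.finetune(data)
    return policy
```

The key design choice this sketch reflects is that the original instruction is discarded: because the relabeled instruction is derived from the recorded demonstration, each pair is self-consistent training data even when the policy failed to follow the prompt it was given.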