Recent progress in vision-language foundation models has brought significant advances toward building general-purpose robots. By using pre-trained models to encode the scene and instructions as inputs for decision making, the instruction-conditioned policy can generalize across different objects and tasks. While this is encouraging, the policy still fails in most cases when given an unseen task or environment. In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF). When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions and record the resulting demonstrations. While the executions may not match the commanded instructions, we can use the pre-trained foundation models to provide feedback and relabel the demonstrations. This automatically produces new demonstration-instruction pairs for policy fine-tuning. We evaluate our method on a broad range of experiments with a focus on generalization to unseen objects, unseen tasks, unseen environments, and sim-to-real transfer. We show that PAFF improves over baselines by a large margin in all cases. Our project page is available at https://geyuying.github.io/PAFF/
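To make the play-relabel-finetune loop concrete, below is a minimal sketch in Python. The interfaces (`policy.act`, `env.step`, `vlm.score`) are hypothetical placeholders standing in for an instruction-conditioned policy, an environment, and a vision-language foundation model; this is an illustration of the idea described above, not the paper's implementation.

```python
import random

def collect_and_relabel(policy, env, instructions, vlm, num_rollouts):
    """Play with random instructions, then relabel rollouts via foundation model feedback.

    Assumed (hypothetical) interfaces:
      policy.act(obs, instruction) -> action
      env.reset() -> obs; env.step(action) -> (next_obs, done)
      vlm.score(trajectory, instruction) -> similarity between rollout and instruction
    """
    relabeled_data = []
    for _ in range(num_rollouts):
        # Play: roll out the policy conditioned on a randomly sampled instruction.
        sampled = random.choice(instructions)
        obs = env.reset()
        trajectory = []
        done = False
        while not done:
            action = policy.act(obs, sampled)
            next_obs, done = env.step(action)
            trajectory.append((obs, action))
            obs = next_obs
        # Relabel: ask the foundation model which instruction the rollout
        # actually accomplished, regardless of what was commanded.
        feedback_instruction = max(instructions, key=lambda ins: vlm.score(trajectory, ins))
        relabeled_data.append((trajectory, feedback_instruction))
    return relabeled_data

# The relabeled demonstration-instruction pairs are then used to fine-tune the
# policy on the new task or environment with the same imitation objective.
```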