Recent progress in vision-language foundation models has brought significant advances toward building general-purpose robots. By using pre-trained models to encode the scene and the instruction as inputs for decision making, the instruction-conditioned policy can generalize across different objects and tasks. While this is encouraging, the policy still fails in most cases when given an unseen task or environment. In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF). When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions and record the demonstrations. While the executions may be wrong, we can use the pre-trained foundation models to provide feedback by relabeling the demonstrations. This automatically provides new pairs of demonstration-instruction data for policy fine-tuning. We evaluate our method on a broad range of experiments, focusing on generalization to unseen objects, unseen tasks, unseen environments, and sim-to-real transfer. We show that PAFF improves over baselines by a large margin in all cases. Our project page is available at https://geyuying.github.io/PAFF/
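To make the play-and-relabel loop concrete, here is a minimal sketch of the procedure described above. All interfaces are hypothetical placeholders for illustration, not the authors' actual API: `Policy`, `FoundationModel`, `rollout`, `relabel`, and `finetune` are assumed names, and the environment's `reset`/`step` interface is an assumption.

```python
import random
from dataclasses import dataclass

# --- Hypothetical placeholder interfaces (not the authors' actual API) ---

@dataclass
class Demonstration:
    observations: list
    actions: list

class Policy:
    def act(self, observation, instruction):
        ...  # instruction-conditioned action selection

    def finetune(self, pairs):
        ...  # e.g., behavior cloning on (demonstration, instruction) pairs

class FoundationModel:
    def relabel(self, demonstration) -> str:
        ...  # describe what the rollout actually did, as an instruction

def rollout(policy, env, instruction, horizon=50) -> Demonstration:
    """Execute the instruction-conditioned policy and record the demonstration.
    Assumes env.reset() returns an observation and env.step() returns (obs, done)."""
    obs = env.reset()
    observations, actions = [], []
    for _ in range(horizon):
        action = policy.act(obs, instruction)
        observations.append(obs)
        actions.append(action)
        obs, done = env.step(action)
        if done:
            break
    return Demonstration(observations, actions)

def paff_adapt(policy, foundation_model, env, instruction_pool, num_episodes=100):
    """Play with random instructions, relabel with the foundation model,
    then fine-tune the policy on the automatically labeled pairs."""
    data = []
    for _ in range(num_episodes):
        # 1. Play: condition the policy on a randomly drawn instruction.
        instruction = random.choice(instruction_pool)
        # 2. Record: the execution may not match the instruction.
        demo = rollout(policy, env, instruction)
        # 3. Feedback: relabel the demonstration with what was actually done.
        data.append((demo, foundation_model.relabel(demo)))
    # 4. Fine-tune on the new demonstration-instruction pairs.
    policy.finetune(data)
    return policy
```

The key design choice this sketch reflects is that the original instruction is discarded: because the relabeled instruction is derived from the recorded demonstration, each pair is self-consistent training data even when the policy failed to follow the prompt it was given.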