Generating human-like behavior on robots is a significant challenge, especially in dexterous manipulation tasks with robotic hands. Even in simulation with no sample constraints, scripting controllers is intractable due to the high degrees of freedom, and manual reward engineering is also difficult and can lead to unrealistic motions. Leveraging recent progress in Reinforcement Learning from Human Feedback (RLHF), we propose a framework that learns a universal human prior from direct human preference feedback over videos, for efficiently fine-tuning RL policies on 20 dual-hand robot manipulation tasks in simulation, without a single human demonstration. A single task-agnostic reward model is trained by iteratively generating diverse policies and collecting human preferences over their trajectories; it is then applied to regularize policy behavior during the fine-tuning stage. Our method empirically produces more human-like behaviors on robot hands across diverse tasks, including unseen tasks, indicating its generalization capability.
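To make the preference-learning step concrete, the following is a minimal sketch of the objective such a framework typically optimizes; the notation (trajectory segments $\tau^1, \tau^2$, learned human-prior reward $r_\psi$, task reward $r_{\text{task}}$, preference label $y$, and weight $\lambda$) is assumed here for illustration rather than taken from the paper's own formulation. Under the standard Bradley-Terry model, the probability that a human prefers $\tau^1$ over $\tau^2$ and the corresponding training loss are

\[
P_\psi\!\left[\tau^1 \succ \tau^2\right] \;=\;
\frac{\exp\!\big(\sum_t r_\psi(s^1_t, a^1_t)\big)}
     {\exp\!\big(\sum_t r_\psi(s^1_t, a^1_t)\big) + \exp\!\big(\sum_t r_\psi(s^2_t, a^2_t)\big)},
\]
\[
\mathcal{L}(\psi) \;=\;
-\,\mathbb{E}_{(\tau^1,\tau^2,y)\sim\mathcal{D}}
\Big[\, y \log P_\psi\!\left[\tau^1 \succ \tau^2\right]
      + (1-y)\log P_\psi\!\left[\tau^2 \succ \tau^1\right] \Big].
\]

During fine-tuning, the policy would then be trained on a combined reward of the form $r = r_{\text{task}} + \lambda\, r_\psi$, so the learned human prior regularizes behavior without replacing the task objective.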