Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as robotics, where such interaction is expensive. In this work we investigate ways to minimize online interaction on a target task by reusing a suboptimal policy we may already have access to, for example one obtained by training on related prior tasks or in simulation. To this end, we develop two RL algorithms that speed up training by using not only the action distributions of teacher policies, but also data collected by such policies on the task at hand. We conduct a thorough experimental study of how to use suboptimal teachers on a challenging robotic manipulation benchmark for vision-based stacking with diverse objects. We compare our methods to offline, online, offline-to-online, and kickstarting RL algorithms, and find that training on data from both the teacher and the student enables the best performance under limited data budgets. We examine how to best allocate a limited data budget -- on the target task -- between the teacher and the student policy, and report experiments using varying budgets, two teachers with different degrees of suboptimality, and five stacking tasks that require a diverse set of behaviors. Our analysis, both in simulation and in the real world, shows that our approach performs best across data budgets, while standard offline RL from teacher rollouts is surprisingly effective when given enough data.
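To make the data-allocation question concrete, the sketch below illustrates one simple way the idea could be realized: a replay buffer that holds teacher-collected and student-collected transitions separately and samples a fixed mixture of the two for RL updates. This is a hedged, minimal illustration only, not the paper's exact method; names such as `teacher_fraction` and the 30/70 budget split are assumptions made for this example.

```python
# Minimal sketch (assumptions, not the paper's implementation): split a fixed
# data budget on the target task between teacher rollouts and student rollouts,
# and train on batches mixing both sources.
import random
from dataclasses import dataclass, field

@dataclass
class MixedReplayBuffer:
    """Stores teacher- and student-collected transitions separately and
    samples a fixed ratio of each when building a training batch."""
    teacher_fraction: float = 0.5  # assumed name: share of each batch drawn from teacher data
    teacher_data: list = field(default_factory=list)
    student_data: list = field(default_factory=list)

    def add(self, transition, source):
        # `source` tags which policy collected the transition on the target task.
        (self.teacher_data if source == "teacher" else self.student_data).append(transition)

    def sample(self, batch_size):
        n_teacher = int(batch_size * self.teacher_fraction)
        n_student = batch_size - n_teacher
        batch = random.choices(self.teacher_data, k=n_teacher) if self.teacher_data else []
        batch += random.choices(self.student_data, k=n_student) if self.student_data else []
        return batch

# Example: a 10k-transition budget split 30/70 between teacher and student data;
# the actual RL update (e.g. an offline-to-online critic/actor step) is omitted.
buffer = MixedReplayBuffer(teacher_fraction=0.3)
for _ in range(3_000):
    buffer.add({"obs": None, "action": None, "reward": 0.0}, source="teacher")
for _ in range(7_000):
    buffer.add({"obs": None, "action": None, "reward": 0.0}, source="student")
batch = buffer.sample(256)
```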