Reinforcement learning has shown great potential for solving complex tasks when large amounts of data can be generated with little effort. In robotics, one approach to generating training data builds on simulations based on dynamics models derived from first principles. However, for tasks that involve, for instance, complex soft robots, devising such models is substantially more challenging. Being able to train effectively with reinforcement learning in increasingly complicated scenarios would make it possible to take advantage of complex systems such as soft robots. Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently. We (i) abstract the task into distinct components, (ii) off-load the simple dynamics parts into the simulation, and (iii) multiply these virtual parts to generate more data in hindsight. Our new method, Hindsight States (HiS), uses this data and selects the most useful transitions for training. It can be used with an arbitrary off-policy algorithm. We validate our method on several challenging simulated tasks and demonstrate that it improves learning both on its own and when combined with an existing hindsight algorithm, Hindsight Experience Replay (HER). Finally, we evaluate HiS on a physical system and show that it boosts performance on a complex table tennis task with a muscular robot. Videos and code of the experiments can be found at webdav.tuebingen.mpg.de/his/.
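The core idea of steps (i)-(iii) can be illustrated with a minimal sketch. All names and the toy dynamics below are assumptions for illustration, not the paper's implementation: a recorded transition is split into a "real" part (the hard-to-model robot) and a "virtual" part (the simple dynamics that can be cheaply re-simulated), and each real transition is replayed with several alternative virtual states to generate extra training data in hindsight.

```python
import random

def simulate_virtual(virtual_state, action):
    """Stand-in for a cheap first-principles simulator of the simple
    dynamics part (e.g. a free-flying object). Placeholder dynamics only."""
    return [s + 0.1 for s in virtual_state]

def hindsight_states(transition, n_extra, sample_virtual):
    """Replay one real transition with n_extra alternative virtual states.

    transition: (real_state, virtual_state, action) as recorded on the robot.
    sample_virtual: callable producing an alternative virtual state.
    Returns extra transitions of the form
    (real_state, virtual_state, action, next_virtual_state).
    """
    real_s, _virt_s, action = transition
    extras = []
    for _ in range(n_extra):
        v = sample_virtual()                  # alternative virtual state
        v_next = simulate_virtual(v, action)  # re-simulate the cheap part only
        extras.append((real_s, v, action, v_next))
    return extras

# Example: one recorded transition multiplied into three hindsight transitions.
extra = hindsight_states(([0.0], [1.0], None), 3, lambda: [random.random()])
```

In a full off-policy setup, these extra transitions would be written into the replay buffer alongside the real ones, with some selection mechanism keeping the most useful of them for training.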