Bias-Reduced Hindsight Experience Replay with Virtual Goal Prioritization


March 7, 2021


Binyamin Manela,Armin Biess

Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm for sparse reward functions. The algorithm treats every failure as a success for an alternative (virtual) goal that has been achieved in the episode. Virtual goals are randomly selected, irrespective of which are most instructive for the agent. In this paper, we present two improvements over the existing HER algorithm. First, we prioritize virtual goals from which the agent will learn more valuable information. We call this property the instructiveness of the virtual goal and define it by a heuristic measure, which expresses how well the agent will be able to generalize from that virtual goal to actual goals. Secondly, we reduce existing bias in HER by the removal of misleading samples. To test our algorithms, we built two challenging environments with sparse reward functions. Our empirical results in both environments show vast improvement in the final success rate and sample efficiency when compared to the original HER algorithm. A video showing experimental results is available at https://youtu.be/3cZwfK8Nfps .
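The relabeling idea described in the abstract can be illustrated with a minimal sketch. It assumes a toy episode format (state, action, next state, achieved goal) and a 0 / -1 sparse reward, which is a common convention but is not stated in the abstract. The uniform version picks virtual goals at random from goals achieved later in the same episode; the prioritized version samples them in proportion to a score. The `score` callable is a hypothetical placeholder: the abstract does not give the authors' instructiveness heuristic, and the bias-reduction step (removal of misleading samples) is not covered here.

```python
import random
from typing import Callable, List, Tuple

# Assumed toy format: (state, action, next_state, achieved_goal).
Transition = Tuple[tuple, int, tuple, tuple]
# Relabeled sample: (state, action, next_state, virtual_goal, reward).
Relabeled = Tuple[tuple, int, tuple, tuple, float]


def sparse_reward(achieved: tuple, goal: tuple) -> float:
    """Sparse reward (assumed convention): 0 on success, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def future_goals(episode: List[Transition], t: int) -> List[tuple]:
    """Goals achieved at step t or later in the same episode."""
    return [tr[3] for tr in episode[t:]]


def relabel_uniform(episode: List[Transition], k: int,
                    rng: random.Random) -> List[Relabeled]:
    """HER-style relabeling: k virtual goals per step, chosen uniformly."""
    out: List[Relabeled] = []
    for t, (s, a, s2, achieved) in enumerate(episode):
        candidates = future_goals(episode, t)
        for _ in range(k):
            g = rng.choice(candidates)
            out.append((s, a, s2, g, sparse_reward(achieved, g)))
    return out


def relabel_prioritized(episode: List[Transition], k: int,
                        score: Callable[[tuple], float],
                        rng: random.Random) -> List[Relabeled]:
    """Same relabeling, but virtual goals are sampled in proportion to
    score(goal). `score` is a hypothetical stand-in for an instructiveness
    measure; it must be non-negative and not zero for every candidate."""
    out: List[Relabeled] = []
    for t, (s, a, s2, achieved) in enumerate(episode):
        candidates = future_goals(episode, t)
        weights = [score(g) for g in candidates]
        for g in rng.choices(candidates, weights=weights, k=k):
            out.append((s, a, s2, g, sparse_reward(achieved, g)))
    return out


# A two-step episode on a 1-D line: 0 -> 1 -> 2.
episode: List[Transition] = [
    ((0,), 1, (1,), (1,)),
    ((1,), 1, (2,), (2,)),
]
uniform_samples = relabel_uniform(episode, k=2, rng=random.Random(0))
# Illustrative score only: put all weight on the goal (2,).
prioritized_samples = relabel_prioritized(
    episode, k=2, score=lambda g: 1.0 if g == (2,) else 0.0,
    rng=random.Random(0),
)
```

In the uniform case every achieved goal is equally likely to be replayed, which is the behavior the abstract identifies as a limitation. Swapping in a different `score` changes only the sampling distribution over virtual goals; the rest of the replay pipeline is unchanged.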


