基于多样性的轨迹和目标选择,并重现前见经验 (Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay) - 专知论文

会员服务 ·

0

经验回放 · HER · 均匀采样 · Performer · 样本 ·

2021 年 8 月 17 日

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay

翻译：基于多样性的轨迹和目标选择,并重现前见经验

Tianhong Dai,Hengyan Liu,Kai Arulkumaran,Guangyu Ren,Anil Anthony Bharath

Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent's experiences contribute equally to training, and so naive uniform sampling may lead to inefficient learning. In this paper, we propose diversity-based trajectory and goal selection with HER (DTGSH). Firstly, trajectories are sampled according to the diversity of the goal states as modelled by determinantal point processes (DPPs). Secondly, transitions with diverse goal states are selected from the trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic manipulation tasks in simulated robot environments, where we show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.

翻译：事后经验重现(HER)是一种目标重新标签技术,通常与政策外强化学习算法一起用于解决面向目标的任务;它非常适合机器人操纵任务,只提供微薄的回报。在她身上,轨迹和过渡均进行统一的抽样培训。然而,并非所有代理人的经验都同样有助于培训,因此天真的统一抽样都可能导致低效学习。在本文中,我们建议与HER(DGSH)一道,以多样性为基础的轨迹和目标选择。首先,轨迹根据目标国的多样性进行抽样,以定点进程(DPPs)为模型。第二,使用 k-DPPs从轨迹选择不同目标国的过渡。我们评估DGSH在模拟机器人环境中的五项具有挑战性的机器人操纵任务,我们在那里显示,我们的方法可以在所有任务上更快地学习,并达到比其他最先进的方法更高的性能。

0

相关内容

经验回放

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

专知会员服务

77+阅读 · 2021年1月23日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【斯坦福大学ICLR2020】无任务的持续元学习，Continue Meta-learning without tasks

【斯坦福大学ICLR2020】无任务的持续元学习，Continue Meta-learning without tasks

专知会员服务

16+阅读 · 2019年12月18日

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

专知会员服务

113+阅读 · 2019年12月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】视频目标分割基础

【推荐】视频目标分割基础

机器学习研究会

9+阅读 · 2017年9月19日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

Arxiv

0+阅读 · 2021年10月14日

View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums

Arxiv

0+阅读 · 2021年10月14日

Safe Driving via Expert Guided Policy Optimization

Arxiv

0+阅读 · 2021年10月13日

THOMAS: Trajectory Heatmap Output with learned Multi-Agent Sampling

Arxiv

0+阅读 · 2021年10月13日

Decentralized Cooperative Multi-Agent Reinforcement Learning with Exploration

Arxiv

0+阅读 · 2021年10月12日

Inverse Constrained Reinforcement Learning

Arxiv

8+阅读 · 2021年5月21日

Energy-Based Hindsight Experience Prioritization

Arxiv

3+阅读 · 2018年10月8日

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

Arxiv

8+阅读 · 2018年7月10日

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings

Arxiv

6+阅读 · 2018年6月7日

Multi-Task Learning with Labeled and Unlabeled Tasks

Arxiv

3+阅读 · 2017年6月8日

VIP会员

文章信息

相关主题

相关VIP内容

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

专知会员服务

77+阅读 · 2021年1月23日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【斯坦福大学ICLR2020】无任务的持续元学习，Continue Meta-learning without tasks

【斯坦福大学ICLR2020】无任务的持续元学习，Continue Meta-learning without tasks

专知会员服务

16+阅读 · 2019年12月18日

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

Uber AI NeurIPS 2019《元学习meta-learning》教程，附92页PPT下载

专知会员服务

113+阅读 · 2019年12月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

大型语言模型遇上文本属性图：一种融合框架与应用的综述

人工智能赋能自主武器与人类控制第三部分：人类控制与系统操作员 | 35页

【博士论文】用于概率程序与生成模型的变分推断

军事指挥控制系统：2025年5种用途

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】视频目标分割基础

【推荐】视频目标分割基础

机器学习研究会

9+阅读 · 2017年9月19日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

Arxiv

0+阅读 · 2021年10月14日

View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums

Arxiv

0+阅读 · 2021年10月14日

Safe Driving via Expert Guided Policy Optimization

Arxiv

0+阅读 · 2021年10月13日

THOMAS: Trajectory Heatmap Output with learned Multi-Agent Sampling

Arxiv

0+阅读 · 2021年10月13日

Decentralized Cooperative Multi-Agent Reinforcement Learning with Exploration

Arxiv

0+阅读 · 2021年10月12日

Inverse Constrained Reinforcement Learning

Arxiv

8+阅读 · 2021年5月21日

Energy-Based Hindsight Experience Prioritization

Arxiv

3+阅读 · 2018年10月8日

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

Arxiv

8+阅读 · 2018年7月10日

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings

Arxiv

6+阅读 · 2018年6月7日

Multi-Task Learning with Labeled and Unlabeled Tasks

Arxiv

3+阅读 · 2017年6月8日

微信扫码咨询专知VIP会员