Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm - 专知论文


Learning · State space · Representation · Episode · Guidance

January 13, 2023

Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm


Marc Höftmann, Jan Robine, Stefan Harmeling

From arXiv. 9 pages, 7 figures. Deep Reinforcement Learning Workshop, NeurIPS 2022 (OpenReview).

Very large state spaces with a sparse reward signal are difficult to explore. Without sophisticated guidance, numerous reinforcement learning algorithms perform poorly, and the commonly used random exploration is often not helpful in these cases. The literature shows that environments of this kind require enormous effort to systematically explore large chunks of the state space. Learned state representations can improve the search by providing semantic context and building a structure on top of the raw observations. In this work we introduce a novel time-myopic state representation that clusters temporally close states together while providing a time prediction capability between them. By adapting this model to the Go-Explore paradigm (Ecoffet et al., 2021b), we demonstrate the first learned state representation that reliably estimates novelty instead of using the hand-crafted representation heuristic. Our method shows an improved solution for the detachment problem, which remains an issue in the Go-Explore exploration phase. We provide evidence that our proposed method covers the entire state space with respect to all possible time trajectories without causing disadvantageous conflict-overlaps in the cell archive. Analogous to native Go-Explore, our approach is evaluated on the hard exploration environments MontezumaRevenge, Gravitar and Frostbite (Atari) in order to validate its capabilities on difficult tasks. Our experiments show that time-myopic Go-Explore is an effective alternative to the domain-engineered heuristic while also being more general. The source code of the method is available on GitHub.
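To make the idea of estimating novelty from predicted time distance concrete, here is a minimal Python sketch of a Go-Explore-style cell archive. It is an illustration under stated assumptions, not the authors' implementation: the class `TimeDistanceArchive`, the `threshold` parameter, the least-visited selection rule and the toy function `toy_time_distance` are all hypothetical names and choices introduced here. In the paper the time distance would come from the learned representation; the toy function below simply stands in for it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

State = Sequence[float]


@dataclass
class TimeDistanceArchive:
    """Go-Explore-style cell archive keyed on a time-distance estimate.

    `time_distance(a, b)` is a hypothetical stand-in for a learned model that
    predicts how many steps separate two states; it is NOT the paper's model.
    """
    time_distance: Callable[[State, State], float]
    threshold: float  # assumed novelty threshold, in predicted time steps
    cells: List[State] = field(default_factory=list)
    visits: List[int] = field(default_factory=list)

    def observe(self, state: State) -> bool:
        """Add `state` as a new cell if it is far (in time) from every cell.

        Returns True when the state was novel and a new cell was created.
        """
        for i, cell in enumerate(self.cells):
            if self.time_distance(cell, state) < self.threshold:
                self.visits[i] += 1  # state falls into an existing cell
                return False
        self.cells.append(state)
        self.visits.append(1)
        return True

    def select(self) -> State:
        """Return the least-visited cell (a simplification of cell selection)."""
        if not self.cells:
            raise ValueError("archive is empty")
        i = min(range(len(self.cells)), key=self.visits.__getitem__)
        return self.cells[i]


def toy_time_distance(a: State, b: State) -> float:
    """Toy stand-in: treat the first coordinate as a time stamp."""
    return abs(a[0] - b[0])
```

A possible usage, again with made-up values: create `TimeDistanceArchive(time_distance=toy_time_distance, threshold=5.0)`, call `observe` on each visited state, and call `select` to pick the cell to return to before exploring further. States closer than the threshold to an archived cell are merged into it, which is one simple way to keep temporally close states in the same cell.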


