Geometric Entropic Exploration

Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos

Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum State-Visitation Entropy (MSVE) formulates the exploration problem as a well-defined policy optimization problem whose solution aims at visiting all states as uniformly as possible. This is in contrast to standard uncertainty-based approaches where exploration is transient and eventually vanishes. However, existing approaches to MSVE are theoretically justified only for discrete state-spaces as they are oblivious to the geometry of continuous domains. We address this challenge by introducing Geometric Entropy Maximisation (GEM), a new algorithm that maximises the geometry-aware Shannon entropy of state-visits in both discrete and continuous domains. Our key theoretical contribution is casting geometry-aware MSVE exploration as a tractable problem of optimising a simple and novel noise-contrastive objective function. In our experiments, we show the efficiency of GEM in solving several RL problems with sparse rewards, compared against other deep RL exploration approaches.
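
The following minimal Python sketch makes the MSVE objective concrete in the discrete case. It is not the authors' GEM algorithm and does not implement their noise-contrastive objective. It only computes the Shannon entropy of an empirical state-visitation distribution. The function name `state_visitation_entropy` and the toy trajectories are hypothetical, and states are assumed to be hashable discrete labels.

```python
import math
from collections import Counter


def state_visitation_entropy(states):
    """Shannon entropy (in nats) of the empirical state-visitation distribution.

    Assumes `states` is a non-empty sequence of hashable, discrete state labels.
    This is a plain count-based estimate, not GEM's geometry-aware entropy.
    """
    counts = Counter(states)  # number of visits per distinct state
    n = len(states)
    # H = -sum_s p(s) log p(s), with p(s) = visits(s) / total visits
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical visitation logs over a 4-state discrete environment.
uniform_visits = [0, 1, 2, 3, 0, 1, 2, 3]  # every state visited equally often
skewed_visits = [0, 0, 0, 0, 0, 0, 1, 2]   # state 0 dominates, state 3 never visited

print(state_visitation_entropy(uniform_visits))  # ≈ 1.386, i.e. log(4)
print(state_visitation_entropy(skewed_visits))   # strictly smaller
```

Uniform visitation over four states attains the maximum possible value, log 4, while the skewed log scores lower. Relabeling the states leaves the value unchanged, so this count-based entropy carries no notion of distance between states. That is the kind of geometry-blindness in continuous domains that the abstract says GEM is designed to address.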
