Strong paper! LEARNING ACTIONABLE REPRESENTATIONS WITH GOAL-CONDITIONED POLICIES

November 26, 2018 | CreateAMind

LEARNING ACTIONABLE REPRESENTATIONS WITH GOAL-CONDITIONED POLICIES 


Dibya Ghosh*, Abhishek Gupta & Sergey Levine
Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA




ABSTRACT

Representation learning is a central challenge across a range of machine learning areas. In reinforcement learning, effective and functional representations have the potential to tremendously accelerate learning progress and solve more challenging problems. Most prior work on representation learning has focused on generative approaches, learning representations that capture all underlying factors of variation in the observation space in a more disentangled or well-ordered manner. In this paper, we instead aim to learn functionally salient representations: representations that are not necessarily complete in terms of capturing all factors of variation in the observation space, but rather aim to capture those factors of variation that are important for decision making – that are “actionable.” These representations are aware of the dynamics of the environment, and capture only the elements of the observation that are necessary for decision making rather than all factors of variation, without explicit reconstruction of the observation. We show how these representations can be useful to improve exploration for sparse reward problems, to enable long horizon hierarchical reinforcement learning, and as a state representation for learning policies for downstream tasks. We evaluate our method on a number of simulated environments, and compare it to prior methods for representation learning, exploration, and hierarchical reinforcement learning.



Key notes:

Maximum entropy RL. Maximum entropy RL algorithms modify the RL objective and instead learn a policy that maximizes the reward as well as the entropy of the policy (Haarnoja et al., 2017; Todorov, 2006), according to π* = arg max_π E_π[r(s, a)] + H(π). In contrast to standard RL, where optimal policies in fully observed environments are deterministic, the solution in maximum entropy RL is a stochastic policy, where the entropy reflects the sensitivity of the rewards to the action: when the choice of action has minimal effect on future rewards, actions are more random, and when the choice of action is critical, the actions are more deterministic. In this way, the action distributions of a maximum entropy policy carry more information about the dynamics of the task.
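For reference, the same objective is often written per timestep with a temperature coefficient α weighting the entropy bonus (the one-line statement above absorbs it into H(π)); this is the standard form rather than a quote from the paper:

```latex
\pi^{*} = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}
  \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```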



In this work, we extract a representation that can distinguish states based on actions required to reach them, which we term an actionable representation for control (ARC). In order to learn state representations φ that can capture the elements of the state which are important for decision making, we first consider defining actionable distances DAct(s1, s2) between states. 



These distances compare the actions that a goal-conditioned policy takes when attempting to reach each state, thereby implicitly capturing dynamics. If the actions required for reaching state s1 are very different from the actions needed for reaching state s2, then these states are functionally different and should have a large actionable distance.
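A minimal sketch of this idea, assuming a goal-conditioned policy with diagonal-Gaussian action distributions and PyTorch; the helper names (gaussian_kl, actionable_distance, arc_loss) and the squared-error matching loss are illustrative assumptions, not the paper's exact objective:

```python
import torch

def gaussian_kl(mu1, std1, mu2, std2):
    """KL divergence between diagonal Gaussians, summed over action dimensions."""
    var1, var2 = std1.pow(2), std2.pow(2)
    kl = torch.log(std2 / std1) + (var1 + (mu1 - mu2).pow(2)) / (2 * var2) - 0.5
    return kl.sum(dim=-1)

def actionable_distance(policy, states, g1, g2):
    """D_Act(g1, g2): expected symmetrized KL between the action distributions
    of the goal-conditioned policy when asked to reach g1 versus g2."""
    g1_b = g1.expand(states.shape[0], -1)   # broadcast each goal over the sampled states
    g2_b = g2.expand(states.shape[0], -1)
    mu1, std1 = policy(states, g1_b)        # pi(a | s, g1)
    mu2, std2 = policy(states, g2_b)        # pi(a | s, g2)
    sym_kl = gaussian_kl(mu1, std1, mu2, std2) + gaussian_kl(mu2, std2, mu1, std1)
    return sym_kl.mean()

def arc_loss(phi, policy, states, g1, g2):
    """Fit the representation phi so that distance in embedding space
    tracks the actionable distance between the corresponding goal states."""
    d_act = actionable_distance(policy, states, g1, g2).detach()
    d_phi = (phi(g1) - phi(g2)).norm()
    return (d_phi - d_act).pow(2)
```

Because only differences that show up in the policy's actions contribute to D_Act, factors of variation that never affect behavior are free to collapse in φ, which is what makes the representation "actionable."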







6 EXPERIMENTS: very thorough, and much better than all the other methods compared.

6.2 LEARNING THE GOAL-CONDITIONED POLICY AND ARC REPRESENTATION

6.5 LEVERAGING ACTIONABLE REPRESENTATIONS FOR REWARD SHAPING

6.6 LEVERAGING ACTIONABLE REPRESENTATIONS AS FEATURES FOR LEARNING POLICIES

6.7 BUILDING HIERARCHIES FROM ACTIONABLE REPRESENTATIONS
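As one concrete illustration of the reward-shaping use case (Section 6.5), a minimal sketch assuming a frozen ARC encoder phi trained as above; the general idea is to add the negative distance in embedding space as a dense bonus on top of the sparse task reward, while the exact weighting here is an assumption:

```python
def shaped_reward(phi, state, goal, sparse_reward, weight=1.0):
    """Dense shaping term: negative distance between the ARC embeddings of the
    current state and the goal, added to the (typically sparse) task reward."""
    with torch.no_grad():
        dist = (phi(state) - phi(goal)).norm()
    return sparse_reward - weight * dist.item()
```

For Section 6.6, the same frozen embedding φ(s) can simply be fed to the downstream policy as its state input in place of the raw observation.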




CreateAMind standing job offer: if you can reproduce the results of this paper, you name the salary.



Recruitment notes: http://note.youdao.com/noteshare?id=b0f27c01f384e96d30ee1c1d2a5c7d31


https://arxiv.org/pdf/1811.07819.pdf

