Yoshua Bengio 3篇强化学习论文学习disentangling 特征

2018 年 4 月 26 日 CreateAMind


Disentangling the independently controllable factors of variation by interacting with the world 

https://arxiv.org/abs/1802.09484

It has been postulated that a good representation is one that disentangles the under- lying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors, and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal. 确实能够在没有任何外部奖励信号的情况下,解开环境中可独立控制的方面。


mechanism for representation learning that has close ties to intrinsic motivation mechanisms and causality.(内部驱动,类似好奇心?,因果关系) This mechanism explicitly links the agent’s control over its environment to the representation of the environment that is learned by the agent

can push a model to learn to disentangle its input in a meaningful way, and learn to represent factors which take multiple actions to change and show that these representations make it possible to perform model-based predictions in the learned latent space, rather than in a low-level input space (e.g. pixels). 


We hypothesize that interactions can be the key to learning how to disentangle the various causal factors of the stream of observations that an agent is faced with, and that such learning can be done in an unsupervised way.







https://arxiv.org/abs/1708.01289  

Independently Controllable Factors

Abstract

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal. 


Specifically, we hypothesize that some of the factors explaining variations in the data correspond to aspects of the world that can be controlled by the agent. For example, an object that could be pushed around or picked up independently of others is an independently controllable aspect of the environment. Our approach therefore aims to jointly discover a set of features (functions of the environment state) and policies (which change the state) such that each policy controls the associated feature while leaving the other features unchanged as much as possible. In §2 and §3 we explain this mechanism and show experimental results for the simplest instantiation of this new principle. In §5 we discuss how this principle could be applied more generally, and what are the research challenges that emerge. 





https://arxiv.org/abs/1804.06955

Disentangling Controllable and Uncontrollable Factors of Variation by Interacting with the World

Abstract

We introduce a method for disentangling independently controllable and uncon- trollable factors of variation by interacting with the world. Disentanglement leads to good representations and it is important when applying deep neural networks (DNNs) in fields where explanations are necessary. This article focuses on rein- forcement learning (RL) approach for disentangling factors of variation, however, previous methods lacks a mechanism for representing uncontrollable obstacles. To tackle this problem, we train two DNNs simultaneously: one that represents the controllable object and another that represents the uncontrollable obstacles. During training, we used the parameters from a previous RL-based model as our initial parameters to improve stability. We also conduct simple toy simulations to show that our model can indeed disentangle controllable and uncontrollable factors of variation and that it is effective for a task involving the acquisition of extrinsic rewards. 



ref : 智能的几点随想   观点和 Yoshua Bengio 的观点基本一致。



登录查看更多
2

相关内容

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集
专知会员服务
161+阅读 · 2020年3月18日
【论文】欺骗学习(Learning by Cheating)
专知会员服务
26+阅读 · 2020年1月3日
专知会员服务
53+阅读 · 2019年12月22日
MIT新书《强化学习与最优控制》
专知会员服务
270+阅读 · 2019年10月9日
强化学习三篇论文 避免遗忘等
CreateAMind
19+阅读 · 2019年5月24日
逆强化学习-学习人先验的动机
CreateAMind
15+阅读 · 2019年1月18日
强化学习的Unsupervised Meta-Learning
CreateAMind
17+阅读 · 2019年1月7日
Unsupervised Learning via Meta-Learning
CreateAMind
41+阅读 · 2019年1月3日
meta learning 17年:MAML SNAIL
CreateAMind
11+阅读 · 2019年1月2日
Disentangled的假设的探讨
CreateAMind
9+阅读 · 2018年12月10日
【NIPS2018】接收论文列表
专知
5+阅读 · 2018年9月10日
vae 相关论文 表示学习 1
CreateAMind
12+阅读 · 2018年9月6日
强化学习族谱
CreateAMind
26+阅读 · 2017年8月2日
The Measure of Intelligence
Arxiv
6+阅读 · 2019年11月5日
Arxiv
11+阅读 · 2018年7月8日
Arxiv
5+阅读 · 2018年4月22日
Arxiv
4+阅读 · 2018年4月10日
VIP会员
相关资讯
强化学习三篇论文 避免遗忘等
CreateAMind
19+阅读 · 2019年5月24日
逆强化学习-学习人先验的动机
CreateAMind
15+阅读 · 2019年1月18日
强化学习的Unsupervised Meta-Learning
CreateAMind
17+阅读 · 2019年1月7日
Unsupervised Learning via Meta-Learning
CreateAMind
41+阅读 · 2019年1月3日
meta learning 17年:MAML SNAIL
CreateAMind
11+阅读 · 2019年1月2日
Disentangled的假设的探讨
CreateAMind
9+阅读 · 2018年12月10日
【NIPS2018】接收论文列表
专知
5+阅读 · 2018年9月10日
vae 相关论文 表示学习 1
CreateAMind
12+阅读 · 2018年9月6日
强化学习族谱
CreateAMind
26+阅读 · 2017年8月2日
相关论文
The Measure of Intelligence
Arxiv
6+阅读 · 2019年11月5日
Arxiv
11+阅读 · 2018年7月8日
Arxiv
5+阅读 · 2018年4月22日
Arxiv
4+阅读 · 2018年4月10日
Top
微信扫码咨询专知VIP会员