State Abstraction as Compression in Apprenticeship Learning

January 19, 2019 · CreateAMind

Code:

State Abstraction as Compression in Apprenticeship Learning: https://github.com/david-abel/rl_info_theory (ref: infobot)

State Abstractions for Lifelong Reinforcement Learning: https://david-abel.github.io/papers/lifelong_sa_icml_18.pdf



State Abstraction as Compression in Apprenticeship Learning

David Abel¹, Dilip Arumugam², Kavosh Asadi¹, Yuu Jinnai¹, Michael L. Littman¹, and Lawson L. S. Wong³

1: Department of Computer Science, Brown University
2: Department of Computer Science, Stanford University
3: College of Computer and Information Science, Northeastern University





Abstract

State abstraction can give rise to models of environments that are both compressed and useful, thereby enabling efficient sequential decision making. In this work, we offer the first formalism and analysis of the trade-off between compression and performance made in the context of state abstraction for Apprenticeship Learning. We build on Rate-Distortion theory, the classic Blahut-Arimoto algorithm, and the Information Bottleneck method to develop an algorithm for computing state abstractions that approximate the optimal trade-off between compression and performance. We illustrate the power of this algorithmic structure to offer insights into effective abstraction, compression, and reinforcement learning through a mixture of analysis, visuals, and experimentation.

1 Introduction

Reinforcement Learning (RL) poses a challenging problem. Agents must learn about their environment through high-dimensional and often noisy observations while receiving sparse and delayed evaluative feedback. The ability to understand one's surroundings well enough to support effective decision making under these conditions is a remarkable feat, and arguably a hallmark of intelligent behavior. To this end, a long-standing goal of RL is to endow decision-making agents with the ability to acquire and exploit abstract models for use in decision making, drawing inspiration from human cognition (Tenenbaum et al. 2011).

One path toward realizing this goal is to make use of state abstraction, which describes methods for compressing the environment's state space to distill complex problems into simpler forms (Dietterich 2000b; Andre and Russell 2002; Li, Walsh, and Littman 2006). Critically, the degree of compression induced by an abstraction trades off directly with its capacity to represent good behavior. If the abstraction throws away too much information, the resulting abstract model will fail to preserve essential characteristics of the original task (Abel, Hershkowitz, and Littman 2016). Thus, care must be taken to identify state abstractions that strike a balance between an appropriate degree of compression and adequate representational power.
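For reference, state abstraction is commonly formalized (as in Li, Walsh, and Littman 2006) as a mapping from ground states to a smaller set of abstract states, with behavior then defined over the abstract states:

\phi : \mathcal{S} \rightarrow \mathcal{S}_\phi, \qquad |\mathcal{S}_\phi| \le |\mathcal{S}|, \qquad \pi_\phi : \mathcal{S}_\phi \rightarrow \Delta(\mathcal{A}).

The smaller |\mathcal{S}_\phi| is, the greater the compression, but the fewer distinct behaviors \pi_\phi can express over ground states that \phi has collapsed together.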

Information Theory offers foundational results about the limits of compression (Shannon 1948). The core of the theory clarifies how to communicate in the presence of noise, culminating in seminal results about the nature of communication and compression that helped establish the science and engineering practices of computation. Of particular relevance to our agenda is Rate-Distortion theory, which studies the trade-off between a code's ability to compress (rate) and to represent the original signal (distortion) (Shannon 1948; Berger 1971). Cognitive neuroscience has suggested that perception and generalization are tied to efficient compression (Attneave 1954; Sims 2016; 2018), termed the "efficient coding hypothesis" by Barlow (1961).
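As a concrete reference point, the rate-distortion trade-off can be written in its standard Lagrangian form, and the Information Bottleneck objective of Tishby, Pereira, and Bialek (1999), which the paper adapts, has the same structure with the distortion term replaced by a (negated) relevance term:

\min_{p(\hat{x} \mid x)} \; I(X; \hat{X}) + \beta \, \mathbb{E}\!\left[ d(X, \hat{X}) \right] \qquad \text{(rate-distortion)}

\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y) \qquad \text{(Information Bottleneck)}

In both cases the first term measures how aggressively the code (here, the state abstraction) compresses its input, and β controls how much distortion, or loss of task-relevant information, is tolerated in exchange.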

The goal of this work is to understand the role of information-theoretic compression in state abstraction for sequential decision making. We draw a parallel between state abstraction as used in reinforcement learning and compression as understood in information theory. We build on the seminal work of Shannon (1948), Blahut (1972), Arimoto (1972), and Tishby, Pereira, and Bialek (1999), and draw inspiration from related work on understanding the relationship between abstraction and compression (Botvinick et al. 2015; Solway et al. 2014). While the perspective we introduce is intended to be general, we focus our study in two ways. First, we investigate only state abstraction, deferring discussion of temporal (Sutton, Precup, and Singh 1999), action (Hauskrecht et al. 1998), and hierarchical abstraction (Dayan and Hinton 1993; Dietterich 2000a) to future work. Second, we address the learning problem when a demonstrator is available, as in Apprenticeship Learning (Atkeson and Schaal 1997; Abbeel and Ng 2004; Argall et al. 2009), which simplifies aspects of our model.

Concretely, we introduce a new objective function that explicitly balances state compression and performance. Our main result proves this objective is upper bounded by a variant of the Information Bottleneck objective adapted to sequential decision making. We introduce Deterministic Information Bottleneck for State abstraction (DIBS), an algorithm that outputs a lossy state abstraction optimizing the trade-off between compressing the state space and preserving the capacity for performance in that compressed state space. We conduct experiments to showcase the relationship between compression and performance captured by the algorithm in a traditional grid world and present an extension to high-dimensional observations via experiments with the Atari game Breakout.
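To make the shape of such an algorithm concrete, below is a minimal Python sketch of a Blahut-Arimoto / Information-Bottleneck-style alternating update for a soft state abstraction. It assumes a tabular setting with a demonstrator state-visitation distribution rho and demonstrator policy pi_E, and uses the KL divergence between the demonstrator policy and each cluster's induced abstract policy as the distortion. The names and the exact update rule are illustrative assumptions, not the authors' DIBS implementation (see the linked repository for that).

import numpy as np

def soft_state_abstraction(rho, pi_E, n_abstract, beta, n_iters=200, seed=0):
    """Illustrative Blahut-Arimoto-style sketch of a soft state abstraction.

    rho        : (S,)   demonstrator state-visitation distribution
    pi_E       : (S, A) demonstrator policy, rows sum to 1
    n_abstract : number of abstract states K
    beta       : compression/performance trade-off coefficient
    Returns the encoder phi (S, K) with phi[s, k] = p(s_phi = k | s)
    and the abstract policy pi_abs (K, A).
    """
    rng = np.random.default_rng(seed)
    S, A = pi_E.shape
    eps = 1e-12

    # Random soft assignment of ground states to abstract states.
    phi = rng.random((S, n_abstract))
    phi /= phi.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # Marginal over abstract states and the policy each one induces.
        p_abs = rho @ phi                                        # (K,)
        pi_abs = (rho[:, None] * phi).T @ pi_E                   # (K, A)
        pi_abs /= (p_abs[:, None] + eps)

        # Distortion d(s, k): KL(pi_E(.|s) || pi_abs(.|k)) for every pair.
        log_ratio = np.log(pi_E[:, None, :] + eps) - np.log(pi_abs[None, :, :] + eps)
        dist = np.sum(pi_E[:, None, :] * log_ratio, axis=2)      # (S, K)

        # Encoder update: p(s_phi | s) proportional to p(s_phi) * exp(-beta * d).
        logits = np.log(p_abs + eps)[None, :] - beta * dist
        phi = np.exp(logits - logits.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)

    return phi, pi_abs

In this sketch, raising beta favors lower distortion (more abstract states that faithfully reproduce the demonstrator's behavior), while lowering it favors compression; a deterministic abstraction can be read off at the end by assigning each state to its most probable cluster.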


