强大的可预见控制 (Robust Predictable Control) - 专知论文

会员服务 ·

0

INFORMS · 稳健性 · SimPLe · 学成 · 优化器 ·

2021 年 9 月 7 日

Robust Predictable Control

翻译：强大的可预见控制

Benjamin Eysenbach,Ruslan Salakhutdinov,Sergey Levine

from arxiv, Project site with videos and code: https://ben-eysenbach.github.io/rpc

Many of the challenges facing today's reinforcement learning (RL) algorithms, such as robustness, generalization, transfer, and computational efficiency are closely related to compression. Prior work has convincingly argued why minimizing information is useful in the supervised learning setting, but standard RL algorithms lack an explicit mechanism for compression. The RL setting is unique because (1) its sequential nature allows an agent to use past information to avoid looking at future observations and (2) the agent can optimize its behavior to prefer states where decision making requires few bits. We take advantage of these properties to propose a method (RPC) for learning simple policies. This method brings together ideas from information bottlenecks, model-based RL, and bits-back coding into a simple and theoretically-justified algorithm. Our method jointly optimizes a latent-space model and policy to be self-consistent, such that the policy avoids states where the model is inaccurate. We demonstrate that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck. We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.

翻译：今天强化学习(RL)算法面临的许多挑战,例如稳健性、一般化、转移和计算效率等,都与压缩密切相关。先前的工作令人信服地论证了为什么信息最小化在受监督的学习环境中是有用的,但标准的RL算法缺乏明确的压缩机制。 RL设置是独特的,因为:(1) 其相继性质允许代理人使用过去的信息来避免未来观察,(2) 代理人可以优化其行为,而选择决策需要很少比分的国家。我们利用这些属性提出一种学习简单政策的方法(RPC)。这种方法将信息瓶颈、基于模型的RL和回位编码集成一个简单和理论上合理的算法。我们的方法共同优化了潜空模型和政策的自我一致性,从而避免了模型的不准确性。我们证明,我们的方法比先前的方法更严格得多,比标准的信息瓶颈高5x的报酬。我们还表明,我们的方法学习的政策比新任务更稳健、更概括。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【2021新书】编码艺术，Coding Art，284页pdf

【2021新书】编码艺术，Coding Art，284页pdf

专知会员服务

77+阅读 · 2021年1月10日

【理解计算机视觉损失函数】《Understanding Loss Functions in Computer Vision!》by Sowmya Yellapragad

【理解计算机视觉损失函数】《Understanding Loss Functions in Computer Vision!》by Sowmya Yellapragad

专知会员服务

44+阅读 · 2020年3月4日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【论文】深度学习的最优化:理论和算法（Optimization for deep learning: theory and algorithms）

【论文】深度学习的最优化:理论和算法（Optimization for deep learning: theory and algorithms）

专知会员服务

148+阅读 · 2019年12月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

已删除

将门创投

6+阅读 · 2018年12月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

统计学习与视觉计算组

17+阅读 · 2018年3月16日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Meta Subspace Optimization

Arxiv

0+阅读 · 2021年10月28日

Generalized Depthwise-Separable Convolutions for Adversarially Robust and Efficient Neural Networks

Arxiv

0+阅读 · 2021年10月28日

Robust Implicit Networks via Non-Euclidean Contractions

Arxiv

0+阅读 · 2021年10月26日

Recurrent Off-policy Baselines for Memory-based Continuous Control

Arxiv

0+阅读 · 2021年10月25日

Contrastive Active Inference

Arxiv

4+阅读 · 2021年10月19日

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

Arxiv

5+阅读 · 2021年8月11日

Inverse Constrained Reinforcement Learning

Arxiv

8+阅读 · 2021年5月21日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences

Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences

Arxiv

5+阅读 · 2018年7月23日

Learning to Adapt: Meta-Learning for Model-Based Control

Arxiv

9+阅读 · 2018年3月30日

VIP会员

文章信息

相关主题

相关VIP内容

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【2021新书】编码艺术，Coding Art，284页pdf

【2021新书】编码艺术，Coding Art，284页pdf

专知会员服务

77+阅读 · 2021年1月10日

【理解计算机视觉损失函数】《Understanding Loss Functions in Computer Vision!》by Sowmya Yellapragad

【理解计算机视觉损失函数】《Understanding Loss Functions in Computer Vision!》by Sowmya Yellapragad

专知会员服务

44+阅读 · 2020年3月4日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【论文】深度学习的最优化:理论和算法（Optimization for deep learning: theory and algorithms）

【论文】深度学习的最优化:理论和算法（Optimization for deep learning: theory and algorithms）

专知会员服务

148+阅读 · 2019年12月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能绝不能完全自主》

《人工智能的法律与伦理：军事自主机器独特挑战的深度剖析》316页

从数据到主导：AI与兵棋推演构筑决策优势

《特洛伊木马货柜：武器化集装箱的战略威胁》最新报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

已删除

将门创投

6+阅读 · 2018年12月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

统计学习与视觉计算组

17+阅读 · 2018年3月16日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Meta Subspace Optimization

Arxiv

0+阅读 · 2021年10月28日

Generalized Depthwise-Separable Convolutions for Adversarially Robust and Efficient Neural Networks

Arxiv

0+阅读 · 2021年10月28日

Robust Implicit Networks via Non-Euclidean Contractions

Arxiv

0+阅读 · 2021年10月26日

Recurrent Off-policy Baselines for Memory-based Continuous Control

Arxiv

0+阅读 · 2021年10月25日

Contrastive Active Inference

Arxiv

4+阅读 · 2021年10月19日

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

Arxiv

5+阅读 · 2021年8月11日

Inverse Constrained Reinforcement Learning

Arxiv

8+阅读 · 2021年5月21日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences

Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences

Arxiv

5+阅读 · 2018年7月23日

Learning to Adapt: Meta-Learning for Model-Based Control

Arxiv

9+阅读 · 2018年3月30日

微信扫码咨询专知VIP会员