OptiDICE: 通过固定分发校正估计,通过固定分发校正估算,实现离线政策优化 (OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation) - 专知论文

会员服务 ·

0

平稳分布 · 优化器 · 估计/估计量 · 过估计 · 平稳的 ·

2021 年 6 月 21 日

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

翻译：OptiDICE: 通过固定分发校正估计,通过固定分发校正估算,实现离线政策优化

Jongmin Lee,Wonseok Jeon,Byung-Jun Lee,Joelle Pineau,Kee-Eung Kim

from arxiv, 26 pages, 11 figures, Accepted at ICML 2021

We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, the distributional shift becomes the primary source of difficulty, which arises from the deviation of the target policy being optimized from the behavior policy used for data collection. This typically causes overestimation of action values, which poses severe problems for model-free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms often used sophisticated techniques that encourage underestimation of action values, which introduces an additional set of hyperparameters that need to be tuned properly. In this paper, we present an offline RL algorithm that prevents overestimation in a more principled way. Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy and does not rely on policy-gradients, unlike previous offline RL algorithms. Using an extensive set of benchmark datasets for offline RL, we show that OptiDICE performs competitively with the state-of-the-art methods.

翻译：我们考虑了脱线强化学习(RL)设置,代理商在这种设置中只希望从数据中优化政策而无需进一步环境互动。在脱线RL中,分配转换成为主要困难来源,因为目标政策偏离了数据收集所用行为政策,而目标政策偏离了数据收集所用行为政策。这通常造成对行动价值的过高估计,对使用靴子的无型算法造成严重问题。为了缓解问题,前离线RL算法经常使用鼓励低估行动价值的尖端技术,这增加了一组需要适当调整的超参数。在本文中,我们提出了一个离线的RL算法,防止以更有原则的方式过高估计。我们的算法,OptiDICE, 直接估计了最佳政策的固定分布修正,并不依赖政策等级,与以前的离线 RL算法不同。我们用一套广泛的基准数据集显示,OptiDICE在离线 RL上,我们显示OptiDICE使用最先进的方法进行竞争。

0

相关内容

平稳分布

《算法凸几何》简明书，Algorithmic Convex Geometry，50页pdf

专知会员服务

42+阅读 · 2021年4月2日

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

专知会员服务

21+阅读 · 2020年6月13日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【MIT】时间序列GAN，Subadditivity of Probability Divergences

专知会员服务

63+阅读 · 2020年3月4日

强化学习和最优控制的《十个关键点》81页PPT汇总

强化学习和最优控制的《十个关键点》81页PPT汇总

专知会员服务

107+阅读 · 2020年3月2日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

腊月廿八 | 强化学习-TRPO和PPO背后的数学

腊月廿八 | 强化学习-TRPO和PPO背后的数学

AI研习社

18+阅读 · 2019年2月2日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Robust low-rank multilinear tensor approximation for a joint estimation of the multilinear rank and the loading matrices

Arxiv

0+阅读 · 2021年8月21日

Flexible rational approximation for matrix functions

Arxiv

0+阅读 · 2021年8月20日

Consensus-based Optimization for 3D Human Pose Estimation in Camera Coordinates

Consensus-based Optimization for 3D Human Pose Estimation in Camera Coordinates

Arxiv

1+阅读 · 2021年8月20日

Communication-Efficient Federated Learning via Robust Distributed Mean Estimation

Arxiv

0+阅读 · 2021年8月19日

On Accelerating Distributed Convex Optimizations

On Accelerating Distributed Convex Optimizations

Arxiv

0+阅读 · 2021年8月19日

Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration

Arxiv

0+阅读 · 2021年8月19日

Conservative Objective Models for Effective Offline Model-Based Optimization

Arxiv

4+阅读 · 2021年7月14日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Pretrained Transformers Improve Out-of-Distribution Robustness

Arxiv

5+阅读 · 2020年4月13日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

《算法凸几何》简明书，Algorithmic Convex Geometry，50页pdf

专知会员服务

42+阅读 · 2021年4月2日

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

专知会员服务

21+阅读 · 2020年6月13日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【MIT】时间序列GAN，Subadditivity of Probability Divergences

专知会员服务

63+阅读 · 2020年3月4日

强化学习和最优控制的《十个关键点》81页PPT汇总

强化学习和最优控制的《十个关键点》81页PPT汇总

专知会员服务

107+阅读 · 2020年3月2日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

人工智能治理的未来

模态感知的特征匹配：单一模态与跨模态技术的全面综述

无监督行人重识别研究综述

【牛津博士论文】面向神经影像应用的可扩展且可解释的空间模型

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

腊月廿八 | 强化学习-TRPO和PPO背后的数学

腊月廿八 | 强化学习-TRPO和PPO背后的数学

AI研习社

18+阅读 · 2019年2月2日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Robust low-rank multilinear tensor approximation for a joint estimation of the multilinear rank and the loading matrices

Arxiv

0+阅读 · 2021年8月21日

Flexible rational approximation for matrix functions

Arxiv

0+阅读 · 2021年8月20日

Consensus-based Optimization for 3D Human Pose Estimation in Camera Coordinates

Consensus-based Optimization for 3D Human Pose Estimation in Camera Coordinates

Arxiv

1+阅读 · 2021年8月20日

Communication-Efficient Federated Learning via Robust Distributed Mean Estimation

Arxiv

0+阅读 · 2021年8月19日

On Accelerating Distributed Convex Optimizations

On Accelerating Distributed Convex Optimizations

Arxiv

0+阅读 · 2021年8月19日

Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration

Arxiv

0+阅读 · 2021年8月19日

Conservative Objective Models for Effective Offline Model-Based Optimization

Arxiv

4+阅读 · 2021年7月14日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Pretrained Transformers Improve Out-of-Distribution Robustness

Arxiv

5+阅读 · 2020年4月13日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

微信扫码咨询专知VIP会员