Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows

November 20, 2022

Dmitriy Akimov, Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov

From arXiv. Accepted at the 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems (NeurIPS), 2022.

Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism - i.e., keeping the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model - controller in the latent space - is trained via reinforcement learning. This approach avoids querying actions outside of the training dataset and therefore does not require additional regularization for out-of-dataset actions. We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms with generative action models on a large portion of datasets.
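The two-stage idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: a single state-conditioned affine layer stands in for the normalizing flow, a standard normal base distribution is assumed, the weights are random rather than trained, and the names `AffineFlow`, `latent_policy`, and `LATENT_BOUND` are hypothetical. It shows only the structural point: a policy that acts in a bounded latent space and is decoded through an invertible map can emit only the actions that the map assigns to that bounded latent set.

```python
import numpy as np

LATENT_BOUND = 1.0  # hypothetical bound on latent actions


class AffineFlow:
    """Toy state-conditioned invertible map: a = mu(s) + sigma(s) * z."""

    def __init__(self, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for parameters that stage 1 would fit.
        self.w_mu = rng.normal(scale=0.1, size=(state_dim, action_dim))
        self.w_log_sigma = rng.normal(scale=0.1, size=(state_dim, action_dim))

    def _params(self, state):
        mu = state @ self.w_mu
        sigma = np.exp(state @ self.w_log_sigma)  # strictly positive scale
        return mu, sigma

    def decode(self, state, z):
        """Map a latent action z to an environment action."""
        mu, sigma = self._params(state)
        return mu + sigma * z

    def encode(self, state, action):
        """Exact inverse of decode."""
        mu, sigma = self._params(state)
        return (action - mu) / sigma

    def log_prob(self, state, action):
        """Change-of-variables log-likelihood; stage 1 would maximize this
        on (state, action) pairs from the offline dataset."""
        mu, sigma = self._params(state)
        z = (action - mu) / sigma
        log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
        return np.sum(log_base - np.log(sigma), axis=-1)


def latent_policy(state, weights):
    """Stage 2 controller: outputs latent actions squashed into the bound."""
    return LATENT_BOUND * np.tanh(state @ weights)


# Usage: decode bounded latent actions into environment actions.
rng = np.random.default_rng(1)
states = rng.normal(size=(4, 3))
flow = AffineFlow(state_dim=3, action_dim=2)
policy_weights = rng.normal(size=(3, 2))

z = latent_policy(states, policy_weights)
actions = flow.decode(states, z)
```

In this toy setting the decoded actions stay within `mu(s) ± LATENT_BOUND * sigma(s)` by construction, which is the kind of built-in restriction the abstract relies on in place of an explicit penalty on out-of-dataset actions.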

