ENTROPY:环境变异器和离线政策优化</s> (ENTROPY: Environment Transformer and Offline Policy Optimization) - 专知论文

会员服务 ·

0

回合 · Performer · Learning · 优化器 · MoDELS ·

2023 年 3 月 7 日

ENTROPY: Environment Transformer and Offline Policy Optimization

翻译：ENTROPY:环境变异器和离线政策优化

Pengqin Wang,Meixin Zhu,Shaojie Shen

Model-based methods provide an effective approach to offline reinforcement learning (RL). They learn an environmental dynamics model from interaction experiences and then perform policy optimization based on the learned model. However, previous model-based offline RL methods lack long-term prediction capability, resulting in large errors when generating multi-step trajectories. We address this issue by developing a sequence modeling architecture, Environment Transformer, which can generate reliable long-horizon trajectories based on offline datasets. We then propose a novel model-based offline RL algorithm, ENTROPY, that learns the dynamics model and reward function by ENvironment TRansformer and performs Offline PolicY optimization. We evaluate the proposed method on MuJoCo continuous control RL environments. Results show that ENTROPY performs comparably or better than the state-of-the-art model-based and model-free offline RL methods and demonstrates more powerful long-term trajectory prediction capability compared to existing model-based offline methods.

翻译：基于模型的方法为离线强化学习提供了有效的方法。它们从互动经验中学习环境动态模型,然后根据所学模型进行政策优化。然而,以往基于模型的离线RL方法缺乏长期预测能力,导致产生多步轨迹时出现大错误。我们通过开发一个序列建模结构来解决这一问题,即环境变换器,它能够产生基于离线数据集的可靠的长离子轨道。然后,我们提出一个新的基于模型的离线RL算法(ENTROPY),它能够学习ENvironment TRansex的动态模型和奖励功能,并进行离线 PolicY优化。我们评估了拟议的穆乔科连续控制RL环境的方法。结果显示,ENTROPY比目前基于模型的离线模型和无模型的离线方法可比较或更好,并展示比现有基于模型的离线方法更强大的长期轨迹预测能力。</s>

0

相关内容

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

23+阅读 · 2022年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

CLU基因多态性对迟发型阿尔茨海默病人脑情景记忆环路调控的MRI研究

国家自然科学基金

0+阅读 · 2015年12月31日

mPFC神经环路中突触结构重塑与慢性应激大鼠抑郁样行为的关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

神经系统seipin缺失诱发精神迟滞的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

ANA-12与7, 8-DHF在不同神经环路抗抑郁作用的神经可塑性机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

NeuroD1在体直接转分化星形胶质细胞为神经元对大鼠慢性脊髓损伤治疗作用的实验研究

国家自然科学基金

0+阅读 · 2013年12月31日

无穷Laplace方程解的边界正则性

国家自然科学基金

0+阅读 · 2013年12月31日

G蛋白信号转导调节蛋白3调控血管外膜成纤维细胞表型转变参与动脉粥样硬化的研究

国家自然科学基金

0+阅读 · 2013年12月31日

低层错能镍基变形高温合金反常动态应变时效机理

国家自然科学基金

0+阅读 · 2011年12月31日

锰致神经细胞递质、神经退行性变相关蛋白和死亡机制的综合研究

国家自然科学基金

0+阅读 · 2011年12月31日

核受体法呢醇X受体(FXR)在人肝细胞癌发生机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

Decentralized Inference via Capability Type Structures in Cooperative Multi-Agent Systems

Arxiv

0+阅读 · 2023年4月27日

Safety Guaranteed Manipulation Based on Reinforcement Learning Planner and Model Predictive Control Actor

Arxiv

0+阅读 · 2023年4月26日

Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers

Arxiv

0+阅读 · 2023年4月25日

Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年4月25日

Loss and Reward Weighing for increased learning in Distributed Reinforcement Learning

Arxiv

0+阅读 · 2023年4月25日

Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout

Arxiv

0+阅读 · 2023年4月24日

Self-Supervised Learning via Maximum Entropy Coding

Arxiv

13+阅读 · 2022年10月20日

Introduction to Online Convex Optimization

Arxiv

23+阅读 · 2021年12月19日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

VIP会员

文章信息

相关主题

相关VIP内容

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

23+阅读 · 2022年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Decentralized Inference via Capability Type Structures in Cooperative Multi-Agent Systems

Arxiv

0+阅读 · 2023年4月27日

Safety Guaranteed Manipulation Based on Reinforcement Learning Planner and Model Predictive Control Actor

Arxiv

0+阅读 · 2023年4月26日

Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers

Arxiv

0+阅读 · 2023年4月25日

Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年4月25日

Loss and Reward Weighing for increased learning in Distributed Reinforcement Learning

Arxiv

0+阅读 · 2023年4月25日

Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout

Arxiv

0+阅读 · 2023年4月24日

Self-Supervised Learning via Maximum Entropy Coding

Arxiv

13+阅读 · 2022年10月20日

Introduction to Online Convex Optimization

Arxiv

23+阅读 · 2021年12月19日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

相关基金

CLU基因多态性对迟发型阿尔茨海默病人脑情景记忆环路调控的MRI研究

国家自然科学基金

0+阅读 · 2015年12月31日

mPFC神经环路中突触结构重塑与慢性应激大鼠抑郁样行为的关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

神经系统seipin缺失诱发精神迟滞的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

ANA-12与7, 8-DHF在不同神经环路抗抑郁作用的神经可塑性机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

NeuroD1在体直接转分化星形胶质细胞为神经元对大鼠慢性脊髓损伤治疗作用的实验研究

国家自然科学基金

0+阅读 · 2013年12月31日

无穷Laplace方程解的边界正则性

国家自然科学基金

0+阅读 · 2013年12月31日

G蛋白信号转导调节蛋白3调控血管外膜成纤维细胞表型转变参与动脉粥样硬化的研究

国家自然科学基金

0+阅读 · 2013年12月31日

低层错能镍基变形高温合金反常动态应变时效机理

国家自然科学基金

0+阅读 · 2011年12月31日

锰致神经细胞递质、神经退行性变相关蛋白和死亡机制的综合研究

国家自然科学基金

0+阅读 · 2011年12月31日

核受体法呢醇X受体(FXR)在人肝细胞癌发生机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员