Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results. The Decision Transformer (DT) combines the conditional policy approach with a transformer architecture, showing competitive performance on several benchmarks. However, DT lacks stitching ability -- one of the critical abilities for offline RL to learn an optimal policy from sub-optimal trajectories. This issue becomes particularly significant when the offline dataset contains only sub-optimal trajectories. On the other hand, conventional RL approaches based on Dynamic Programming (such as Q-learning) do not share this limitation; however, they suffer from unstable learning behaviour, especially when they rely on function approximation in an off-policy learning setting. In this paper, we propose the Q-learning Decision Transformer (QDT), which addresses the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning). QDT uses the Dynamic Programming results to relabel the return-to-go in the training data and then trains DT on the relabelled data. Our approach exploits the benefits of these two approaches, each compensating for the other's shortcomings, to achieve better performance. We demonstrate this empirically in both simple toy environments and the more complex D4RL benchmark, observing competitive performance gains.
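To make the relabelling idea concrete, the following is a minimal sketch, not the paper's exact procedure: it assumes a hypothetical pre-trained value estimator `value_fn` (e.g. derived from an offline Q-learning method) and illustrates how the return-to-go of a logged trajectory could be rewritten with a backward sweep before it is fed to DT. The function name, the `gamma` parameter, and the specific max-based update rule are illustrative assumptions.

```python
import numpy as np

def relabel_returns_to_go(rewards, states, value_fn, gamma=1.0):
    """Illustrative return-to-go relabelling (a sketch, not QDT's exact rule).

    rewards:  sequence of scalar rewards r_0, ..., r_{T-1} from one trajectory
    states:   corresponding states s_0, ..., s_{T-1}
    value_fn: assumed learned value estimate V(s) from offline Q-learning
    gamma:    discount factor used for the backward accumulation
    """
    T = len(rewards)
    rtg = np.zeros(T)
    running = 0.0
    # Sweep backwards through the trajectory: at each step, keep the larger of
    # the accumulated (observed) return and the learned value estimate, so that
    # sub-optimal trajectory tails are replaced by the Dynamic-Programming
    # estimate of the achievable return.
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running
        running = max(running, value_fn(states[t]))
        rtg[t] = running
    return rtg
```

The relabelled `rtg` values would then replace the original returns-to-go in the conditioning inputs when training the Decision Transformer.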