Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation - Zhuanzhi (专知) Papers


Tags: Functional · Generalized Function · Approximation · Optimizer · Minimax

February 8, 2023

Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation


Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang

From arXiv, 35 pages

Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-making using a pre-collected dataset, without further interaction with the environment. Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators, especially to handle the case with excessively large state-action spaces. Among them, the framework based on the linear-programming (LP) reformulation of Markov decision processes has shown promise: it enables sample-efficient offline RL with function approximation, under only partial data coverage and realizability assumptions on the function classes, with favorable computational tractability. In this work, we revisit the LP framework for offline RL, and provide a new reformulation that advances the existing results in several aspects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size. Our key enabler is introducing proper constraints in the reformulation, instead of the regularization used in the literature, together with careful choices of the function classes and initial-state distributions. We hope our insights bring to light the use of LP formulations, and the induced primal-dual minimax optimization, in offline RL.
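For reference, the standard textbook LP formulation of a discounted MDP that this line of work starts from is sketched below. It is not the paper's new reformulation, which adds further constraints and specific function classes not reproduced here. Notation (assumed, not taken from the paper): $\mu_0$ is the initial-state distribution, $\gamma$ the discount factor, $d(s,a)$ a normalized discounted occupancy measure, and, in the offline version, $d^D$ the data distribution with density ratio $w = d / d^D$, which lets the Lagrangian be estimated from dataset tuples $(s,a,r,s')$.

```latex
\begin{aligned}
\text{(Primal)}\quad & \min_{V}\ (1-\gamma)\,\mathbb{E}_{s\sim\mu_0}[V(s)]
  \quad\text{s.t.}\quad V(s) \ge r(s,a) + \gamma \sum_{s'} P(s'\mid s,a)\,V(s') \quad \forall (s,a),\\
\text{(Dual)}\quad & \max_{d\ge 0}\ \sum_{s,a} d(s,a)\,r(s,a)
  \quad\text{s.t.}\quad \sum_{a} d(s,a) = (1-\gamma)\,\mu_0(s) + \gamma \sum_{s',a'} P(s\mid s',a')\,d(s',a') \quad \forall s,\\
\text{(Lagrangian)}\quad & \max_{d\ge 0}\ \min_{V}\ (1-\gamma)\,\mathbb{E}_{s\sim\mu_0}[V(s)]
  + \sum_{s,a} d(s,a)\Big[r(s,a) + \gamma \sum_{s'} P(s'\mid s,a)\,V(s') - V(s)\Big],\\
\text{(Offline, } d = w\,d^D\text{)}\quad & \max_{w\ge 0}\ \min_{V}\ (1-\gamma)\,\mathbb{E}_{s\sim\mu_0}[V(s)]
  + \mathbb{E}_{(s,a,r,s')\sim d^D}\big[w(s,a)\,\big(r + \gamma V(s') - V(s)\big)\big].
\end{aligned}
```

The $V$-terms of the Lagrangian cancel exactly when $d$ satisfies the dual flow constraints, which is why the saddle point recovers both LPs; in the offline setting, $V$ and $w$ are then restricted to function classes, which is where realizability and partial-coverage assumptions enter.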

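The LP duality the abstract builds on can be checked numerically on a tiny example. The sketch below assumes a hypothetical two-state, two-action MDP (transition table `P`, rewards `R`, start distribution `MU0`, and all function names are illustrative, not from the paper). It solves the MDP by value iteration, builds the optimal policy's normalized discounted occupancy measure d(s,a), and lets you verify that V* satisfies the primal constraints V(s) ≥ r(s,a) + γ Σ P(s'|s,a) V(s'), that d satisfies the dual Bellman-flow constraints, that both objectives coincide, and that the Lagrangian behind the primal-dual minimax view does not depend on V once d is flow-feasible. It uses exact tabular transitions, so it says nothing about the paper's function classes, constraints, or sample complexity.

```python
import numpy as np

# Hypothetical toy MDP, chosen only for illustration (not from the paper).
GAMMA = 0.9
# P[s, a, s'] = Pr(s' | s, a)
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.05, 0.95]],
])
# R[s, a] = expected reward
R = np.array([
    [0.0, 1.0],
    [2.0, 0.5],
])
MU0 = np.array([1.0, 0.0])  # initial-state distribution


def value_iteration(P, r, gamma, tol=1e-10):
    """Return (V*, greedy deterministic policy) via value iteration."""
    v = np.zeros(r.shape[0])
    while True:
        q = r + gamma * P @ v  # Q(s, a) = r(s, a) + gamma * sum_s' P(s'|s,a) V(s')
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new


def occupancy(P, mu0, policy, gamma):
    """Normalized discounted occupancy d(s, a) of a deterministic policy."""
    n_states, n_actions, _ = P.shape
    pi = np.zeros((n_states, n_actions))
    pi[np.arange(n_states), policy] = 1.0
    p_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    # Solve d_s = (1 - gamma) * mu0 + gamma * P_pi^T d_s for the state occupancy.
    d_state = np.linalg.solve((np.eye(n_states) - gamma * p_pi).T, (1 - gamma) * mu0)
    return d_state[:, None] * pi


def primal_objective(v, mu0=MU0, gamma=GAMMA):
    # Primal LP objective (1 - gamma) * E_{s ~ mu0}[V(s)], minimized over feasible V.
    return float((1 - gamma) * mu0 @ v)


def dual_objective(d, r=R):
    # Dual LP objective sum_{s,a} d(s,a) r(s,a), maximized over flow-feasible d >= 0.
    return float(np.sum(d * r))


def lagrangian(v, d, P=P, r=R, mu0=MU0, gamma=GAMMA):
    # L(V, d) = (1 - gamma) E_mu0[V] + sum_{s,a} d(s,a) [r(s,a) + gamma (P V)(s,a) - V(s)]
    residual = r + gamma * P @ v - v[:, None]
    return primal_objective(v, mu0, gamma) + float(np.sum(d * residual))


if __name__ == "__main__":
    v_star, policy = value_iteration(P, R, GAMMA)
    d_star = occupancy(P, MU0, policy, GAMMA)
    print("greedy policy:", policy)
    print("primal objective:", primal_objective(v_star))
    print("dual objective:  ", dual_objective(d_star))
    # With d_star satisfying the flow constraints, the V-dependent terms cancel.
    print("Lagrangian at arbitrary V:", lagrangian(np.array([3.0, -1.0]), d_star))
```

Running it prints the greedy policy and three numbers that should agree up to the value-iteration tolerance. Value iteration is used here only because it is a simple way to obtain V* on a tabular problem; a generic LP solver on the primal or dual would serve the same purpose.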
