Online Sub-Sampling for Reinforcement Learning with General Function Approximation

April 18, 2023

Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang

Most existing work on reinforcement learning (RL) with general function approximation (FA) focuses on understanding statistical complexity or regret bounds. However, the computational complexity of such approaches is far from understood; indeed, even a simple optimization problem over the function class may be intractable. In this paper, we tackle this problem by establishing an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm and uses the measurement to guide exploration. For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto\operatorname{poly}\log(K)$ times when running the RL algorithm for $K$ episodes, while still achieving a small near-optimal regret bound. In contrast to existing approaches, which update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls needed to solve for a policy. When applied to the settings of \cite{wang2020reinforcement} or \cite{jin2021bellman}, we improve the overall time complexity by at least a factor of $K$. Finally, we show the generality of our online sub-sampling technique by applying it to the reward-free RL setting and the multi-agent RL setting.
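To make the idea of updating the policy only when an informative data point arrives more concrete, the following is a minimal sketch in the linear special case. It is not the paper's algorithm: it swaps in online leverage-score sampling as a stand-in for the information-gain measurement, and the function names, the regularizer `lam`, and the oversampling constant `c` are all hypothetical choices made for illustration. A point is kept with probability proportional to its leverage score against the points kept so far, and the assumption is that an expensive policy optimization would be triggered only when the kept set changes.

```python
import numpy as np


def online_subsample(features, lam=1.0, c=4.0, seed=0):
    """Return indices of points kept by online leverage-score sampling.

    Illustrative stand-in only: `lam` (ridge regularizer) and `c`
    (oversampling constant) are assumed values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    gram = lam * np.eye(d)  # regularized Gram matrix of the kept points
    kept = []
    for t, x in enumerate(features):
        # Leverage score x^T gram^{-1} x: large when x points in a
        # direction the kept points do not yet cover well.
        score = float(x @ np.linalg.solve(gram, x))
        p = min(1.0, c * score)
        if rng.random() < p:
            # Reweight by 1/p so the Gram matrix stays unbiased.
            gram += np.outer(x, x) / p
            kept.append(t)  # a policy update would be triggered here
    return kept


def unit_features(num_points, dim, seed=1):
    """Synthetic unit-norm feature vectors standing in for collected data."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(num_points, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


if __name__ == "__main__":
    K = 2000
    kept = online_subsample(unit_features(K, dim=4))
    print(f"{len(kept)} of {K} points kept")
```

In this sketch each kept point multiplies the determinant of the Gram matrix by at least $1 + 1/c$, which is why the number of kept points, and hence of hypothetical policy updates, grows roughly logarithmically in $K$ rather than linearly.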
