Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

January 31, 2023


Runlong Zhou, Zihan Zhang, Simon S. Du

From arXiv, 43 pages, 1 figure

We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying constant regret on deterministic MDPs). Existing algorithms are either variance-independent or suboptimal. We first propose two new environment norms to characterize the fine-grained variance properties of the environment. For model-based methods, we design a variant of the MVP algorithm (Zhang et al., 2021a) and use new analysis techniques to show that this algorithm enjoys variance-dependent bounds with respect to our proposed norms. In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs, the first result of its kind. We further initiate the study of model-free algorithms with variance-dependent regret bounds by designing a reference-function-based algorithm with a novel capped-doubling reference update schedule. Lastly, we provide lower bounds to complement our upper bounds.
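
As a rough illustration of what "low variance" means here, the sketch below computes the variance of the next-state value under a single transition distribution. It is only a toy example under stated assumptions: the function name, the value estimates, and the two transition rows are hypothetical, and it does not reproduce the environment norms or the algorithms proposed in the paper. It shows only that a deterministic transition contributes zero variance while a stochastic one does not.

```python
def next_value_variance(probs, values):
    """Variance of V(s') when the next state s' is drawn from `probs`.

    `probs` is one row of a transition matrix (it should sum to 1) and
    `values` holds a value estimate for each possible next state.
    """
    mean = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, values))


# Hypothetical value estimates for three successor states.
values = [0.0, 1.0, 2.0]

# Deterministic transition: always moves to state 1.
deterministic_row = [0.0, 1.0, 0.0]

# Stochastic transition: moves to state 0 or state 2 with equal probability.
stochastic_row = [0.5, 0.0, 0.5]

var_det = next_value_variance(deterministic_row, values)
var_sto = next_value_variance(stochastic_row, values)

print("deterministic:", var_det)
print("stochastic:", var_sto)
```

A variance-dependent regret bound scales with quantities of this kind, which is why such a bound can shrink on deterministic MDPs while remaining meaningful on stochastic ones.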

