Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs - Zhuanzhi (专知) Papers


April 20, 2022

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs


Jiafan He, Dongruo Zhou, Quanquan Gu

From arXiv; 22 pages, 1 figure. In AISTATS 2022.

Learning Markov decision processes (MDPs) in the presence of the adversary is a challenging problem in reinforcement learning (RL). In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode. We propose an optimistic policy optimization algorithm POWERS and show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode, $T$ is the number of interactions with the MDP, and $d$ is the dimension of the feature mapping. Furthermore, we also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors. Our key technical contributions are two-fold: (1) a new value function estimator based on importance weighting; and (2) a tighter confidence set for the transition kernel. They together lead to the nearly minimax optimal regret.
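To make the structural assumption in the abstract concrete, the following minimal sketch builds a transition kernel that is linear in a feature mapping. It assumes one common instance of that setting, in which the feature of a transition stacks the probabilities given by $d$ known base kernels and the unknown parameter is a vector of mixing weights. The function name `mixture_kernel`, the toy base kernels, and the weights `theta` are all hypothetical illustrations; none of them comes from the paper, and this is not the POWERS algorithm itself.

```python
def mixture_kernel(base_kernels, theta):
    """Return P with P[s][a][s2] = sum_i theta[i] * base_kernels[i][s][a][s2].

    base_kernels[i][s][a] is a probability distribution over next states,
    so the feature of (s2 | s, a) is the vector of the d base probabilities.
    """
    n_states = len(base_kernels[0])
    n_actions = len(base_kernels[0][0])
    return [
        [
            [
                sum(t * k[s][a][s2] for t, k in zip(theta, base_kernels))
                for s2 in range(n_states)
            ]
            for a in range(n_actions)
        ]
        for s in range(n_states)
    ]


# Toy example (assumed values): 2 states, 2 actions, d = 2 base kernels.
base_kernels = [
    # Base kernel 0: action 0 stays in place, action 1 moves uniformly.
    [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.5, 0.5]]],
    # Base kernel 1: action 0 switches state, action 1 moves uniformly.
    [[[0.0, 1.0], [0.5, 0.5]], [[1.0, 0.0], [0.5, 0.5]]],
]
theta = [0.25, 0.75]  # mixing weights; in the learning problem these are unknown

P = mixture_kernel(base_kernels, theta)
print(P[0][0])  # [0.25, 0.75]
```

Because every base kernel is a valid distribution and the weights are non-negative and sum to one, each `P[s][a]` is again a distribution. In this sketch, learning the transition function reduces to estimating the $d$ numbers in `theta`, which is consistent with the regret bound depending on $d$ rather than on the number of states.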

