We study reinforcement learning for finite-horizon episodic Markov decision processes with adversarial rewards and full-information feedback, where the unknown transition probability function is a linear function of a given feature mapping. We propose an optimistic policy optimization algorithm with a Bernstein bonus and show that it achieves an $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of each episode, $T$ is the number of interactions with the MDP, and $d$ is the dimension of the feature mapping. Furthermore, we prove a lower bound of $\tilde{\Omega}(dH\sqrt{T})$, which matches the upper bound up to logarithmic factors. To the best of our knowledge, this is the first computationally efficient, nearly minimax optimal algorithm for adversarial Markov decision processes with linear function approximation.
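To make the setup concrete, one standard way to formalize the linearity assumption and the regret notion described above is sketched below; the symbols $\phi$, $\theta_h^*$, $\pi_k$, $\pi^*$, and $K$ are illustrative notation introduced here (a linear mixture-style model), not necessarily the exact formulation used in the paper:
\[
\mathbb{P}_h(s' \mid s, a) = \big\langle \phi(s' \mid s, a),\, \theta_h^* \big\rangle, \qquad \theta_h^* \in \mathbb{R}^d,
\]
\[
\mathrm{Regret}(K) = \max_{\pi^*} \sum_{k=1}^{K} \Big( V_1^{\pi^*,\,k}(s_1) - V_1^{\pi_k,\,k}(s_1) \Big) = \tilde{O}\big(dH\sqrt{T}\big), \qquad T = KH,
\]
where $k$ indexes episodes, the reward functions may change adversarially across episodes, and $V_1^{\pi,\,k}$ denotes the value of policy $\pi$ under the episode-$k$ rewards.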