We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we first propose a computationally inefficient algorithm with optimal $\widetilde{O}(\sqrt{T})$ regret and another computationally efficient variant with $\widetilde{O}(T^{3/4})$ regret, where $T$ is the number of interactions. Next, taking inspiration from adversarial linear bandits, we develop yet another efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret under a different set of assumptions, improving the best existing result by Hao et al. (2020) with $\widetilde{O}(T^{2/3})$ regret. Moreover, we draw a connection between this algorithm and the Natural Policy Gradient algorithm proposed by Kakade (2002), and show that our analysis improves the sample complexity bound recently given by Agarwal et al. (2020).
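For concreteness, the regret notion implicit in these bounds is the standard one for infinite-horizon average-reward learning; the following is a minimal sketch of that definition, where the symbols $J^{*}$, $s_t$, $a_t$, and $r(\cdot,\cdot)$ are our own notation and not taken from the abstract itself:
\[
% Assumed standard definition: J^* is the optimal long-term average reward of the MDP,
% and r(s_t, a_t) is the reward collected at step t along the learner's trajectory.
R_T \;=\; T \cdot J^{*} \;-\; \sum_{t=1}^{T} r(s_t, a_t),
\]
so that an $\widetilde{O}(\sqrt{T})$ bound on $R_T$ means the learner's average reward approaches $J^{*}$ at rate $\widetilde{O}(1/\sqrt{T})$.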