Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration - Zhuanzhi Papers


Value iteration · Value function · state-of-the-art · Functional · Sample

October 23, 2020

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration


Priyank Agrawal,Jinglin Chen,Nan Jiang

From arXiv, 36 pages

This paper studies regret minimization with randomized value functions in reinforcement learning. In tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of one classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI). We analyze the algorithm using a novel intertwined regret decomposition. Our $\tilde{\mathrm{O}}(H^2S\sqrt{AT})$ high-probability worst-case regret bound improves the previous sharpest worst-case regret bounds for RLSVI and matches the existing state-of-the-art worst-case TS-based regret bounds.
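To make the idea of randomized value functions with clipping concrete, here is a minimal sketch of one backward pass of an RLSVI-style planner in a tabular finite-horizon MDP. It is an illustration under stated assumptions, not the paper's algorithm: the function name `rlsvi_episode_values`, the array layout, the noise scale `sigma / sqrt(n + 1)`, the assumption that rewards lie in [0, 1], and the simple clip to the range [0, H - h] are all hypothetical choices, since the abstract does not specify the noise variance or the exact clipping rule.

```python
import numpy as np

def rlsvi_episode_values(counts, reward_sums, trans_counts, H, sigma, rng):
    """One backward pass of a clipped, RLSVI-style randomized value iteration.

    counts[h, s, a]           : number of visits to (s, a) at step h
    reward_sums[h, s, a]      : sum of rewards observed at (h, s, a)
    trans_counts[h, s, a, s2] : number of observed transitions to s2
    Returns Q with shape (H, S, A).
    """
    _, S, A = counts.shape
    Q = np.zeros((H + 1, S, A))  # Q[H] stays zero (no reward after the horizon)
    for h in range(H - 1, -1, -1):
        V_next = Q[h + 1].max(axis=1)  # greedy value at the next step, shape (S,)
        n = counts[h]                  # shape (S, A)
        # Regularized least-squares estimate of the Bellman backup:
        # unvisited pairs shrink toward zero because of the +1 in the denominator.
        target_sum = reward_sums[h] + trans_counts[h] @ V_next
        mean = target_sum / (n + 1.0)
        # Gaussian perturbation whose scale shrinks as a pair is visited more often.
        noise = rng.normal(0.0, sigma / np.sqrt(n + 1.0))
        # Illustrative clipping (an assumption): keep values in the feasible
        # range for rewards in [0, 1] with H - h steps remaining.
        Q[h] = np.clip(mean + noise, 0.0, H - h)
    return Q[:H]
```

In an episode, the agent would act greedily with respect to the sampled values, for example `Q[h, s].argmax()` in state `s` at step `h`, and then update the counts with the observed transitions before the next backward pass.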

