We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov Decision Processes (MDPs). Our method yields the following new regret bounds:

* For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{i=1}^{K}\sigma_i^2})$ regret bound, where $d$ is the feature dimension, $K$ is the number of rounds, and $\sigma_i^2$ is the (unknown) variance of the reward at the $i$-th round. This is the first regret bound that scales only with the variance and the dimension, with no explicit polynomial dependency on $K$.
* For linear mixture MDPs, we obtain an $\widetilde{O}(\mathrm{poly}(d, \log H)\sqrt{K})$ regret bound, where $d$ is the number of base models, $K$ is the number of episodes, and $H$ is the planning horizon. This is the first regret bound that scales only logarithmically with $H$ for reinforcement learning with linear function approximation, thus exponentially improving existing results.

Our methods utilize three novel ideas that may be of independent interest: 1) applications of the peeling technique to the norm of the input and the magnitude of the variance, 2) a recursion-based approach to estimating the variance, and 3) a convex potential lemma that somewhat generalizes the seminal elliptical potential lemma.
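For reference, the seminal elliptical potential lemma mentioned above admits the following standard form (stated here for context only; the notation $x_t$, $V_t$, $\lambda$, and $L$ is introduced for illustration and is not fixed by this abstract): let $V_0 = \lambda I$ and $V_t = V_{t-1} + x_t x_t^\top$ with $\|x_t\|_2 \le L$ for all $t$. Then

$$\sum_{t=1}^{K} \min\Bigl\{1,\ \|x_t\|_{V_{t-1}^{-1}}^{2}\Bigr\} \;\le\; 2\log\frac{\det V_K}{\det V_0} \;\le\; 2d\log\Bigl(1 + \frac{K L^{2}}{d\lambda}\Bigr).$$

The convex potential lemma developed in this work generalizes this type of statement; the precise form of that generalization is given in the body of the paper rather than here.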