This paper studies the stochastic linear bandit problem, in which a decision-maker chooses actions from possibly time-dependent sets of vectors in $\mathbb{R}^d$ and receives noisy rewards. The objective is to minimize regret, the difference over a sequence of $T$ decisions between the cumulative expected reward of the decision-maker and that of an oracle with access to the expected reward of each action. Linear Thompson Sampling (LinTS) is a popular Bayesian heuristic, supported by theoretical analysis showing that its Bayesian regret is bounded by $\widetilde{\mathcal{O}}(d\sqrt{T})$, matching minimax lower bounds. However, previous studies show that the best known frequentist regret bound for LinTS is $\widetilde{\mathcal{O}}(d\sqrt{dT})$, which requires posterior variance inflation and is a factor of $\sqrt{d}$ worse than the bounds for the best optimism-based algorithms. We prove that this inflation is fundamental and that the frequentist bound of $\widetilde{\mathcal{O}}(d\sqrt{dT})$ is the best possible, by demonstrating a randomization bias phenomenon in LinTS that can cause linear regret without inflation. We then propose a data-driven version of LinTS that adjusts the posterior inflation using the observed data and, under additional conditions, achieves minimax-optimal frequentist regret. Our analysis provides new insights into LinTS and settles an open problem in the field.
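For concreteness, the following is a minimal sketch of LinTS with an explicit posterior-inflation factor, assuming a standard Gaussian posterior over a ridge estimate; the function names, parameters, and the toy instance are illustrative assumptions, not the paper's exact algorithm or notation.

```python
# Minimal LinTS sketch with a posterior-inflation factor (illustrative only).
import numpy as np

def lin_ts(action_set_fn, reward_fn, T, d, inflation=1.0, lam=1.0, noise_sd=1.0):
    """Run LinTS for T rounds; `inflation` scales the posterior standard deviation."""
    V = lam * np.eye(d)   # regularized Gram matrix
    b = np.zeros(d)       # accumulated reward-weighted features
    for t in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b                               # ridge estimate
        cov = (inflation * noise_sd) ** 2 * V_inv           # (inflated) posterior covariance
        theta_tilde = np.random.multivariate_normal(theta_hat, cov)  # posterior sample
        A_t = action_set_fn(t)                              # current action set, shape (K, d)
        a = A_t[np.argmax(A_t @ theta_tilde)]               # act greedily w.r.t. the sample
        r = reward_fn(a)                                    # observe noisy reward
        V += np.outer(a, a)
        b += r * a
    return V, b

# Toy usage: fixed action set of K unit vectors, linear rewards with Gaussian noise.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K = 5, 20
    theta_star = rng.normal(size=d); theta_star /= np.linalg.norm(theta_star)
    actions = rng.normal(size=(K, d))
    actions /= np.linalg.norm(actions, axis=1, keepdims=True)
    lin_ts(lambda t: actions,
           lambda a: a @ theta_star + 0.1 * rng.normal(),
           T=500, d=d,
           inflation=np.sqrt(d))  # roughly the inflation scale used in prior frequentist analyses
```

Setting `inflation=1.0` corresponds to sampling from the unmodified posterior, the regime in which the randomization bias discussed above can cause linear regret.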