We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially selected contexts. We adapt the information-theoretic perspective of \cite{RvR16} to the contextual setting by considering a lifted version of the information ratio defined in terms of the unknown model parameter instead of the optimal action or optimal policy as done in previous works on the same setting. This allows us to bound the regret in terms of the entropy of the prior distribution through a remarkably simple proof, with no structural assumptions on the likelihood or the prior. The extension to priors with infinite entropy only requires a Lipschitz assumption on the log-likelihood. An interesting special case is that of logistic bandits with $d$-dimensional parameters, $K$ actions, and Lipschitz logits, for which we provide a $\widetilde{O}(\sqrt{dKT})$ regret upper bound that does not depend on the smallest slope of the sigmoid link function.
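For concreteness, the following is a minimal sketch of the protocol and of the logistic special case, written in illustrative notation of our own choosing (the symbols $\theta^\star$, $X_t$, $A_t$, $f_\theta$, and $\sigma$ below are assumptions for exposition, not necessarily the notation used in the paper body). The Bayesian regret of Thompson Sampling over $T$ rounds is
\[
\mathfrak{BR}(T) \;=\; \mathbb{E}\left[\sum_{t=1}^{T}\Bigl(\ell(X_t, A_t; \theta^\star) \;-\; \min_{a \in [K]} \ell(X_t, a; \theta^\star)\Bigr)\right],
\]
where $\theta^\star$ is drawn from the prior, the contexts $X_t$ are selected adversarially, and Thompson Sampling draws $\theta_t$ from the posterior given the past observations and plays $A_t \in \arg\min_{a \in [K]} \ell(X_t, a; \theta_t)$. In the logistic special case, the binary loss satisfies
\[
\Pr\bigl(\ell(X_t, a; \theta^\star) = 1\bigr) \;=\; \sigma\bigl(f_{\theta^\star}(X_t, a)\bigr), \qquad \sigma(z) = \frac{1}{1 + e^{-z}},
\]
with $\theta^\star \in \mathbb{R}^d$, $K$ actions, and logits $f_\theta$ Lipschitz in $\theta$; the $\widetilde{O}(\sqrt{dKT})$ bound stated above holds without any dependence on the smallest slope of $\sigma$.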