We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds, where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound on the variance of rewards. In several practical cases where $\phi=O(d)$, our result is the first regret bound for generalized linear model (GLM) bandits of order $\sqrt{d}$ that does not rely on the approach of Auer [2002]. We achieve this bound with a novel estimator called the double doubly-robust (DDR) estimator, a subclass of doubly-robust (DR) estimators with a tighter error bound. The approach of Auer [2002] achieves independence by discarding the observed rewards, whereas our algorithm achieves independence while making use of all contexts via the DDR estimator. We also provide an $O(\kappa^{-1} \phi \log (NT) \log T)$ regret bound for $N$ arms under a probabilistic margin condition. Regret bounds under the margin condition are given by Bastani and Bayati [2020] and Bastani et al. [2021] in the setting where contexts are common to all arms but coefficients are arm-specific. In the setting where contexts differ across arms but coefficients are common, ours is the first regret bound under the margin condition for linear models or GLMs. We conduct empirical studies on synthetic data and real examples, demonstrating the effectiveness of our algorithm.
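To make the DR construction concrete, the following is a minimal sketch of the standard doubly-robust pseudo-reward used in the contextual bandit literature, not the exact DDR estimator of this paper; the notation ($x_{t,i}$, $y_t$, $a_t$, $\pi_{t,i}$, $\hat{\beta}_t$) and the linear mean are illustrative assumptions. If arm $a_t$ is selected with known probability $\pi_{t,i}$ among candidate arms $i$ with contexts $x_{t,i}$, observed reward $y_t$, and current coefficient estimate $\hat{\beta}_t$, a pseudo-reward can be imputed for every arm $i$, selected or not, as
\[
\tilde{y}_{t,i} \;=\; x_{t,i}^{\top}\hat{\beta}_t \;+\; \frac{\mathbb{1}(a_t = i)}{\pi_{t,i}}\bigl(y_t - x_{t,i}^{\top}\hat{\beta}_t\bigr),
\]
which is conditionally unbiased for the mean reward of arm $i$ because the selection probability $\pi_{t,i}$ is known by design, regardless of the accuracy of the imputed value $x_{t,i}^{\top}\hat{\beta}_t$. This allows the estimator to use the contexts of all arms rather than only the chosen ones; the DDR estimator proposed here is a refinement of this construction with a tighter error bound.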