Doubly Robust Thompson 线性付款抽样 (Doubly Robust Thompson Sampling for linear payoffs) - 专知论文

会员服务 ·

0

赌博机/老虎机 · 稳健性 · 上下文赌博机/上下文老虎机 · 线性的 · ARM ·

2021 年 2 月 1 日

Doubly Robust Thompson Sampling for linear payoffs

翻译：Doubly Robust Thompson 线性付款抽样

Wonyoung Kim,Gi-soo Kim,Myunghee Cho Paik

from arxiv, 18pages including Supplementary Materials

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing. Since the arm choice depends on the past context and reward pairs, the contexts of chosen arms suffer from correlation and render the analysis difficult. We propose a novel multi-armed contextual bandit algorithm called Doubly Robust (DR) Thompson Sampling (TS) that applies the DR technique used in missing data literature to TS. The proposed algorithm improves the bound of TS by a factor of $\sqrt{d}$, where $d$ is the dimension of the context. A benefit of the proposed method is that it uses all the context data, chosen or not chosen, thus allowing to circumvent the technical definition of unsaturated arms used in theoretical analysis of TS. Empirical studies show the advantage of the proposed algorithm over TS.

翻译：盗匪问题的一个具有挑战性的方面是,只对所选的手臂进行抽查性奖励,而其他武器的奖励仍然缺失。由于手臂的选择取决于过去的背景和奖赏,所选的手臂的背景存在关联性,使分析难于进行。我们建议采用新的多武装背景土匪算法,称为Doubly Robust (DR) Thompson 抽样(TS),将缺失的数据文献中使用的DR技术应用于TS。提议的算法将TS的界限提高1美元,即$d$是背景的维度。拟议方法的一个好处是,它使用所有背景数据,无论是选择还是未选择,从而可以绕过TS理论分析中使用的不饱和武器的技术定义。 Empirical 研究显示,提议的算法比TS的优势在于$@sqrt{d}。

0

相关内容

赌博机/老虎机

赌博机/老虎机

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

【清华大学】自动微分蒙特卡洛，理论与应用，Automatic Differentiable Monte Carlo: Theory and Application (附pdf）

专知会员服务

28+阅读 · 2019年11月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

已删除

将门创投

10+阅读 · 2018年5月2日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Nonstochastic Bandits with Infinitely Many Experts

Arxiv

0+阅读 · 2021年3月26日

FRITL: A Hybrid Method for Causal Discovery in the Presence of Latent Confounders

Arxiv

0+阅读 · 2021年3月26日

Robust and Private Learning of Halfspaces

Arxiv

0+阅读 · 2021年3月25日

Stochastic Bandits for Multi-platform Budget Optimization in Online Advertising

Arxiv

0+阅读 · 2021年3月25日

Bayesian Matrix Completion for Hypothesis Testing

Arxiv

0+阅读 · 2021年3月24日

$\mathbb{X}$Resolution Correspondence Networks

Arxiv

0+阅读 · 2021年3月24日

Stochastic Potential Games

Arxiv

0+阅读 · 2021年3月24日

On Imitation Learning of Linear Control Policies: Enforcing Stability and Robustness Constraints via LMI Conditions

Arxiv

0+阅读 · 2021年3月24日

Majorant series for the $N$-body problem

Arxiv

0+阅读 · 2021年3月23日

Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

Arxiv

3+阅读 · 2018年4月23日

VIP会员

文章信息

相关主题

赌博机/老虎机

上下文赌博机/上下文老虎机

相关VIP内容

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

【清华大学】自动微分蒙特卡洛，理论与应用，Automatic Differentiable Monte Carlo: Theory and Application (附pdf）

专知会员服务

28+阅读 · 2019年11月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

从社会学实验到行为仿真：理解基于Agent的观点动力学建模思维

中英文版《GPT-5 System Card速览》报告

ACL 2025 | 大模型结构化知识提示的泛化能力研究

【普林斯顿博士论文】大型模型的高效推理

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

已删除

将门创投

10+阅读 · 2018年5月2日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Nonstochastic Bandits with Infinitely Many Experts

Arxiv

0+阅读 · 2021年3月26日

FRITL: A Hybrid Method for Causal Discovery in the Presence of Latent Confounders

Arxiv

0+阅读 · 2021年3月26日

Robust and Private Learning of Halfspaces

Arxiv

0+阅读 · 2021年3月25日

Stochastic Bandits for Multi-platform Budget Optimization in Online Advertising

Arxiv

0+阅读 · 2021年3月25日

Bayesian Matrix Completion for Hypothesis Testing

Arxiv

0+阅读 · 2021年3月24日

$\mathbb{X}$Resolution Correspondence Networks

Arxiv

0+阅读 · 2021年3月24日

Stochastic Potential Games

Arxiv

0+阅读 · 2021年3月24日

On Imitation Learning of Linear Control Policies: Enforcing Stability and Robustness Constraints via LMI Conditions

Arxiv

0+阅读 · 2021年3月24日

Majorant series for the $N$-body problem

Arxiv

0+阅读 · 2021年3月23日

Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

Arxiv

3+阅读 · 2018年4月23日

微信扫码咨询专知VIP会员