Contextual Bandits with Stochastic Experts - Zhuanzhi Papers


Contextual bandits · Bandits · Estimation/Estimators · INFORMS · Mean

March 3, 2021

Contextual Bandits with Stochastic Experts


Rajat Sen, Karthikeyan Shanmugam, Nihal Sharma, Sanjay Shakkottai

From arXiv: 20 pages, 2 figures. Accepted for publication in AISTATS 2018.

We consider the problem of contextual bandits with stochastic experts, which is a variation of the traditional stochastic contextual bandit with experts problem. In our problem setting, we assume access to a class of stochastic experts, where each expert is a conditional distribution over the arms given a context. We propose upper-confidence bound (UCB) algorithms for this problem, which employ two different importance-sampling-based estimators for the mean reward of each expert. Both estimators leverage information leakage among the experts, using samples collected under all the experts to estimate the mean reward of any given expert. This leads to instance-dependent regret bounds of $\mathcal{O}\left(\lambda(\pmb{\mu})\mathcal{M}\log T/\Delta \right)$, where $\lambda(\pmb{\mu})$ is a term that depends on the mean rewards of the experts, $\Delta$ is the smallest gap between the mean reward of the optimal expert and the rest, and $\mathcal{M}$ quantifies the information leakage among the experts. We show that under some assumptions $\lambda(\pmb{\mu})$ is typically $\mathcal{O}(\log N)$, where $N$ is the number of experts. We implement our algorithm with stochastic experts generated from cost-sensitive classification oracles and show superior empirical performance on real-world datasets when compared to other state-of-the-art contextual bandit algorithms.
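The idea of estimating one expert's mean reward from samples collected under all experts can be illustrated with a plain importance-weighted average. The sketch below is only an illustration under stated assumptions, not the estimators from the paper: contexts and arms are assumed to be small finite sets, each expert is assumed to be a table of shape `(n_contexts, n_arms)` with strictly positive entries, and the name `is_estimate` and the simple `clip` rule (drop samples whose importance ratio exceeds a threshold) are hypothetical choices made for this example.

```python
import numpy as np

def is_estimate(target, logs, clip=np.inf):
    """Importance-weighted estimate of the mean reward of `target`.

    target: array (n_contexts, n_arms); row x is the target expert's
            distribution over arms given context x.
    logs:   list of (behavior, contexts, arms, rewards) tuples, one per
            expert that was used to collect data; `behavior` has the same
            shape as `target` and is assumed strictly positive.
    clip:   samples whose importance ratio exceeds this value are dropped
            (an illustrative variance-control rule, not the paper's).
    """
    total, count = 0.0, 0
    for behavior, xs, arms, rewards in logs:
        xs, arms = np.asarray(xs), np.asarray(arms)
        rewards = np.asarray(rewards, dtype=float)
        # Ratio of target to behavior probability for each logged (context, arm).
        w = target[xs, arms] / behavior[xs, arms]
        w = np.where(w <= clip, w, 0.0)
        total += float(np.sum(w * rewards))
        count += len(rewards)
    return total / count

# Toy setup (assumed): one context, two arms, arm 0 pays 1 and arm 1 pays 0,
# so an expert that picks arm 0 with probability 0.8 has true mean reward 0.8.
target = np.array([[0.8, 0.2]])
expert_a = np.array([[0.5, 0.5]])
expert_b = np.array([[0.2, 0.8]])
logs = [
    (expert_a, [0, 0, 0, 0], [0, 1, 0, 1], [1.0, 0.0, 1.0, 0.0]),
    (expert_b, [0, 0, 0, 0, 0], [0, 1, 1, 1, 1], [1.0, 0.0, 0.0, 0.0, 0.0]),
]
print(is_estimate(target, logs))
```

Neither logged expert is the target, yet their pooled samples still inform the target's mean. A UCB-style algorithm would add a confidence width to such an estimate for every expert and play the expert with the largest upper bound.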


