Stochastic Multi-Armed Bandits with Control Variates - Zhuanzhi Papers


Bandits · Controllers · Upper Confidence Bound · Confidence · Wireless Networks

November 9, 2021

Stochastic Multi-Armed Bandits with Control Variates


Arun Verma, Manjesh K. Hanawal

From arXiv. Accepted to NeurIPS 2021.

This paper studies a new variant of the stochastic multi-armed bandits problem where auxiliary information about the arm rewards is available in the form of control variates. In many applications like queuing and wireless networks, the arm rewards are functions of some exogenous variables. The mean values of these variables are known a priori from historical data and can be used as control variates. Leveraging the theory of control variates, we obtain mean estimates with smaller variance and tighter confidence bounds. We develop an improved upper confidence bound based algorithm named UCB-CV and characterize the regret bounds in terms of the correlation between rewards and control variates when they follow a multivariate normal distribution. We also extend UCB-CV to other distributions using resampling methods like Jackknifing and Splitting. Experiments on synthetic problem instances validate performance guarantees of the proposed algorithms.
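The core idea of the abstract can be illustrated with a small Python sketch. It assumes each arm yields a reward X together with one scalar control variate W whose mean omega is known in advance. Given those observations, the arm's mean is estimated with the classical control-variate estimator $\hat\mu = \bar X - \hat\beta(\bar W - \omega)$, where $\hat\beta = \widehat{\mathrm{Cov}}(X, W) / \widehat{\mathrm{Var}}(W)$. That estimate then drives a UCB-style arm choice.

The function names `cv_mean`, `make_gaussian_arm` and `ucb_cv_sketch` are hypothetical. The synthetic jointly Gaussian arm is an illustrative assumption. The confidence width `sqrt(2 * resid_var * log t / n)` is also an assumption: it is a generic UCB1-style width with the control-variate residual variance plugged in, not the bound derived in the paper. The Jackknifing and Splitting extensions are not shown.

```python
import math
import random
from typing import Callable, List, Tuple

# A sampler returns one (reward, control) pair observed together on a single pull.
Sampler = Callable[[random.Random], Tuple[float, float]]


def cv_mean(rewards: List[float], controls: List[float], control_mean: float) -> Tuple[float, float]:
    """Control-variate estimate of an arm's mean reward (one control variate).

    control_mean is the control variate's mean, assumed known a priori.
    Returns (estimate, residual_variance). Needs at least 3 samples.
    """
    n = len(rewards)
    x_bar = sum(rewards) / n
    w_bar = sum(controls) / n
    # Sample covariance and variance give the estimated coefficient beta.
    cov_xw = sum((x - x_bar) * (w - w_bar) for x, w in zip(rewards, controls)) / (n - 1)
    var_w = sum((w - w_bar) ** 2 for w in controls) / (n - 1)
    beta = cov_xw / var_w if var_w > 0 else 0.0
    # Correct the sample mean by the control's observed deviation from its known mean.
    estimate = x_bar - beta * (w_bar - control_mean)
    # Residual variance of X - beta * W; two degrees of freedom used (mean and beta).
    resid = [(x - x_bar) - beta * (w - w_bar) for x, w in zip(rewards, controls)]
    resid_var = sum(r * r for r in resid) / (n - 2)
    return estimate, resid_var


def make_gaussian_arm(mu: float, omega: float, noise_sd: float) -> Tuple[Sampler, float]:
    """Hypothetical synthetic arm: W ~ N(omega, 1), X = mu + (W - omega) + N(0, noise_sd^2)."""
    def sample(rng: random.Random) -> Tuple[float, float]:
        w = rng.gauss(omega, 1.0)
        return mu + (w - omega) + rng.gauss(0.0, noise_sd), w
    return sample, omega


def ucb_cv_sketch(arms: List[Tuple[Sampler, float]], horizon: int,
                  seed: int = 0, warmup: int = 3) -> List[int]:
    """UCB-style loop whose index uses the control-variate mean estimate.

    Assumed width sqrt(2 * resid_var * log t / n): a UCB1-style stand-in,
    not the paper's bound. warmup must be >= 3 so cv_mean is defined.
    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    k = len(arms)
    rewards: List[List[float]] = [[] for _ in range(k)]
    controls: List[List[float]] = [[] for _ in range(k)]
    for t in range(1, horizon + 1):
        if t <= warmup * k:
            arm = (t - 1) % k  # round-robin until every arm has `warmup` samples
        else:
            index = []
            for i, (_, omega) in enumerate(arms):
                est, rv = cv_mean(rewards[i], controls[i], omega)
                n = len(rewards[i])
                index.append(est + math.sqrt(2.0 * rv * math.log(t) / n))
            arm = max(range(k), key=index.__getitem__)
        x, w = arms[arm][0](rng)
        rewards[arm].append(x)
        controls[arm].append(w)
    return [len(r) for r in rewards]


if __name__ == "__main__":
    # Compare plain sample mean and control-variate estimate on one synthetic arm.
    rng = random.Random(1)
    sample, omega = make_gaussian_arm(mu=0.5, omega=2.0, noise_sd=0.1)
    plain_se, cv_se = 0.0, 0.0
    for _ in range(200):
        xs, ws = zip(*(sample(rng) for _ in range(20)))
        plain_se += (sum(xs) / 20 - 0.5) ** 2
        cv_se += (cv_mean(list(xs), list(ws), omega)[0] - 0.5) ** 2
    print("plain MSE:", plain_se / 200, "CV MSE:", cv_se / 200)
    arms = [make_gaussian_arm(m, 2.0, 0.1) for m in (0.0, 0.5, 1.0)]
    print("pulls per arm:", ucb_cv_sketch(arms, horizon=2000))
```

In this synthetic instance the reward is strongly correlated with the control, so the residual variance is much smaller than the raw reward variance. This mirrors the abstract's point that tighter confidence bounds depend on the correlation between rewards and control variates. With a weakly correlated control, the estimate and width would approach those of an ordinary UCB.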

