Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that degrade gracefully in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do not rely on a master algorithm and do not require more robust oracles such as online or constrained regression oracles (e.g., Foster et al. (2020a); Krishnamurthy et al. (2020)). This allows us to design algorithms for more general function approximation classes.
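As a rough illustration of the mechanism described above, the following is a minimal sketch, not the paper's actual algorithm: it assumes a linear reward model, a hypothetical ridge-regression offline oracle, a fixed arm standing in for the safe policy, and an ad hoc threshold on the gap between predicted and realised rewards as "evidence" of misspecification.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, a, n_arms):
    """Block one-hot encoding: context vector placed in the slot of arm a."""
    phi = np.zeros(len(x) * n_arms)
    phi[a * len(x):(a + 1) * len(x)] = x
    return phi

def offline_oracle(data, dim):
    """Hypothetical offline regression oracle: ridge least squares on logged data."""
    if not data:
        return np.zeros(dim)
    X = np.array([phi for phi, _ in data])
    y = np.array([r for _, r in data])
    return np.linalg.solve(X.T @ X + 1e-3 * np.eye(dim), X.T @ y)

def run(T=2000, d=5, n_arms=3):
    dim = d * n_arms
    theta_true = rng.normal(size=dim)   # synthetic environment, well-specified here
    safe_arm = 0                        # stand-in for a known good safe policy
    data, theta_hat = [], np.zeros(dim)
    reverted = False
    cum_pred, cum_obs = 0.0, 0.0        # predicted vs. realised reward of chosen arms

    for t in range(1, T + 1):
        x = rng.normal(size=d)
        if reverted:
            a = safe_arm                # revert to the safe policy
        else:
            # Greedy w.r.t. the offline-oracle estimate, with light exploration.
            preds = [features(x, b, n_arms) @ theta_hat for b in range(n_arms)]
            a = int(rng.integers(n_arms)) if rng.random() < 0.05 else int(np.argmax(preds))
        phi = features(x, a, n_arms)
        r = phi @ theta_true + 0.1 * rng.normal()   # observed reward
        data.append((phi, r))
        cum_pred += phi @ theta_hat
        cum_obs += r
        # Evidence of misspecification (heuristic): model predictions drift away
        # from realised rewards by more than a conservative deviation bound.
        if not reverted and abs(cum_pred - cum_obs) > 3.0 * np.sqrt(t):
            reverted = True
        if t % 200 == 0:
            theta_hat = offline_oracle(data, dim)   # periodic offline refit
    return reverted

if __name__ == "__main__":
    print("reverted to safe policy:", run())
```

In this well-specified synthetic setting the monitor rarely triggers; replacing the linear environment with a nonlinear one makes the prediction gap grow and the sketch falls back to the safe arm, which is the qualitative behaviour the abstract describes.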