适应因离线后退甲骨文造成的背景强盗的偏差 (Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles) - 专知论文

会员服务 ·

0

上下文赌博机/上下文老虎机 · 赌博机/老虎机 · Oracle · 广义函数 · 稳健性 ·

2021 年 6 月 11 日

Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles

翻译：适应因离线后退甲骨文造成的背景强盗的偏差

Sanath Kumar Krishnamurthy,Vitor Hadad,Susan Athey

from arxiv, ICML 2021

Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do no rely on a master algorithm, and do not require more robust oracles like online or constrained regression oracles (e.g., Foster et al. (2020a); Krishnamurthy et al. (2020)). This allows us to design algorithms for more general function approximation classes.

翻译：效率高的背景强盗往往基于利用过去的数据来估计一种预测模型,根据不同的背景和武器来估计奖赏。然而,当奖赏模型没有很好地指定时,土匪算法可能会引起意外的遗憾,因此最近的工作侧重于强于偏差的算法。我们建议建立一个简单的背景强盗算法组合,在有证据表明错误区分正在导致遗憾增加时,通过恢复到一个良好的安全政策来适应错误的区分错误。我们的算法只需要一个离线的回归或触角,以确保在衡量平均误差水平时出现优减的遗憾保证。与以前的工作相比,我们获得了类似的遗憾保证,但我们并不依赖主算法,而不需要像在线或受限制的回归或触法(例如,Foster等人(2020年a);Krishnamurthy 等人(202020年)那样的更坚固的手法或手法。这使我们能够为更普遍的功能近似等级设计算法。

0

相关内容

上下文赌博机/上下文老虎机

上下文赌博机/上下文老虎机

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

TensorFlow深度学习，从线性回归到强化学习的深度学习（TensorFlow for Deep Learning From Linear Regression to Reinforcement Learning），附页256页pdf

TensorFlow深度学习，从线性回归到强化学习的深度学习（TensorFlow for Deep Learning From Linear Regression to Reinforcement Learning），附页256页pdf

专知会员服务

46+阅读 · 2020年1月1日

【O'Reilly AI Conference 2019】实时AI实体解析，Real-time AI for entity resolution ，Senzing 的创始人兼首席执行官Jeff Jonas

【O'Reilly AI Conference 2019】实时AI实体解析，Real-time AI for entity resolution ，Senzing 的创始人兼首席执行官Jeff Jonas

专知会员服务

10+阅读 · 2019年11月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

机器学习方法体系汇总

机器学习方法体系汇总

机器学习算法与Python学习

9+阅读 · 2017年8月12日

人工智能之机器学习算法体系汇总

人工智能之机器学习算法体系汇总

数据挖掘入门与实战

4+阅读 · 2017年8月9日

Learning-To-Ensemble by Contextual Rank Aggregation in E-Commerce

Arxiv

0+阅读 · 2021年8月10日

Provable Training Set Debugging for Linear Regression

Arxiv

0+阅读 · 2021年8月10日

IntenT5: Search Result Diversification using Causal Language Models

IntenT5: Search Result Diversification using Causal Language Models

Arxiv

0+阅读 · 2021年8月9日

Targeted Principal Components Regression

Arxiv

0+阅读 · 2021年8月9日

Imitation by Predicting Observations

Imitation by Predicting Observations

Arxiv

4+阅读 · 2021年7月8日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

A Modern Introduction to Online Learning

A Modern Introduction to Online Learning

Arxiv

21+阅读 · 2019年12月31日

Meta-Learning with Differentiable Convex Optimization

Arxiv

5+阅读 · 2019年4月23日

Information-Directed Exploration for Deep Reinforcement Learning

Information-Directed Exploration for Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年12月18日

VIP会员

文章信息

相关主题

上下文赌博机/上下文老虎机

赌博机/老虎机

相关VIP内容

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

TensorFlow深度学习，从线性回归到强化学习的深度学习（TensorFlow for Deep Learning From Linear Regression to Reinforcement Learning），附页256页pdf

TensorFlow深度学习，从线性回归到强化学习的深度学习（TensorFlow for Deep Learning From Linear Regression to Reinforcement Learning），附页256页pdf

专知会员服务

46+阅读 · 2020年1月1日

【O'Reilly AI Conference 2019】实时AI实体解析，Real-time AI for entity resolution ，Senzing 的创始人兼首席执行官Jeff Jonas

【O'Reilly AI Conference 2019】实时AI实体解析，Real-time AI for entity resolution ，Senzing 的创始人兼首席执行官Jeff Jonas

专知会员服务

10+阅读 · 2019年11月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

TensorFlow 2.0 学习资源汇总

TensorFlow 2.0 学习资源汇总

专知会员服务

67+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

数据要素发展报告(2025年)：附下载

人工智能代理提升战时舰船战备水平

【NeurIPS2025教程】大语言模型规划

NeurIPS 2025 教程：深度学习训练不稳定性的理论洞见

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

机器学习方法体系汇总

机器学习方法体系汇总

机器学习算法与Python学习

9+阅读 · 2017年8月12日

人工智能之机器学习算法体系汇总

人工智能之机器学习算法体系汇总

数据挖掘入门与实战

4+阅读 · 2017年8月9日

相关论文

Learning-To-Ensemble by Contextual Rank Aggregation in E-Commerce

Arxiv

0+阅读 · 2021年8月10日

Provable Training Set Debugging for Linear Regression

Arxiv

0+阅读 · 2021年8月10日

IntenT5: Search Result Diversification using Causal Language Models

IntenT5: Search Result Diversification using Causal Language Models

Arxiv

0+阅读 · 2021年8月9日

Targeted Principal Components Regression

Arxiv

0+阅读 · 2021年8月9日

Imitation by Predicting Observations

Imitation by Predicting Observations

Arxiv

4+阅读 · 2021年7月8日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

A Modern Introduction to Online Learning

A Modern Introduction to Online Learning

Arxiv

21+阅读 · 2019年12月31日

Meta-Learning with Differentiable Convex Optimization

Arxiv

5+阅读 · 2019年4月23日

Information-Directed Exploration for Deep Reinforcement Learning

Information-Directed Exploration for Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年12月18日

微信扫码咨询专知VIP会员