Adapting to Misspecification in Contextual Bandits - Zhuanzhi Papers

Contextual bandits · Bandits · Optimizers · MoDELS · Square loss

July 12, 2021

Adapting to Misspecification in Contextual Bandits

Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

From arXiv; appeared at NeurIPS 2020.

A major research direction in contextual bandits is to develop algorithms that are computationally efficient, yet support flexible, general-purpose function approximation. Algorithms based on modeling rewards have shown strong empirical performance, but typically require a well-specified model, and can fail when this assumption does not hold. Can we design algorithms that are efficient and flexible, yet degrade gracefully in the face of model misspecification? We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits that adapt to unknown model misspecification -- both for finite and infinite action settings. Given access to an online oracle for square loss regression, our algorithm attains optimal regret and -- in particular -- optimal dependence on the misspecification level, with no prior knowledge. Specializing to linear contextual bandits with infinite actions in $d$ dimensions, we obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level $\varepsilon$. On a conceptual level, our results are enabled by a new optimization-based perspective on the regression oracle reduction framework of Foster and Rakhlin, which we anticipate will find broader use.
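The abstract builds on the regression-oracle reduction of Foster and Rakhlin: an online square-loss regression oracle predicts a loss for each action, and the learner then randomizes over actions based on those predictions. In Foster and Rakhlin's SquareCB, that randomization is inverse gap weighting (IGW). The minimal NumPy sketch below shows only this base step for K finite actions with a fixed exploration parameter `gamma`. It is not the adaptive, misspecification-robust algorithm introduced in this paper, and the function name `inverse_gap_weighting` is hypothetical.

```python
import numpy as np


def inverse_gap_weighting(pred_losses, gamma):
    """Inverse gap weighting over K actions (SquareCB-style base step).

    pred_losses[a] is the regression oracle's predicted loss for action a
    (lower is better). gamma >= 0 controls exploration: gamma = 0 gives the
    uniform distribution, and larger gamma concentrates mass on the
    greedy (lowest predicted loss) action.
    """
    pred = np.asarray(pred_losses, dtype=float)
    k = pred.shape[0]
    best = int(np.argmin(pred))
    # Non-greedy actions: probability 1 / (K + gamma * predicted gap).
    probs = 1.0 / (k + gamma * (pred - pred[best]))
    # The greedy action takes all remaining mass (always >= 1/K).
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()
    return probs


if __name__ == "__main__":
    probs = inverse_gap_weighting([0.2, 0.5, 0.9], gamma=10.0)
    rng = np.random.default_rng(0)
    action = rng.choice(len(probs), p=probs)  # sample the action to play
    print(probs, action)
```

With predicted losses `[0.2, 0.5, 0.9]` and `gamma=10.0`, the two non-greedy actions receive probabilities 1/6 and 1/10, and the greedy action receives the remaining 11/15. Because every action keeps probability at least inversely proportional to its predicted gap, the learner keeps exploring actions the oracle may have mis-estimated.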

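The linear-bandit bound $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ has two regimes. Setting the two terms equal, $\varepsilon\sqrt{d}\,T = d\sqrt{T}$, gives the crossover level $\varepsilon = \sqrt{d/T}$. Below it, regret is dominated by the usual $d\sqrt{T}$ term. Above it, the misspecification term, which grows linearly in $T$, dominates. The sketch below sets all hidden constants to 1. This is an illustrative assumption, so it shows only how the terms scale, not actual regret values. The helper names are hypothetical.

```python
import math


def regret_terms(d, T, eps):
    """Return the two terms of d*sqrt(T) + eps*sqrt(d)*T.

    Constants hidden by the O-notation are set to 1 (illustrative assumption).
    """
    statistical = d * math.sqrt(T)                # well-specified term
    misspecification = eps * math.sqrt(d) * T     # cost of eps-misspecification
    return statistical, misspecification


def crossover_eps(d, T):
    """Misspecification level at which the two terms are equal: sqrt(d / T)."""
    return math.sqrt(d / T)


if __name__ == "__main__":
    d, T = 10, 10_000
    eps_star = crossover_eps(d, T)
    print(eps_star, regret_terms(d, T, eps_star), regret_terms(d, T, 0.1))
```

For $d = 10$ and $T = 10{,}000$, the crossover is $\varepsilon = \sqrt{10^{-3}} \approx 0.032$, where both terms equal 1000. At $\varepsilon = 0.1$, the misspecification term (about 3162) already dominates.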
