通过离线甲骨怪在通过许多类的上下文强盗中最优化的模型选择 (Optimal Model Selection in Contextual Bandits with Many Classes via Offline Oracles) - 专知论文

会员服务 ·

0

模型选择 · 上下文赌博机/上下文老虎机 · Bandits · 赌博机/老虎机 · MoDELS ·

2021 年 6 月 11 日

Optimal Model Selection in Contextual Bandits with Many Classes via Offline Oracles

翻译：通过离线甲骨怪在通过许多类的上下文强盗中最优化的模型选择

Sanath Kumar Krishnamurthy,Susan Athey

We study the problem of model selection for contextual bandits, in which the algorithm must balance the bias-variance trade-off for model estimation while also balancing the exploration-exploitation trade-off. In this paper, we propose the first reduction of model selection in contextual bandits to offline model selection oracles, allowing for flexible general purpose algorithms with computational requirements no worse than those for model selection for regression. Our main result is a new model selection guarantee for stochastic contextual bandits. When one of the classes in our set is realizable, up to a logarithmic dependency on the number of classes, our algorithm attains optimal realizability-based regret bounds for that class under one of two conditions: if the time-horizon is large enough, or if an assumption that helps with detecting misspecification holds. Hence our algorithm adapts to the complexity of this unknown class. Even when this realizable class is known, we prove improved regret guarantees in early rounds by relying on simpler model classes for those rounds and hence further establish the importance of model selection in contextual bandits.

翻译：我们研究背景强盗的模型选择问题, 算法必须平衡模型估算的偏差取舍, 同时平衡勘探- 开发的取舍。在本文中, 我们建议首先将背景强盗的模型选择减为离线模式选择或触角, 允许采用灵活的通用算法, 其计算要求不比回归模式选择要求差。我们的主要结果是为随机强盗提供新的模型选择保障。当我们组中的一个班级可以实现时, 直至对班级数的对数依赖性, 我们的算法在两个条件下之一( 如果时间偏差足够大, 或者如果有助于发现误差的假设成立 ) 下, 首次将背景强盗的模型选择减少。因此, 我们的算法适应了这个未知班级的复杂程度。即使知道这个可实现的班级, 我们证明早期的遗憾保证得到了改进, 依靠这些班子的更简单的模型班级, 从而进一步确定背景强盗的模式选择的重要性。

0

相关内容

模型选择

机器学习简明导论，62页pdf

专知会员服务

83+阅读 · 2021年7月31日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

专知会员服务

85+阅读 · 2019年10月29日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Avoid Estimating the Unknown Function in a Semiparametric Nonignorable Propensity Model

Arxiv

0+阅读 · 2021年8月10日

Asymptotic convergence rates for averaging strategies

Arxiv

0+阅读 · 2021年8月10日

Bandits with Partially Observable Confounded Data

Bandits with Partially Observable Confounded Data

Arxiv

0+阅读 · 2021年8月10日

On a Competitive Selection Problem

Arxiv

0+阅读 · 2021年8月10日

Online Multiobjective Minimax Optimization and Applications

Arxiv

0+阅读 · 2021年8月9日

Online Resource Allocation with Time-Flexible Customers

Arxiv

0+阅读 · 2021年8月7日

Meta-strategy for Learning Tuning Parameters with Guarantees

Arxiv

0+阅读 · 2021年8月6日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Automatic multi-objective based feature selection for classification

Automatic multi-objective based feature selection for classification

Arxiv

6+阅读 · 2018年7月9日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

VIP会员

文章信息

相关主题

上下文赌博机/上下文老虎机

赌博机/老虎机

相关VIP内容

机器学习简明导论，62页pdf

专知会员服务

83+阅读 · 2021年7月31日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

专知会员服务

85+阅读 · 2019年10月29日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Avoid Estimating the Unknown Function in a Semiparametric Nonignorable Propensity Model

Arxiv

0+阅读 · 2021年8月10日

Asymptotic convergence rates for averaging strategies

Arxiv

0+阅读 · 2021年8月10日

Bandits with Partially Observable Confounded Data

Bandits with Partially Observable Confounded Data

Arxiv

0+阅读 · 2021年8月10日

On a Competitive Selection Problem

Arxiv

0+阅读 · 2021年8月10日

Online Multiobjective Minimax Optimization and Applications

Arxiv

0+阅读 · 2021年8月9日

Online Resource Allocation with Time-Flexible Customers

Arxiv

0+阅读 · 2021年8月7日

Meta-strategy for Learning Tuning Parameters with Guarantees

Arxiv

0+阅读 · 2021年8月6日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Automatic multi-objective based feature selection for classification

Automatic multi-objective based feature selection for classification

Arxiv

6+阅读 · 2018年7月9日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

微信扫码咨询专知VIP会员