We study a model selection problem in the linear bandit setting, where the learner must adapt on the fly to the dimension of the optimal hypothesis class while balancing exploration and exploitation. More specifically, we assume a sequence of nested linear hypothesis classes with dimensions $d_1 < d_2 < \dots$, and the goal is to automatically adapt to the smallest hypothesis class that contains the true linear model. Although previous papers provide various guarantees for this model selection problem, the analysis therein either works only in favorable cases where one can cheaply conduct statistical testing to locate the right hypothesis class, or is based on the idea of "corralling" multiple base algorithms, which often performs relatively poorly in practice. These works also mainly focus on upper bounding the regret. In this paper, we first establish a lower bound showing that, even with a fixed action set, adaptation to the unknown intrinsic dimension $d_\star$ comes at a cost: no algorithm can achieve the regret bound $\widetilde{O}(\sqrt{d_\star T})$ simultaneously for all values of $d_\star$. We also bring new ideas, namely constructing virtual mixture-arms to effectively summarize useful information, to the model selection problem in linear bandits. Under a mild assumption on the action set, we design a Pareto optimal algorithm whose guarantees match the rate in the lower bound. Experiments corroborate our theory and demonstrate the advantages of our algorithm over prior work.