May 3, 2022

Norm-Agnostic Linear Bandits


Spencer Gales, Sunder Sethuraman, Kwang-Sung Jun

from arXiv, AISTATS'22; added acknowledgements

Linear bandits have a wide variety of applications, including recommendation systems, yet they make one strong assumption: the algorithms must know an upper bound $S$ on the norm of the unknown parameter $\theta^*$ that governs the reward generation. Such an assumption forces the practitioner to guess the value of $S$ used in the confidence bound, leaving no choice but to hope that $\|\theta^*\|\le S$ holds so that the regret is guaranteed to be low. In this paper, we propose, for the first time, novel algorithms that do not require such knowledge. Specifically, we propose two algorithms and analyze their regret bounds: one for the changing arm set setting and the other for the fixed arm set setting. Our regret bound for the former shows that the price of not knowing $S$ does not affect the leading term in the regret bound and inflates only the lower-order term. For the latter, we do not pay any price in the regret for not knowing $S$. Our numerical experiments show that standard algorithms assuming knowledge of $S$ can fail catastrophically when $\|\theta^*\|\le S$ does not hold, whereas our algorithms enjoy low regret.
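To make the role of $S$ concrete, the sketch below computes the confidence radius used by standard OFUL-style linear bandit algorithms, with the log-determinant term replaced by its usual upper bound $d \log(1 + tL^2/(d\lambda))$. This is the baseline assumption the abstract criticizes, not the algorithms proposed in the paper. The function names and the parameters `lam` (ridge regularizer), `sigma` (sub-Gaussian noise scale), `delta` (failure probability), and `L` (bound on arm norms) are illustrative assumptions that do not appear in the abstract.

```python
import math


def confidence_radius(t, d, S, lam=1.0, sigma=1.0, delta=0.05, L=1.0):
    """OFUL-style confidence radius after t rounds in dimension d.

    radius = sigma * sqrt(d * log(1 + t * L^2 / (d * lam)) + 2 * log(1 / delta))
             + sqrt(lam) * S

    S is the *assumed* upper bound on ||theta*||; it enters only through
    the additive sqrt(lam) * S term.
    """
    # Upper bound on log(det(V_t) / det(lam * I)) via the determinant-trace inequality.
    log_det_bound = d * math.log(1.0 + t * L ** 2 / (d * lam))
    noise_term = sigma * math.sqrt(log_det_bound + 2.0 * math.log(1.0 / delta))
    return noise_term + math.sqrt(lam) * S


def assumption_holds(S, theta_norm):
    """The radius above is only guaranteed valid when ||theta*|| <= S."""
    return theta_norm <= S


if __name__ == "__main__":
    true_norm = 3.0  # hypothetical ||theta*||, unknown to the learner in practice
    for guess in (1.0, 3.0, 10.0):
        radius = confidence_radius(t=1000, d=5, S=guess)
        print(guess, round(radius, 3), assumption_holds(guess, true_norm))
```

A guess of $S$ below the true norm yields a narrower radius whose validity guarantee no longer applies, while an overly large guess widens the radius and so increases exploration. This trade-off is what a norm-agnostic method aims to avoid.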

