A Dimension-free Algorithm for Contextual Continuum-armed Bandits - 专知 (Zhuanzhi) Papers


Bandits · Curse of dimensionality · Functionals · Continuity · INFORMS

October 3, 2022

A Dimension-free Algorithm for Contextual Continuum-armed Bandits


Wenhao Li, Ningyuan Chen, L. Jeff Hong

In contextual continuum-armed bandits, the contexts $x$ and the arms $y$ are both continuous and drawn from high-dimensional spaces. The payoff function $f(x,y)$ to be learned has no particular parametric form. The literature has shown that for Lipschitz-continuous functions the optimal regret is $\tilde{O}(T^{\frac{d_x+d_y+1}{d_x+d_y+2}})$, where $d_x$ and $d_y$ are the dimensions of the contexts and the arms, respectively; this rate suffers from the curse of dimensionality. We develop an algorithm that achieves regret $\tilde{O}(T^{\frac{d_x+1}{d_x+2}})$ when $f$ is globally concave in $y$. Global concavity is a common assumption in many applications. The algorithm is based on stochastic approximation and estimates gradient information in an online fashion. Our results yield a valuable insight: the curse of dimensionality of the arms can be overcome with mild structure on the payoff function.
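To make the two rates and the stochastic-approximation idea concrete, here is a minimal Python sketch. It is not the authors' algorithm: it ignores the context $x$ entirely and uses a textbook Kiefer-Wolfowitz finite-difference scheme as a stand-in for "estimating the gradient online". The names `lipschitz_exponent`, `concave_exponent`, `kiefer_wolfowitz`, `toy_payoff` and `TARGET`, as well as the step-size schedules, are assumptions made for illustration.

```python
import random


def lipschitz_exponent(d_x, d_y):
    """Exponent of T in the Lipschitz regret bound quoted in the abstract."""
    return (d_x + d_y + 1) / (d_x + d_y + 2)


def concave_exponent(d_x):
    """Exponent of T in the bound quoted for payoffs concave in the arm y."""
    return (d_x + 1) / (d_x + 2)


# Hypothetical maximizer of the toy payoff below (illustration only).
TARGET = [0.3, -0.2]


def toy_payoff(y, rng):
    """Noisy concave payoff: negative squared distance to TARGET plus noise."""
    mean = -sum((yi - ti) ** 2 for yi, ti in zip(y, TARGET))
    return mean + rng.gauss(0.0, 0.1)


def kiefer_wolfowitz(noisy_payoff, y0, steps, rng):
    """Gradient ascent driven by finite-difference gradient estimates.

    Each iteration queries the noisy payoff at 2 * len(y0) perturbed arms
    and moves the current arm along the estimated gradient.
    """
    y = list(y0)
    for n in range(1, steps + 1):
        a_n = 1.0 / n          # step size (assumed schedule)
        c_n = 1.0 / n ** 0.25  # finite-difference width (assumed schedule)
        grad = []
        for i in range(len(y)):
            up, down = y[:], y[:]
            up[i] += c_n
            down[i] -= c_n
            diff = noisy_payoff(up, rng) - noisy_payoff(down, rng)
            grad.append(diff / (2.0 * c_n))
        y = [yi + a_n * gi for yi, gi in zip(y, grad)]
    return y


if __name__ == "__main__":
    # With d_x = 2 and d_y = 10 the exponents are 13/14 and 3/4.
    print(lipschitz_exponent(2, 10), concave_exponent(2))
    print(kiefer_wolfowitz(toy_payoff, [0.0, 0.0], 2000, random.Random(0)))
```

The exponent helpers only restate the formulas from the abstract; the point of the comparison is that the second exponent does not depend on $d_y$. The ascent loop shows why concavity helps: a gradient estimate built from noisy payoff queries is enough to steer the arm toward the maximizer, with no need to discretize the arm space.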

