矢量优化与斯托查式土匪反馈 (Vector Optimization with Stochastic Bandit Feedback) - 专知论文

会员服务 ·

0

赌博机/老虎机 · 向量化 · 情景 · 优化器 · 均值 ·

2022 年 1 月 30 日

Vector Optimization with Stochastic Bandit Feedback

翻译：矢量优化与斯托查式土匪反馈

Çağın Ararat,Cem Tekin

from arxiv, 18 pages, 3 tables, 2 figures

We introduce vector optimization problems with stochastic bandit feedback, which extends the best arm identification problem to vector-valued rewards. We consider $K$ designs, with multi-dimensional mean reward vectors, which are partially ordered according to a polyhedral ordering cone $C$. This generalizes the concept of Pareto set in multi-objective optimization and allows different sets of preferences of decision-makers to be encoded by $C$. Different than prior work, we define approximations of the Pareto set based on direction-free covering and gap notions. We study the setting where an evaluation of each design yields a noisy observation of the mean reward vector. Under subgaussian noise assumption, we investigate the sample complexity of the na\"ive elimination algorithm in an ($\epsilon,\delta$)-PAC setting, where the goal is to identify an ($\epsilon,\delta$)-PAC Pareto set with the minimum number of design evaluations. In order to characterize the difficulty of learning the Pareto set, we introduce the concept of ordering complexity, i.e., geometric conditions on the deviations of empirical reward vectors from their mean under which the Pareto front can be approximated accurately. We show how to compute the ordering complexity of any polyhedral ordering cone. We run experiments to verify our theoretical results and illustrate how $C$ and sampling budget affect the Pareto set, returned ($\epsilon,\delta$)-PAC Pareto set and the success of identification.

翻译：我们引入了矢量优化问题, 将最好的手臂识别问题扩大到矢量价值的奖赏。我们考虑K$的设计, 其多维平均奖赏矢量的设计, 这些设计是根据多面性订单单价C$部分订购的。这概括了多目标优化中设定的帕雷托概念, 允许决策者的不同偏好用美元编码。与先前的工作不同, 我们根据无方向覆盖和差距的概念来定义帕雷托设定的近似值。我们研究每个设计评价的设置, 产生对平均奖赏矢量的响亮观测。根据亚库西语噪音假设, 我们调查一个多面性消除算法的样本复杂性( $\ epsilon,\delta$)- PAC 设置, 目标是确定一个(\ eplon,\delta$)- PAC Pareto 设置的最小设计评价次数。为了描述学习 Pareto 设置的难度, 我们引入了以下概念: 订购复杂度, i.e. e. i.

0

相关内容

赌博机/老虎机

赌博机/老虎机

【书籍】优化与编程：线性、非线性、动态、随机和Matlab应用，Optimizations and Programming: Linear, Nonlinear, Dynamic, Stochastic and Applications with Matlab

【书籍】优化与编程：线性、非线性、动态、随机和Matlab应用，Optimizations and Programming: Linear, Nonlinear, Dynamic, Stochastic and Applications with Matlab

专知会员服务

28+阅读 · 2022年4月8日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

59+阅读 · 2020年11月21日

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

专知会员服务

85+阅读 · 2019年10月29日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

161+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

全局分片线性优化方法与应用

国家自然科学基金

0+阅读 · 2014年12月31日

矩阵方程秩约束广义最佳逼近理论及应用

国家自然科学基金

1+阅读 · 2013年12月31日

电力企业发电调度与燃料库存管理集成优化研究

国家自然科学基金

3+阅读 · 2013年12月31日

随机与动态环境下物流配送区域划分与配送路径集成优化问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

罗非鱼产业技术效率与市场竞争力研究

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

核函数优化选择的关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

图的若干参数及算法研究

国家自然科学基金

0+阅读 · 2011年12月31日

可违约资产组合市场风险和信用风险集成度量模型及数值方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

收益管理中的排序理论及算法研究

国家自然科学基金

0+阅读 · 2011年12月31日

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

Arxiv

0+阅读 · 2022年4月20日

A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic Regularized Optimization

Arxiv

0+阅读 · 2022年4月19日

An averaged space-time discretization of the stochastic $p$-Laplace system

Arxiv

0+阅读 · 2022年4月19日

Optimal bounds for numerical approximations of infinite horizon problems based on dynamic programming approach

Arxiv

1+阅读 · 2022年4月19日

Stochastic Saddle Point Problems with Decision-Dependent Distributions

Arxiv

0+阅读 · 2022年4月19日

Safe rules for the identification of zeros in the solutions of the SLOPE problem

Arxiv

0+阅读 · 2022年4月18日

Risk and optimal policies in bandit experiments

Risk and optimal policies in bandit experiments

Arxiv

0+阅读 · 2022年4月18日

A Deep Learning Galerkin Method for the Closed-Loop Geothermal System

Arxiv

0+阅读 · 2022年4月18日

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

Arxiv

0+阅读 · 2022年4月16日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

【书籍】优化与编程：线性、非线性、动态、随机和Matlab应用，Optimizations and Programming: Linear, Nonlinear, Dynamic, Stochastic and Applications with Matlab

【书籍】优化与编程：线性、非线性、动态、随机和Matlab应用，Optimizations and Programming: Linear, Nonlinear, Dynamic, Stochastic and Applications with Matlab

专知会员服务

28+阅读 · 2022年4月8日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

59+阅读 · 2020年11月21日

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

专知会员服务

85+阅读 · 2019年10月29日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

161+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《运用大语言模型支持空天防御系统工程项目》2025最新208页

《美空军转型：打造分布式空战力量以应对大国竞争》2025最新报告

消耗性无人机：认识战争演变中的技术特性与本质特征

《人体状态多模态推断·美陆军报告：风险环境下的认知追踪研究》2025最新100页

相关资讯

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

Arxiv

0+阅读 · 2022年4月20日

A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic Regularized Optimization

Arxiv

0+阅读 · 2022年4月19日

An averaged space-time discretization of the stochastic $p$-Laplace system

Arxiv

0+阅读 · 2022年4月19日

Optimal bounds for numerical approximations of infinite horizon problems based on dynamic programming approach

Arxiv

1+阅读 · 2022年4月19日

Stochastic Saddle Point Problems with Decision-Dependent Distributions

Arxiv

0+阅读 · 2022年4月19日

Safe rules for the identification of zeros in the solutions of the SLOPE problem

Arxiv

0+阅读 · 2022年4月18日

Risk and optimal policies in bandit experiments

Risk and optimal policies in bandit experiments

Arxiv

0+阅读 · 2022年4月18日

A Deep Learning Galerkin Method for the Closed-Loop Geothermal System

Arxiv

0+阅读 · 2022年4月18日

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

Arxiv

0+阅读 · 2022年4月16日

相关基金

全局分片线性优化方法与应用

国家自然科学基金

0+阅读 · 2014年12月31日

矩阵方程秩约束广义最佳逼近理论及应用

国家自然科学基金

1+阅读 · 2013年12月31日

电力企业发电调度与燃料库存管理集成优化研究

国家自然科学基金

3+阅读 · 2013年12月31日

随机与动态环境下物流配送区域划分与配送路径集成优化问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

罗非鱼产业技术效率与市场竞争力研究

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

核函数优化选择的关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

图的若干参数及算法研究

国家自然科学基金

0+阅读 · 2011年12月31日

可违约资产组合市场风险和信用风险集成度量模型及数值方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

收益管理中的排序理论及算法研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员