We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone $C$. Our setup generalizes the best arm identification problem to vector-valued rewards by extending the concept of the Pareto set beyond multi-objective optimization. We characterize the sample complexity of $(\epsilon,\delta)$-PAC Pareto set identification by defining a new cone-dependent notion of complexity, called the ordering complexity. In particular, we provide gap-dependent and worst-case lower bounds on the sample complexity and show that, in the worst case, the sample complexity scales with the square of the ordering complexity. Furthermore, we investigate the sample complexity of the naïve elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and illustrate how $C$ and the sampling budget affect the Pareto set, the returned $(\epsilon,\delta)$-PAC Pareto set, and the success of identification.
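To make the cone-based Pareto set concrete, a minimal sketch follows. It assumes the polyhedral cone is given in half-space form $C = \{x : Wx \ge 0\}$ for a matrix `W`, and that design $i$ is dominated when some design $j$ satisfies $\mu_j - \mu_i \in C \setminus \{0\}$ (larger is better). The functions `in_cone` and `pareto_set` and the example data are hypothetical illustrations, not the authors' code. With `W` equal to the identity, $C$ is the nonnegative orthant and the usual multi-objective Pareto set is recovered.

```python
import numpy as np


def in_cone(x, W, tol=1e-12):
    """Return True if x lies in the (assumed) polyhedral cone C = {x : W @ x >= 0}."""
    return bool(np.all(W @ x >= -tol))


def pareto_set(means, W, tol=1e-12):
    """Indices of designs that are not dominated under the order induced by C.

    Design i is treated as dominated if some other design j has
    means[j] - means[i] in C with means[j] != means[i].
    """
    n = means.shape[0]
    pareto = []
    for i in range(n):
        dominated = False
        for j in range(n):
            if j == i:
                continue
            diff = means[j] - means[i]
            # Skip identical mean vectors; otherwise check cone membership.
            if np.linalg.norm(diff) > tol and in_cone(diff, W, tol):
                dominated = True
                break
        if not dominated:
            pareto.append(i)
    return pareto


# Hypothetical example: four designs with 2-dimensional mean rewards.
means = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.7, 0.1]])
W_orthant = np.eye(2)                        # C = R^2_+, the classical Pareto order
W_wide = np.array([[1.0, 0.5], [0.5, 1.0]])  # a wider pointed cone that contains R^2_+

print(pareto_set(means, W_orthant))  # [0, 1, 2, 3]
print(pareto_set(means, W_wide))     # [0, 1, 2]
```

Under the orthant all four designs are incomparable, while the wider cone makes design 0 dominate design 3, since $\mu_0 - \mu_3 = (0.3, -0.1)$ satisfies $W x \ge 0$. A larger cone makes more pairs comparable, so the Pareto set can only shrink or stay the same.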