Optimal $\delta$-Correct Best-Arm Selection for General Distributions

Given a finite set of unknown distributions, or arms, that can be sampled, we consider the problem of identifying the one with the largest mean using a $\delta$-correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified $\delta$) that has minimum sample complexity. Lower bounds for $\delta$-correct algorithms are well known. $\delta$-correct algorithms that match the lower bound asymptotically as $\delta$ decreases to zero have previously been developed for arm distributions restricted to a single-parameter exponential family. In this paper, we first observe a negative result: some restrictions are essential, as otherwise, under a $\delta$-correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a $\delta$-correct algorithm that matches the lower bound as $\delta$ decreases to zero under the mild restriction that a known bound exists on the expectation of a non-negative, continuous, increasing convex function (for example, the second moment) of the underlying random variables. We also propose batch processing and identify near-optimal batch sizes to substantially speed up the proposed algorithm. The best-arm problem has many learning applications, including recommendation systems and product selection. It is also a well-studied classic problem in the simulation community.
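To make the notion of a $\delta$-correct algorithm concrete, the following is a minimal sketch of a textbook successive-elimination procedure. It is not the algorithm proposed in the paper and does not attain the optimal sample complexity. It also assumes rewards bounded in [0, 1], a much stronger restriction than the moment bound discussed above, so that Hoeffding's inequality with a union bound over arms and rounds keeps the error probability below $\delta$. The names `successive_elimination`, `bernoulli`, and `max_rounds` are hypothetical and chosen only for illustration.

```python
import math
import random


def successive_elimination(arms, delta, max_rounds=100_000):
    """Return (index of the selected arm, total samples drawn).

    arms  : list of zero-argument callables returning rewards in [0, 1]
            (boundedness is an assumption of this sketch, not of the paper).
    delta : target bound on the probability of selecting a wrong arm.
    """
    k = len(arms)
    active = list(range(k))
    sums = [0.0] * k
    samples = 0
    for t in range(1, max_rounds + 1):
        # Sample every surviving arm once, so each has exactly t samples.
        for i in active:
            sums[i] += arms[i]()
            samples += 1
        # Hoeffding radius; the 4*k*t^2 term pays for the union bound
        # over all arms and all rounds.
        radius = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        means = {i: sums[i] / t for i in active}
        leader = max(means.values())
        # Drop arms whose upper bound is below the leader's lower bound.
        active = [i for i in active if means[i] + radius >= leader - radius]
        if len(active) == 1:
            return active[0], samples
    # Sampling budget exhausted: the delta guarantee no longer applies.
    return max(active, key=lambda i: sums[i]), samples


def bernoulli(p, rng):
    """Hypothetical helper: a Bernoulli(p) arm driven by the given RNG."""
    return lambda: 1.0 if rng.random() < p else 0.0


rng = random.Random(0)
arms = [bernoulli(p, rng) for p in (0.2, 0.5, 0.8)]
best, n_samples = successive_elimination(arms, delta=0.05)
print(best, n_samples)
```

The stopping time here is random and adapts to the observed gaps between arms, which is the sense in which such algorithms are adaptive and sequential. Sampling all surviving arms uniformly is what makes this sketch suboptimal compared with algorithms that match the lower bound.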
