We design and analyze VA-LUCB, a parameter-free algorithm for identifying the best arm in the fixed-confidence setting under the stringent constraint that the variance of the chosen arm is strictly smaller than a given threshold. We show that an upper bound on VA-LUCB's sample complexity is characterized by a fundamental variance-aware hardness quantity $H_{VA}$. By proving a lower bound, we show that the sample complexity of VA-LUCB is optimal up to a factor logarithmic in $H_{VA}$. Extensive experiments corroborate the dependence of the sample complexity on the various terms in $H_{VA}$. Comparing VA-LUCB's empirical performance with that of a close competitor, RiskAverse-UCB-BAI by David et al. (2018), our experiments suggest that VA-LUCB has the lowest sample complexity for this class of risk-constrained best arm identification problems, especially on the riskiest instances.
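To make the identification target concrete, the following minimal sketch computes the arm that a risk-constrained best arm identification procedure aims to return, assuming the true means and variances were known. It is not VA-LUCB itself, which must estimate these quantities from samples; the function name `best_feasible_arm` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def best_feasible_arm(
    means: Sequence[float],
    variances: Sequence[float],
    threshold: float,
) -> Optional[int]:
    """Return the index of the highest-mean arm whose variance is strictly
    below `threshold`, or None if no arm satisfies the constraint.

    Assumption: the true means and variances are given. A bandit algorithm
    would only have access to noisy samples from each arm.
    """
    # An arm is feasible only if its variance is strictly smaller than the threshold.
    feasible = [i for i, v in enumerate(variances) if v < threshold]
    if not feasible:
        return None
    # Among feasible arms, pick the one with the largest mean.
    return max(feasible, key=lambda i: means[i])


# Illustrative (made-up) instance: arm 0 has the highest mean but violates
# the variance constraint, so arm 1 is the target.
target = best_feasible_arm([0.9, 0.7, 0.5], [0.30, 0.10, 0.05], threshold=0.2)
```

In this made-up instance the unconstrained best arm (index 0) is infeasible, which is the situation where a risk-constrained procedure and a standard best arm identification procedure would disagree.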