On the Finite-Time Performance of the Knowledge Gradient Algorithm

The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Because the KG computation is complex, theoretical analysis of the algorithm is difficult, and existing results mostly concern its asymptotic performance, e.g., consistency and asymptotic sample allocation. In this work, we present new theoretical results on the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive lower and upper bounds on the probability of error and the simple regret of the algorithm. With these bounds, existing asymptotic results follow as simple corollaries. We also characterize the performance of the algorithm for the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm, but can also be used to analyze other improvement-based algorithms. Finally, we use numerical experiments to further demonstrate the finite-time behavior of the KG algorithm.
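
For readers unfamiliar with the setting, the following is a minimal Python sketch of one common form of the KG rule under independent normal beliefs with known noise variance. The abstract does not state the formulas, so everything here is an assumption drawn from the standard KG literature rather than from this paper: the KG factor $\tilde\sigma_i\, f(-|\mu_i - \max_{j \ne i}\mu_j| / \tilde\sigma_i)$ with $f(z) = z\Phi(z) + \varphi(z)$, the conjugate normal update, and the function names `kg_factors` and `run_kg`, which are hypothetical. The sketch returns the selected arm and its simple regret for a single run; averaging the indicator of a wrong selection over many runs would estimate the probability of error.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factors(mu, var, noise_var):
    """KG factor of each arm under independent beliefs N(mu[i], var[i]).

    Assumed form (standard KG literature, not taken from the abstract).
    """
    factors = []
    for i in range(len(mu)):
        # Std. dev. of the change in arm i's posterior mean after one more sample.
        sigma_tilde = var[i] / math.sqrt(var[i] + noise_var[i])
        best_other = max(m for j, m in enumerate(mu) if j != i)
        z = -abs(mu[i] - best_other) / sigma_tilde
        factors.append(sigma_tilde * (z * normal_cdf(z) + normal_pdf(z)))
    return factors


def run_kg(true_means, noise_var, budget, prior_mu=0.0, prior_var=100.0, seed=0):
    """Run KG for `budget` samples; return (selected arm, simple regret)."""
    rng = random.Random(seed)
    k = len(true_means)
    mu = [prior_mu] * k
    var = [prior_var] * k
    for _ in range(budget):
        factors = kg_factors(mu, var, noise_var)
        i = max(range(k), key=lambda a: factors[a])  # sample the largest KG factor
        y = rng.gauss(true_means[i], math.sqrt(noise_var[i]))
        # Conjugate normal update with known noise variance.
        precision = 1.0 / var[i] + 1.0 / noise_var[i]
        mu[i] = (mu[i] / var[i] + y / noise_var[i]) / precision
        var[i] = 1.0 / precision
    chosen = max(range(k), key=lambda a: mu[a])
    # Simple regret: gap between the best true mean and the selected arm's true mean.
    return chosen, max(true_means) - true_means[chosen]
```

A call such as `run_kg([0.0, 0.5, 1.0], [1.0, 1.0, 1.0], budget=200)` performs one replication; repeating it with different seeds gives empirical estimates of the two finite-time quantities the abstract bounds.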
