The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Due to the complex calculation of KG, theoretical analysis of this algorithm is difficult, and existing results are mostly about the asymptotic performance of it, e.g., consistency, asymptotic sample allocation, etc. In this research, we present new theoretical results about the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive bounds for the sample allocation of the algorithm. With these bounds, existing asymptotic results become simple corollaries. Furthermore, we derive upper and lower bounds for the probability of error and simple regret of the algorithm, and show the performance of the algorithm for the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm, but can also be used to analyze other improvement-based algorithms. Last, we use numerical experiments to compare the bounds we derive and the performance of the KG algorithm.
翻译:知识梯度( KG) 算法是用于最佳手臂识别( BAI) 问题的流行而有效的算法。 由于对 KG 的计算十分复杂, 对这一算法的理论分析很困难, 并且现有结果主要是关于它无症状的性能, 例如一致性、 无症状样本分配等。 在这个研究中, 我们介绍了关于 KG 算法的有限时间性能的新理论结果。 在独立和通常分布的奖励下, 我们得出算法样本分配的界限。 有了这些界限, 现有的无症状结果就变成了简单的卷轴。 此外, 我们从上到下得出了算法的错误概率和简单遗憾, 并展示了多臂带( MAB) 问题算法的性能。 这些发展不仅扩展了对 KG 算法的现有分析, 还可以用来分析其他基于改进的算法。 最后, 我们用数字实验来比较我们得出的界限和 KG 算法的性能 。