The knowledge gradient is a design principle for developing Bayesian sequential sampling policies to solve optimization problems. In this paper, we consider the ranking and selection problem in the presence of covariates, where the best alternative is not universal but depends on the covariates. In this context, we prove that, under minimal assumptions, the sampling policy based on the knowledge gradient is consistent, in the sense that, under this policy, the best alternative as a function of the covariates is identified almost surely as the number of samples grows. We also propose a stochastic gradient ascent algorithm for computing the sampling policy and demonstrate its performance via numerical experiments.
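To make the knowledge-gradient principle concrete, the following is a minimal sketch of the classical knowledge-gradient factor in the simpler setting without covariates, assuming independent normal beliefs and known sampling-noise variances. It is not the policy or the stochastic gradient ascent algorithm proposed in this paper; the function names `kg_factors` and `kg_choose` and their arguments are hypothetical, introduced only for illustration.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factors(mu, var, noise_var):
    """Expected one-step gain in the best posterior mean, per alternative.

    Assumptions (not taken from the paper): independent normal beliefs
    with posterior means `mu` and variances `var`, and known sampling
    noise variances `noise_var`. No covariates are modeled.
    """
    factors = []
    for i in range(len(mu)):
        if var[i] == 0.0:
            # Nothing left to learn about a perfectly known alternative.
            factors.append(0.0)
            continue
        # Standard deviation of the change in the posterior mean of i.
        sigma_tilde = var[i] / math.sqrt(var[i] + noise_var[i])
        best_other = max(m for j, m in enumerate(mu) if j != i)
        z = -abs(mu[i] - best_other) / sigma_tilde
        factors.append(sigma_tilde * (z * _cdf(z) + _pdf(z)))
    return factors


def kg_choose(mu, var, noise_var):
    """Index of the alternative with the largest knowledge-gradient factor."""
    f = kg_factors(mu, var, noise_var)
    return max(range(len(f)), key=f.__getitem__)
```

In this simplified setting, the policy samples the alternative returned by `kg_choose`, updates the corresponding belief, and repeats. With covariates, the sampling decision also involves choosing the covariate value at which to sample, which is the setting the paper addresses.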