Lookahead, also known as non-myopic, Bayesian optimization (BO) aims to find optimal sampling policies by solving a dynamic program (DP) that maximizes a long-term reward over a rolling horizon. Though promising, lookahead BO risks error propagation because it depends more heavily on a possibly mis-specified model. In this work we focus on the rollout approximation for solving the intractable DP. We first prove that rollout is improving when applied to lookahead BO and provide a sufficient condition for the heuristic used to be rollout improving. We then provide a guideline, both theoretical and practical, for choosing the rolling horizon stagewise. This guideline is built on quantifying the negative effect of a mis-specified model. To illustrate our idea, we provide case studies on both single- and multi-information source BO. Empirical results show the advantages of our method over several myopic and non-myopic BO algorithms.