In Bayesian optimization (BO) for expensive black-box optimization tasks, the acquisition function (AF) guides sequential sampling and plays a pivotal role in efficient convergence to better optima. Prevailing AFs usually rely on hand-crafted preferences for exploration or exploitation, which risks wasted computation or entrapment in local optima and the resulting need for re-optimization. To address this issue, the idea of data-driven AF selection is proposed: the sequential AF selection task is formalized as a Markov decision process (MDP) and tackled with powerful reinforcement learning (RL) techniques. An appropriate AF selection policy is learned from superior BO trajectories to balance exploration and exploitation in real time; the resulting method is called reinforcement-learning-assisted Bayesian optimization (RLABO). Competitive and robust evaluations on five benchmark problems demonstrate that RL recognizes the implicit AF selection pattern and suggest the proposal's practicality for intelligent AF selection and efficient optimization in expensive black-box problems.
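To make the control flow concrete, the sketch below shows sequential AF selection inside a BO loop on a toy 1-D problem. It is a minimal illustration, not the paper's method: the GP surrogate, the objective `f`, the candidate grid, and especially the epsilon-greedy bandit policy (a drastic simplification of the MDP-plus-RL formulation, with improvement over the incumbent as reward) are all assumptions made here for brevity.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D objective standing in for the expensive black-box function
def f(x):
    return np.sin(3.0 * x) + 0.5 * x

# Standard normal pdf/cdf helpers used by the acquisition functions
Phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
phi = lambda z: np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def gp_posterior(X, y, Xq, ls=0.3, noise=1e-6):
    """Tiny RBF-kernel GP posterior (illustrative surrogate, unit prior variance)."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
    Kinv = np.linalg.inv(k(X, X) + noise * np.eye(len(X)))
    Ks = k(Xq, X)
    mu = Ks @ Kinv @ y
    var = 1.0 - np.sum((Ks @ Kinv) * Ks, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

# Candidate acquisition functions (all written in maximization form)
def ei(mu, sd, best):                    # expected improvement
    z = (mu - best) / sd
    return (mu - best) * Phi(z) + sd * phi(z)

def pi_af(mu, sd, best):                 # probability of improvement
    return Phi((mu - best) / sd)

def ucb(mu, sd, best):                   # upper confidence bound
    return mu + 2.0 * sd

AFS = [ei, pi_af, ucb]

# Bandit-style epsilon-greedy AF selection policy: a stand-in for the learned
# RL policy, kept only to show where selection plugs into the BO loop.
q = np.zeros(len(AFS))                   # running value estimate per AF
n = np.zeros(len(AFS))                   # selection counts
eps = 0.2

X = np.array([0.1, 1.0, 1.9])            # initial design on [0, 2]
y = f(X)
grid = np.linspace(0.0, 2.0, 201)        # candidate pool

for step in range(15):
    mu, sd = gp_posterior(X, y, grid)
    best = y.max()
    # Policy step: decide which AF to trust at this iteration
    a = rng.integers(len(AFS)) if rng.random() < eps else int(np.argmax(q))
    x_next = grid[int(np.argmax(AFS[a](mu, sd, best)))]
    y_next = f(x_next)
    # Reward = improvement over the incumbent; update that AF's value estimate
    r = max(y_next - best, 0.0)
    n[a] += 1
    q[a] += (r - q[a]) / n[a]
    X, y = np.append(X, x_next), np.append(y, y_next)

print(round(float(y.max()), 3))
```

In RLABO the selection step would instead query a policy trained offline on superior BO trajectories, conditioned on the optimization state rather than on a context-free value table.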