Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization

Bayesian optimization (BO) conventionally relies on handcrafted acquisition functions (AFs) to sequentially determine the sample points. However, it has been widely observed in practice that the best-performing AF in terms of regret can vary significantly across different types of black-box functions. Designing a single AF that attains the best performance over a wide variety of black-box functions remains a challenge. This paper addresses this challenge from the perspective of reinforced few-shot AF learning (FSAF). Specifically, we first connect the notion of AFs with Q-functions and view a deep Q-network (DQN) as a surrogate differentiable AF. While combining DQN with an existing few-shot learning method is a natural idea, we identify that such a direct combination does not perform well due to severe overfitting, which is particularly critical in BO because of the need for a versatile sampling policy. To address this, we present a Bayesian variant of DQN with the following three features: (i) It learns a distribution of Q-networks as AFs based on the Kullback-Leibler regularization framework. This inherently provides the uncertainty required in sampling for BO and mitigates overfitting. (ii) For the prior of the Bayesian DQN, we propose to use a demo policy induced by an off-the-shelf AF for better training stability. (iii) On the meta-level, we leverage the meta-loss of Bayesian model-agnostic meta-learning, which serves as a natural companion to the proposed FSAF. Moreover, with the proper design of the Q-networks, FSAF is general-purpose in that it is agnostic to the dimension and the cardinality of the input domain. Through extensive experiments, we demonstrate that FSAF achieves comparable or better regret than state-of-the-art benchmarks on a wide variety of synthetic and real-world test functions.
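To make the AF-as-Q-function view concrete, the following is a minimal sketch and not the authors' implementation. It assumes the surrogate model already supplies a posterior mean and standard deviation at each candidate point, uses Expected Improvement as the off-the-shelf AF, and turns it into a stochastic demo policy with a softmax. The softmax, the feature choice (mean, standard deviation, incumbent best), the tiny untrained MLP, and all function names are illustrative assumptions; the Kullback-Leibler regularization and the meta-learning step are not shown.

```python
import math
import numpy as np


def expected_improvement(mu, sigma, best):
    """Expected Improvement (maximization) from posterior mean/std per candidate."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-12)  # avoid division by zero
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))  # standard normal CDF
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)          # standard normal PDF
    return (mu - best) * cdf + sigma * pdf


def demo_policy(af_values, temperature=1.0):
    """Hypothetical demo policy: softmax over the AF values of the candidates."""
    logits = np.asarray(af_values, dtype=float) / temperature
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def q_network_af(features, w1, b1, w2, b2):
    """Tiny MLP applied to each candidate independently: (n, d_feat) -> (n,) Q-values.

    Because it only sees per-candidate features, the same weights can score
    any number of candidates from an input domain of any dimension.
    """
    hidden = np.tanh(features @ w1 + b1)
    return (hidden @ w2 + b2).ravel()


# Illustrative posterior statistics for three candidate points (made-up numbers).
mu = np.array([0.1, 0.5, 0.3])
sigma = np.array([0.2, 0.1, 0.4])
best = 0.4  # best observed value so far

ei = expected_improvement(mu, sigma, best)
prior = demo_policy(ei)

# Untrained, randomly initialized Q-network used purely to show the interface.
rng = np.random.default_rng(0)
features = np.stack([mu, sigma, np.full_like(mu, best)], axis=1)
w1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
q_values = q_network_af(features, w1, b1, w2, b2)
next_index = int(np.argmax(q_values))  # greedy choice of the next sample point
```

In this sketch the Q-network plays the same role as EI: it maps per-candidate statistics to a score, and the next sample point is the candidate with the highest score. A trained version would replace the random weights.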
