Pre-trained Gaussian processes for Bayesian optimization

Bayesian optimization (BO) has become a popular strategy for global optimization of expensive real-world functions. Contrary to a common expectation that BO is suited to optimizing black-box functions, it actually requires domain knowledge about those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process (GP) priors that specify initial beliefs on functions. However, even with expert knowledge, it is non-trivial to quantitatively define a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. In this work, we detail what pre-training entails for GPs using a KL divergence based loss function, and propose a new pre-training based BO framework named HyperBO. Theoretically, we show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known. To verify our approach in realistic model training setups, we collect a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art deep learning models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, HyperBO is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and classic multi-task BO benchmarks.
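
As a rough illustration of what a KL-divergence-based pre-training loss for a GP could look like, the sketch below compares an empirical Gaussian, estimated from several related functions evaluated at shared inputs, against the GP prior at those same inputs. Everything here is an assumption made for illustration and is not taken from HyperBO: the constant mean, the squared-exponential kernel, the helper names (`rbf_kernel`, `gaussian_kl`, `pretraining_loss`), and the synthetic data.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale, signal_variance):
    """Squared-exponential kernel matrix between two sets of 1-D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return signal_variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)


def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for multivariate Gaussians."""
    d = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    mahalanobis = diff @ np.linalg.solve(cov1, diff)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + mahalanobis - d + logdet1 - logdet0)


def pretraining_loss(params, x, y_tasks, jitter=1e-6):
    """KL between the empirical Gaussian over tasks and a GP prior at x.

    params:  (constant mean, lengthscale, signal variance), all hypothetical.
    x:       shared inputs, shape (num_points,).
    y_tasks: function values, shape (num_tasks, num_points).
    """
    mean_const, lengthscale, signal_variance = params
    n = x.shape[0]
    eye = np.eye(n)
    # Empirical mean and covariance across the related tasks.
    emp_mean = y_tasks.mean(axis=0)
    emp_cov = np.cov(y_tasks, rowvar=False) + jitter * eye
    # GP prior evaluated at the same inputs.
    prior_mean = np.full(n, mean_const)
    prior_cov = rbf_kernel(x, x, lengthscale, signal_variance) + jitter * eye
    return gaussian_kl(emp_mean, emp_cov, prior_mean, prior_cov)


# Synthetic "related functions": 200 draws from a GP with lengthscale 1.0.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 5)
true_cov = rbf_kernel(x, x, 1.0, 1.0) + 1e-6 * np.eye(x.shape[0])
y_tasks = rng.multivariate_normal(np.zeros(x.shape[0]), true_cov, size=200)

# The loss should be lower for parameters that match the generating process.
loss_true = pretraining_loss((0.0, 1.0, 1.0), x, y_tasks)
loss_wrong = pretraining_loss((0.0, 0.1, 1.0), x, y_tasks)
print(f"loss at matching params: {loss_true:.4f}")
print(f"loss at mismatched lengthscale: {loss_wrong:.4f}")
```

In practice such a loss would be minimized over the prior parameters with a gradient-based optimizer. The sketch also assumes more tasks than input points, so that the empirical covariance is well conditioned.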
