Bayesian optimization (BO) primarily uses Gaussian processes (GPs) as the key surrogate model, mostly with a simple stationary and separable kernel function such as the widely used squared-exponential kernel with automatic relevance determination (SE-ARD). However, such simple kernel specifications are deficient in learning functions with complex features, such as nonstationarity, nonseparability, and multimodality. Approximating such functions using a local GP, even in a low-dimensional space, requires a large number of samples, let alone in a high-dimensional setting. In this paper, we propose Bayesian Kernelized Tensor Factorization (BKTF) as a new surrogate model for BO in a D-dimensional Cartesian product space. Our key idea is to approximate the underlying D-dimensional solid with a fully Bayesian low-rank tensor CP decomposition, in which we place GP priors on the latent basis functions for each dimension to encode local consistency and smoothness. With this formulation, information from each sample can be shared not only with neighbors but also across dimensions. Although BKTF no longer has an analytical posterior, we can still efficiently approximate the posterior distribution through Markov chain Monte Carlo (MCMC) and obtain prediction and full uncertainty quantification (UQ). We conduct numerical experiments on both standard BO testing problems and machine learning hyperparameter tuning problems, and our results confirm the superiority of BKTF in terms of sample efficiency.
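To make the construction concrete, below is a minimal sketch (not the authors' implementation) of a BKTF-style surrogate on a 2-D Cartesian grid: a rank-R CP decomposition whose latent basis functions carry GP priors with an SE kernel on each dimension, fit to a handful of noisy observations. The grid sizes, rank, lengthscale, and the random-walk Metropolis sampler used here in place of the paper's MCMC scheme are all illustrative assumptions.

```python
# Hypothetical BKTF-style surrogate sketch: rank-R CP decomposition over a 2-D
# grid, GP (SE-kernel) priors on each dimension's latent basis functions, and a
# simple random-walk Metropolis sampler standing in for the paper's MCMC.
import numpy as np

rng = np.random.default_rng(0)

def se_kernel(x, lengthscale=0.2, variance=1.0, jitter=1e-6):
    """Squared-exponential covariance matrix on a 1-D grid."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(len(x))

# 2-D Cartesian product grid (D = 2), CP rank R -- illustrative sizes.
n1, n2, R = 30, 30, 2
x1, x2 = np.linspace(0, 1, n1), np.linspace(0, 1, n2)
K1, K2 = se_kernel(x1), se_kernel(x2)
L1, L2 = np.linalg.cholesky(K1), np.linalg.cholesky(K2)
K1_inv, K2_inv = np.linalg.inv(K1), np.linalg.inv(K2)

def cp_mean(U, V):
    """Rank-R CP reconstruction: f = sum_r outer(u_r, v_r)."""
    return U @ V.T

def log_post(U, V, obs_idx, y, noise_var=1e-2):
    """Unnormalized log posterior: GP priors on factor columns + Gaussian likelihood."""
    lp = -0.5 * sum(U[:, r] @ K1_inv @ U[:, r] + V[:, r] @ K2_inv @ V[:, r]
                    for r in range(R))
    resid = y - cp_mean(U, V)[obs_idx]
    return lp - 0.5 * resid @ resid / noise_var

# Synthetic nonseparable target with a handful of noisy grid observations.
F_true = np.sin(6 * x1)[:, None] * np.cos(4 * x2)[None, :] + np.outer(x1, x2)
obs_idx = (rng.integers(0, n1, 15), rng.integers(0, n2, 15))
y = F_true[obs_idx] + 0.05 * rng.standard_normal(15)

# Random-walk Metropolis over the latent factors; proposals are GP-correlated
# perturbations (symmetric), so the plain MH acceptance ratio applies.
U = L1 @ rng.standard_normal((n1, R))
V = L2 @ rng.standard_normal((n2, R))
lp, step, samples = log_post(U, V, obs_idx, y), 0.05, []
for it in range(4000):
    U_new = U + step * (L1 @ rng.standard_normal((n1, R)))
    V_new = V + step * (L2 @ rng.standard_normal((n2, R)))
    lp_new = log_post(U_new, V_new, obs_idx, y)
    if np.log(rng.random()) < lp_new - lp:
        U, V, lp = U_new, V_new, lp_new
    if it >= 2000 and it % 20 == 0:          # keep thinned post-burn-in draws
        samples.append(cp_mean(U, V))

post = np.stack(samples)
mean, std = post.mean(axis=0), post.std(axis=0)  # prediction and UQ surfaces
print("posterior mean RMSE:", np.sqrt(np.mean((mean - F_true) ** 2)))
```

Because every posterior draw is a full surface over the grid, the sample mean and standard deviation directly give the prediction and uncertainty maps that a BO acquisition function would consume; each observation informs entire rows and columns of the grid through the shared latent factors, which is the cross-dimensional information sharing described above.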