Gaussian processes (GPs) are ubiquitously used in the sciences and engineering as metamodels. Standard GPs, however, can only handle numerical or quantitative variables. In this paper, we introduce latent map Gaussian processes (LMGPs) that inherit the attractive properties of GPs and are also applicable to mixed data that have both quantitative and qualitative inputs. The core idea behind LMGPs is to learn a continuous, low-dimensional latent space or manifold that encodes all qualitative inputs. To learn this manifold, we first assign a unique prior vector representation to each combination of qualitative inputs. We then use a low-rank linear map to project these priors onto a manifold that characterizes the posterior representations. As the posteriors are quantitative, they can be directly used in any standard correlation function such as the Gaussian or Matérn. Hence, the optimal map and the corresponding manifold, along with the other hyperparameters of the correlation function, can be systematically learned via maximum likelihood estimation. Through a wide range of analytic and real-world examples, we demonstrate the advantages of LMGPs over state-of-the-art methods in terms of accuracy and versatility. In particular, we show that LMGPs can handle variable-length inputs, have an explainable neural network interpretation, and provide insights into how qualitative inputs affect the response or interact with each other. We also employ LMGPs in Bayesian optimization and illustrate that they can discover optimal compound compositions more efficiently than conventional methods that convert compositions to qualitative variables via manual featurization.
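The construction described above (one-hot priors for each combination of qualitative levels, a low-rank linear map to latent coordinates, a standard correlation function on the combined inputs, and hyperparameter estimation via maximum likelihood) can be illustrated with a minimal sketch. The code below is not the authors' reference implementation; the function names, the Gaussian correlation choice, the two-dimensional latent space, and the toy data are all illustrative assumptions.

```python
# A minimal sketch of the LMGP latent-mapping idea, assuming a Gaussian
# (squared-exponential) correlation and a profiled negative log-likelihood.
# Variable names (A, omega, d_latent) and the toy data are hypothetical.
import numpy as np
from scipy.optimize import minimize
from scipy.linalg import cho_factor, cho_solve

def lmgp_neg_log_likelihood(params, X_num, Z_onehot, y, d_latent=2, nugget=1e-6):
    """Negative log marginal likelihood of a zero-mean GP whose inputs are the
    numerical features X_num concatenated with latent coordinates obtained by
    linearly mapping the one-hot prior representations Z_onehot."""
    n, p = X_num.shape
    q = Z_onehot.shape[1]                    # number of qualitative combinations
    omega = np.exp(params[:p])               # positive roughness parameters
    A = params[p:].reshape(q, d_latent)      # low-rank latent map (learned)
    # Combined quantitative inputs: scaled numerical features plus latent posteriors.
    H = np.hstack([X_num * np.sqrt(omega), Z_onehot @ A])
    # Gaussian correlation on the combined inputs.
    sq_dists = np.sum((H[:, None, :] - H[None, :, :]) ** 2, axis=-1)
    R = np.exp(-sq_dists) + nugget * np.eye(n)
    L, lower = cho_factor(R, lower=True)
    alpha = cho_solve((L, lower), y)
    sigma2 = (y @ alpha) / n                 # profiled process variance
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (n * np.log(sigma2) + log_det)

# Toy usage: one numerical input and one qualitative input with 3 levels.
rng = np.random.default_rng(0)
X_num = rng.uniform(size=(30, 1))
levels = rng.integers(0, 3, size=30)
Z_onehot = np.eye(3)[levels]                 # unique prior vector per level
y = np.sin(6 * X_num[:, 0]) + 0.5 * levels + 0.05 * rng.normal(size=30)

x0 = np.concatenate([np.zeros(1), 0.1 * rng.normal(size=3 * 2)])
res = minimize(lmgp_neg_log_likelihood, x0, args=(X_num, Z_onehot, y),
               method="L-BFGS-B")
print("learned 2-D latent positions of the 3 levels:\n", res.x[1:].reshape(3, 2))
```

Because the latent coordinates enter the correlation function exactly like any other quantitative input, the distances between the learned positions of the qualitative levels directly indicate how similarly those levels affect the response.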