A novel Bayesian approach to the problem of variable selection using Gaussian process regression is proposed. Selecting the most relevant variables for the problem at hand often increases interpretability and, in many cases, is an essential step in terms of model regularization. In detail, the proposed method relies on so-called nearest neighbor Gaussian processes, which can be considered highly scalable approximations of classical Gaussian processes. To perform variable selection, the mean and the covariance function of the process are conditioned on a random set $\mathcal{A}$, which holds the indices of the variables that contribute to the model. While the specification of a priori beliefs regarding $\mathcal{A}$ allows control over the number of selected variables, so-called reference priors are assigned to the remaining model parameters. The use of reference priors ensures that the process covariance matrix is (numerically) robust. For model inference, a Metropolis within Gibbs algorithm is proposed. The performance of the new approach is evaluated on simulated data, an approximation problem from computer experiments, and two real-world datasets.
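The abstract does not specify the kernel, the neighbor construction or the prior on $\mathcal{A}$, so the following Python sketch fills these in with common choices. These are assumptions, not the authors' specification: a squared-exponential kernel with per-variable lengthscales evaluated only on the coordinates in $\mathcal{A}$, a zero mean function, neighbor sets made of the $m$ nearest predecessors in a fixed ordering (the usual nearest neighbor / Vecchia construction), an independent Bernoulli($q$) inclusion prior on $\mathcal{A}$, and a symmetric proposal that flips the inclusion of one variable. Only the $\mathcal{A}$-update of a Metropolis within Gibbs sampler is shown. The lengthscales are held fixed here, whereas the proposed method assigns reference priors to the remaining parameters and updates them as well. All function names are hypothetical.

```python
import math
import random


def cov(x1, x2, active, lengthscales, sigma2=1.0):
    """Squared-exponential covariance that only looks at the variables in `active` (assumed kernel)."""
    # Inputs outside the active set A never enter the distance.
    d2 = sum(((x1[j] - x2[j]) / lengthscales[j]) ** 2 for j in active)
    return sigma2 * math.exp(-0.5 * d2)


def solve_spd(a, b):
    """Solve a x = b for a small symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def nearest_neighbors(X, m):
    """For each point, the indices of its m nearest predecessors (fixed ordering, all inputs)."""
    return [
        sorted(range(i), key=lambda k: sum((a - b) ** 2 for a, b in zip(X[i], X[k])))[:m]
        for i in range(len(X))
    ]


def nngp_loglik(X, y, active, lengthscales, nbrs, nugget=1e-4):
    """Zero-mean nearest-neighbor GP log-likelihood: y[i] is conditioned only on y[nbrs[i]]."""
    ll = 0.0
    for i, prev in enumerate(nbrs):
        var = cov(X[i], X[i], active, lengthscales) + nugget
        mean = 0.0
        if prev:
            C = [[cov(X[a], X[b], active, lengthscales) + (nugget if a == b else 0.0)
                  for b in prev] for a in prev]
            c = [cov(X[i], X[k], active, lengthscales) for k in prev]
            w = solve_spd(C, c)  # kriging weights C^{-1} c
            mean = sum(wk * y[k] for wk, k in zip(w, prev))
            var -= sum(wk * ck for wk, ck in zip(w, c))
        ll -= 0.5 * (math.log(2.0 * math.pi * var) + (y[i] - mean) ** 2 / var)
    return ll


def log_prior_A(active, p, q):
    """Independent Bernoulli(q) inclusion prior on A (an assumption)."""
    k = len(active)
    return k * math.log(q) + (p - k) * math.log(1.0 - q)


def metropolis_step_A(X, y, active, lengthscales, nbrs, rng, q=0.2):
    """One Metropolis update of A: flip the inclusion of one uniformly chosen variable."""
    p = len(X[0])
    proposal = active ^ {rng.randrange(p)}  # symmetric add/remove proposal
    log_ratio = (nngp_loglik(X, y, proposal, lengthscales, nbrs) + log_prior_A(proposal, p, q)
                 - nngp_loglik(X, y, active, lengthscales, nbrs) - log_prior_A(active, p, q))
    return proposal if rng.random() < math.exp(min(0.0, log_ratio)) else active


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(40)]
    y = [math.sin(6.0 * x[0]) for x in X]  # response depends on variable 0 only
    nbrs = nearest_neighbors(X, m=5)
    active = set()
    for _ in range(200):
        active = metropolis_step_A(X, y, active, [0.2, 0.2, 0.2], nbrs, rng)
    print("selected variables:", sorted(active))
```

Fixing the neighbor sets once from all inputs keeps each likelihood evaluation at roughly $O(n m^3)$ instead of $O(n^3)$, which is the scalability argument behind nearest neighbor Gaussian processes. When every point is conditioned on all of its predecessors, the factorization reproduces the exact Gaussian process likelihood.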