When data are sparse, optimizing the kernel hyperparameters in Gaussian process regression with the commonly used maximum likelihood estimation (MLE) criterion often leads to overfitting. We show that choosing the hyperparameters (here, the kernel length parameter and the regularization parameter) by a criterion based on the completeness of the basis in the corresponding linear regression problem is superior to MLE. We further show that this choice is facilitated by high-dimensional model representation (HDMR): a low-order HDMR representation can provide the reliable reference functions and large synthetic test data sets needed for basis parameter optimization, even when the original data are few.
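The contrast between the two selection criteria can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the kernel is taken to be squared-exponential, the data are synthetic, the hyperparameter grid is arbitrary, and a hypothetical additive sine function stands in for the low-order HDMR reference function that would in practice be fitted to the sparse data. The function names (`neg_log_marginal_likelihood`, `reference_rmse`, `compare_criteria`) are illustrative only. The MLE criterion scores a hyperparameter pair by the marginal likelihood of the few training targets, while the basis-quality criterion scores it by how well the kernel basis centered on the training points reproduces the reference function on a large test set.

```python
import numpy as np


def rbf_kernel(a, b, length):
    """Squared-exponential kernel matrix between row-wise point sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)


def fit_weights(x_train, y_train, length, reg):
    """Solve (K + reg*I) c = y; the GPR mean is a linear model in the kernel basis."""
    k = rbf_kernel(x_train, x_train, length) + reg * np.eye(len(x_train))
    return np.linalg.solve(k, y_train)


def neg_log_marginal_likelihood(x_train, y_train, length, reg):
    """Negative log marginal likelihood of a zero-mean GP (the MLE criterion)."""
    n = len(x_train)
    k = rbf_kernel(x_train, x_train, length) + reg * np.eye(n)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return (0.5 * y_train @ alpha
            + np.log(np.diag(chol)).sum()
            + 0.5 * n * np.log(2.0 * np.pi))


def reference_rmse(x_train, length, reg, reference, x_test):
    """RMSE of the kernel basis fitted to a known reference function,
    evaluated on a large synthetic test set."""
    c = fit_weights(x_train, reference(x_train), length, reg)
    pred = rbf_kernel(x_test, x_train, length) @ c
    return np.sqrt(np.mean((pred - reference(x_test)) ** 2))


def additive_reference(x):
    """Hypothetical additive (first-order) function; an assumed stand-in
    for a low-order HDMR fit, not taken from the source."""
    return np.sin(2.0 * np.pi * x).sum(axis=1)


def compare_criteria(n_train=30, dim=3, seed=0):
    """Return the (length, reg) pair picked by each criterion on toy data."""
    rng = np.random.default_rng(seed)
    x_train = rng.random((n_train, dim))   # sparse training inputs
    x_test = rng.random((2000, dim))       # large synthetic test set
    y_train = additive_reference(x_train) + 0.05 * rng.standard_normal(n_train)

    # Arbitrary illustrative grid of (length, reg) pairs.
    grid = [(l, r) for l in (0.1, 0.3, 1.0, 3.0) for r in (1e-6, 1e-4, 1e-2)]

    by_mle = min(
        grid, key=lambda p: neg_log_marginal_likelihood(x_train, y_train, *p))
    by_basis = min(
        grid,
        key=lambda p: reference_rmse(x_train, *p, additive_reference, x_test))
    return by_mle, by_basis


if __name__ == "__main__":
    mle_choice, basis_choice = compare_criteria()
    print("selected by MLE:          ", mle_choice)
    print("selected by basis quality:", basis_choice)
```

The sketch only shows how the two criteria are evaluated; which one selects better hyperparameters on this toy problem depends on the seed, the grid, and the assumed reference function, and says nothing about the results reported in the paper.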