Gaussian process regression in its simplest form assumes normal homoscedastic noise and uses the analytically tractable mean and covariance functions of the predictive posterior distribution obtained through Gaussian conditioning. Its hyperparameters are estimated by maximizing the evidence, a procedure commonly known as type II maximum likelihood estimation. Unfortunately, Bayesian inference based on a Gaussian likelihood is not robust to outliers, which are often present in observational training data sets. To overcome this problem, we propose a robust process model in the Gaussian process framework in which the likelihood of the observed data is expressed as the Huber probability distribution. The proposed model employs weights based on projection statistics to scale residuals and bound the influence of vertical outliers and bad leverage points on the latent function estimates, while exhibiting high statistical efficiency under both Gaussian and thick-tailed noise distributions. The proposed method is demonstrated on two real-world problems and two numerical examples using data sets with additive errors that follow thick-tailed distributions such as the Student's t, Laplace, and Cauchy distributions.
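The abstract does not spell out how the projection-statistics weights enter the Huber likelihood. The sketch below shows one common way the two are combined in robust regression, the Schweppe-type generalized maximum-likelihood (GM) form, and should be read as an illustration under that assumption, not as the authors' implementation. The Huber density is proportional to exp(-rho(r)), so `huber_rho` is its negative log-likelihood up to an additive constant. The function names, the Huber threshold `c = 1.5`, and the weight cutoff `b = 2.24` (roughly the square root of the 97.5% chi-square quantile with one degree of freedom) are hypothetical choices, and the Gaussian process part (kernel, posterior conditioning) is omitted.

```python
import math
from statistics import median

MAD_CONST = 1.4826  # scales the MAD to estimate a Gaussian standard deviation


def huber_rho(r, c=1.5):
    """Huber loss: quadratic for |r| <= c, linear beyond, so large residuals have bounded influence."""
    a = abs(r)
    return 0.5 * r * r if a <= c else c * a - 0.5 * c * c


def projection_statistics(X):
    """Projection statistics for the rows of X (a list of equal-length tuples).

    Each direction runs from the coordinate-wise median to one data point;
    all points are projected onto it, robustly standardized with the median
    and MAD, and each point keeps its largest standardized distance.
    """
    d = len(X[0])
    m = [median(row[k] for row in X) for k in range(d)]
    ps = [0.0] * len(X)
    for xj in X:
        v = [xj[k] - m[k] for k in range(d)]
        norm = math.sqrt(sum(vk * vk for vk in v))
        if norm == 0.0:
            continue  # the point sits on the median, so it defines no direction
        z = [sum(row[k] * v[k] for k in range(d)) / norm for row in X]
        med_z = median(z)
        mad = MAD_CONST * median(abs(zi - med_z) for zi in z)
        if mad == 0.0:
            continue  # degenerate spread along this direction
        for i, zi in enumerate(z):
            ps[i] = max(ps[i], abs(zi - med_z) / mad)
    return ps


def ps_weights(ps, b=2.24):
    """Downweight high-leverage points: w_i = min(1, (b / PS_i)^2). Cutoff b is an assumed tuning constant."""
    return [1.0 if p <= b else (b / p) ** 2 for p in ps]


def gm_huber_loss(residuals, weights, scale=1.0, c=1.5):
    """Assumed Schweppe-type GM objective: sum_i w_i^2 * rho(r_i / (scale * w_i))."""
    return sum(w * w * huber_rho(r / (scale * w), c) for r, w in zip(residuals, weights))


if __name__ == "__main__":
    X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (50.0,)]  # x = 50 is a leverage point
    residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 10.0]          # hypothetical residuals
    ps = projection_statistics(X)
    w = ps_weights(ps)
    print([round(p, 2) for p in ps])
    print(gm_huber_loss(residuals, w), gm_huber_loss(residuals, [1.0] * len(w)))
```

In this toy run, the isolated input x = 50 receives a projection statistic above 20 and a weight of about 0.01, so its large residual contributes far less to the objective than it would with unit weights, while the well-behaved points keep weight 1.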