Predictions from Gaussian process (GP) regression can be significantly biased when the data are contaminated by outliers. We propose a new robust GP regression algorithm that iteratively trims the fraction of data points with the largest deviations from the predicted mean. While the new algorithm retains the attractive properties of the standard GP as a flexible, nonparametric regression method, it can substantially reduce the influence of outliers, even in some extreme cases. It is also easier to implement than previous robust GP variants that rely on approximate inference. On a variety of contaminated synthetic datasets, the proposed method outperforms both the standard GP and the popular robust GP variant with a Student's t likelihood, especially when the outlier fraction is high. Finally, as a practical example from astrophysics, we show that the method can precisely determine the main-sequence ridge line in the color-magnitude diagrams of star clusters.
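The trimming loop described above can be illustrated with a minimal sketch. It assumes a zero-mean GP with a squared-exponential kernel and fixed, hand-picked hyperparameters. The function names (`rbf_kernel`, `gp_predict_mean`, `trimmed_gp`) and the `keep_fraction` parameter are hypothetical, and the published method may differ in details such as hyperparameter re-optimization or corrections applied after trimming. The loop fits the GP, ranks every point by its absolute deviation from the predicted mean, keeps the closest fraction, and refits until the retained set stops changing.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, amplitude=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_predict_mean(x_train, y_train, x_test, length_scale=1.0, amplitude=1.0, noise=0.1):
    """Posterior mean of a zero-mean GP with fixed (not optimized) hyperparameters."""
    K = rbf_kernel(x_train, x_train, length_scale, amplitude)
    K += noise**2 * np.eye(len(x_train))  # Gaussian observation noise on the diagonal
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y via two solves with the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return rbf_kernel(x_test, x_train, length_scale, amplitude) @ alpha


def trimmed_gp(x, y, keep_fraction=0.7, max_iter=20, **gp_kwargs):
    """Hypothetical sketch: iteratively refit a GP on the points closest to its mean.

    Returns the indices of the retained points and the final predicted mean at x.
    """
    n_keep = max(1, int(round(keep_fraction * len(x))))
    keep = np.arange(len(x))  # start from the full dataset
    for _ in range(max_iter):
        mu = gp_predict_mean(x[keep], y[keep], x, **gp_kwargs)
        residual = np.abs(y - mu)  # deviation of every point from the current mean
        new_keep = np.sort(np.argsort(residual)[:n_keep])  # trim the largest deviations
        if np.array_equal(new_keep, keep):
            break  # retained set unchanged: converged
        keep = new_keep
    # Refit on the final retained set so mu always matches the returned indices
    mu = gp_predict_mean(x[keep], y[keep], x, **gp_kwargs)
    return keep, mu
```

Fixing the hyperparameters keeps the sketch short; a practical implementation would likely re-optimize them on the retained points at each iteration, and `keep_fraction` must be at least as large as the expected fraction of clean data.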