Gaussian process (GP) regression can be severely biased when the data are contaminated by outliers. This paper presents a new robust GP regression algorithm that iteratively trims the most extreme data points. The new algorithm retains the attractive properties of the standard GP as a flexible, nonparametric regression method, yet it can greatly improve model accuracy for contaminated data, even in the presence of extreme or abundant outliers. It is also easier to implement than previous robust GP variants that rely on approximate inference. Across a wide range of experiments with different contamination levels, the proposed method significantly outperforms the standard GP and the popular robust GP variant with the Student-t likelihood in most test cases. In addition, as a practical example from astrophysics, we show that the method can precisely determine the main-sequence ridge line in the color-magnitude diagram of star clusters.
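The idea of iterative trimming can be illustrated with a minimal sketch. This is a simplified illustration under stated assumptions, not the algorithm as specified in the paper: it assumes an RBF kernel with fixed, hand-picked hyperparameters, a fixed keep fraction, and a fixed number of iterations. The names `rbf_kernel`, `gp_mean`, and `trimmed_gp` are hypothetical helpers introduced here for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel with unit signal variance (assumed for this sketch).
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_mean(x_train, y_train, x_test, noise_std=0.1, length_scale=1.0):
    # Posterior mean of a zero-mean GP with fixed hyperparameters.
    K = rbf_kernel(x_train, x_train, length_scale)
    K += noise_std**2 * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train, length_scale)
    return K_star @ np.linalg.solve(K, y_train)

def trimmed_gp(x, y, keep_fraction=0.8, n_iter=5, **gp_kwargs):
    # Repeatedly fit on the kept subset, then keep only the points
    # with the smallest absolute residuals (assumed trimming rule).
    n_keep = int(round(keep_fraction * len(x)))
    keep = np.ones(len(x), dtype=bool)
    for _ in range(n_iter):
        resid = np.abs(y - gp_mean(x[keep], y[keep], x, **gp_kwargs))
        keep = np.zeros(len(x), dtype=bool)
        keep[np.argsort(resid)[:n_keep]] = True
    return keep

# Synthetic demo: a sine curve with small noise and 14 large one-sided outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
truth = np.sin(x)
y = truth + 0.05 * rng.standard_normal(x.size)
outlier_idx = np.arange(3, 100, 7)
y[outlier_idx] += 5.0

keep = trimmed_gp(x, y, keep_fraction=0.8, n_iter=5)
fit_standard = gp_mean(x, y, x)
fit_trimmed = gp_mean(x[keep], y[keep], x)

rmse_standard = np.sqrt(np.mean((fit_standard - truth) ** 2))
rmse_trimmed = np.sqrt(np.mean((fit_trimmed - truth) ** 2))
print(f"standard GP RMSE: {rmse_standard:.3f}")
print(f"trimmed GP RMSE:  {rmse_trimmed:.3f}")
```

In this toy setting the outliers pull the standard GP fit upward, while the trimmed fit is computed only from the points that agree best with the current model. A practical implementation would also need to fit the kernel hyperparameters and account for the bias that trimming introduces in the estimated noise level, which this sketch ignores.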