Gibbs-type priors are widely used as key components in several Bayesian nonparametric models. Thanks to their flexibility and mathematical tractability, they have become predominant priors in species sampling problems, clustering, and mixture modelling. We introduce a new family of processes that extends the Gibbs-type family by including a contaminant component in the model, which accounts for the presence of anomalies (outliers) or an excess of observations with frequency one. We first investigate the induced random partition and the associated predictive distribution, and we characterize the asymptotic behaviour of the number of clusters. All the results we obtain are in closed form and easily interpretable; as a noteworthy example, we focus on the contaminated version of the Pitman-Yor process. Finally, we highlight the advantages of our construction in different applied problems: we show how the contaminant component helps to perform outlier detection in an astronomical clustering problem and to improve predictive inference for a species-related dataset exhibiting a high number of species with frequency one.
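As an informal illustration of how a contaminant component can inflate the number of frequency-one observations, the following minimal sketch simulates a partition under one possible reading of the contaminated Pitman-Yor model. The abstract does not state the construction, so the following are assumptions made for illustration only: each observation is a contaminant with probability `1 - beta`, in which case it forms its own singleton, and otherwise it is allocated by the standard Pitman-Yor predictive rule with discount `sigma` and strength `theta`. The function name and parameter names are hypothetical.

```python
import random


def sample_contaminated_py(n, sigma, theta, beta, seed=None):
    """Simulate a partition of n observations (illustrative sketch only).

    Assumption (not taken from the abstract): with probability 1 - beta an
    observation is a contaminant and forms a singleton; with probability
    beta it follows the standard Pitman-Yor sequential allocation.
    Returns (sizes of Pitman-Yor clusters, number of contaminant singletons).
    """
    if not (0 <= sigma < 1 and theta > -sigma and 0 <= beta <= 1):
        raise ValueError("need 0 <= sigma < 1, theta > -sigma, 0 <= beta <= 1")

    rng = random.Random(seed)
    sizes = []       # sizes of the clusters generated by the Pitman-Yor part
    singletons = 0   # observations attributed to the contaminant part
    m = 0            # observations allocated by the Pitman-Yor part so far

    for _ in range(n):
        # Contaminant draw: always a new value, never shared afterwards.
        if rng.random() >= beta:
            singletons += 1
            continue

        k = len(sizes)
        # Standard Pitman-Yor rule: a new cluster has probability
        # (theta + k * sigma) / (theta + m); the first draw always opens one.
        if m == 0 or rng.random() < (theta + k * sigma) / (theta + m):
            sizes.append(1)
        else:
            # An existing cluster j is chosen with weight (n_j - sigma).
            weights = [s - sigma for s in sizes]
            j = rng.choices(range(k), weights=weights)[0]
            sizes[j] += 1
        m += 1

    return sizes, singletons


if __name__ == "__main__":
    sizes, singletons = sample_contaminated_py(
        n=1000, sigma=0.5, theta=1.0, beta=0.8, seed=0
    )
    freq_one = singletons + sum(1 for s in sizes if s == 1)
    print("clusters:", len(sizes) + singletons, "frequency-one:", freq_one)
```

Under these assumptions, lowering `beta` increases the share of observations that can never be re-observed, which is the kind of excess of singletons the contaminant component is meant to capture.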