Robust statistical data modelling under potential model mis-specification often requires leaving the parametric world for the nonparametric one. In the latter, parameters are infinite-dimensional objects such as functions, probability distributions or infinite vectors. In the Bayesian nonparametric approach, prior distributions are designed for these parameters, and these priors provide a handle for managing the complexity of nonparametric models in practice. However, most modern Bayesian nonparametric models often seem out of reach for practitioners, as inference algorithms need careful design to deal with the infinite number of parameters. The aim of this work is to ease that journey by providing computational tools for Bayesian nonparametric inference. The article describes a set of functions available in the \R package BNPdensity for carrying out density estimation with an infinite mixture model, with support for all types of censored data. The package provides access to a large class of such models based on normalized random measures, which generalize the popular Dirichlet process mixture. One striking advantage of this generalization is that it offers much more robust priors on the number of clusters than the Dirichlet process. Another crucial advantage is complete flexibility in specifying the prior for the scale and location parameters of the clusters, because conjugacy is not required. Inference is performed using a theoretically grounded approximate sampling methodology known as the Ferguson & Klass algorithm. The package also offers several goodness-of-fit diagnostics, such as QQ-plots and a cross-validation criterion, the conditional predictive ordinate. The proposed methodology is illustrated on a classical ecological risk assessment method, the Species Sensitivity Distribution (SSD), showcasing the benefits of the Bayesian nonparametric framework.
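For readers unfamiliar with the Ferguson & Klass algorithm, the following is a minimal Python sketch of the idea. It is not the BNPdensity (\R) implementation. For a completely random measure whose Lévy intensity has tail mass $N(v)$ above level $v$, the jumps are obtained as $J_i = N^{-1}(\xi_i)$, where $\xi_1 < \xi_2 < \dots$ are arrival times of a unit-rate Poisson process. The jumps therefore come out in decreasing order, so truncating after the first few keeps most of the mass. Normalizing the jumps and attaching independent locations drawn from a base measure gives an approximate draw from a normalized random measure. We assume the $\sigma$-stable intensity, one member of this class, because its tail $N(v) = a\, v^{-\sigma} / \Gamma(1-\sigma)$ inverts in closed form. Other members, such as the normalized generalized gamma, require numerical inversion. The function names, the fixed truncation level `n_jumps`, the Gaussian kernel and the standard-normal base measure are all hypothetical, illustrative choices.

```python
import math
import random


def ferguson_klass_stable(n_jumps, sigma, total_mass=1.0, rng=None):
    """Return the n_jumps largest jumps of a sigma-stable completely random measure.

    Illustrative sketch (hypothetical helper, not a BNPdensity function).
    Tail of the Levy intensity: N(v) = total_mass * v**(-sigma) / Gamma(1 - sigma),
    so N^{-1}(xi) = (total_mass / (Gamma(1 - sigma) * xi)) ** (1 / sigma).
    """
    if not 0.0 < sigma < 1.0:
        raise ValueError("sigma must lie in (0, 1)")
    rng = rng or random.Random()
    const = total_mass / math.gamma(1.0 - sigma)
    jumps = []
    xi = 0.0
    for _ in range(n_jumps):
        # Arrival times of a unit-rate Poisson process: cumulative Exp(1) sums.
        xi += rng.expovariate(1.0)
        # Ferguson & Klass step: invert the tail mass at each arrival time.
        jumps.append((const / xi) ** (1.0 / sigma))
    return jumps  # decreasing, because xi is increasing and N^{-1} is decreasing


def truncated_normalized_stable(n_jumps, sigma, base_sampler, rng=None):
    """Approximate draw (weights, atoms) from a normalized sigma-stable process.

    The total mass cancels after normalization, so it is left at its default.
    base_sampler(rng) draws one location from the (assumed) base measure P_0.
    """
    rng = rng or random.Random()
    jumps = ferguson_klass_stable(n_jumps, sigma, rng=rng)
    total = sum(jumps)
    weights = [j / total for j in jumps]  # normalization step
    atoms = [base_sampler(rng) for _ in range(n_jumps)]  # i.i.d. locations
    return weights, atoms


def mixture_density(y, weights, atoms, scale):
    """Evaluate the truncated mixture density sum_i w_i * Normal(y; atom_i, scale)."""
    norm = 1.0 / (scale * math.sqrt(2.0 * math.pi))
    return sum(
        w * norm * math.exp(-0.5 * ((y - mu) / scale) ** 2)
        for w, mu in zip(weights, atoms)
    )


if __name__ == "__main__":
    rng = random.Random(42)
    w, x = truncated_normalized_stable(50, 0.4, lambda r: r.gauss(0.0, 1.0), rng=rng)
    print(sum(w), mixture_density(0.0, w, x, scale=0.5))
```

The design choice here is deliberately the simplest one: a fixed number of jumps and a closed-form inverse. In practice the truncation level would be chosen by an error criterion, and locations and scales would be updated within a posterior sampler rather than drawn once from the prior.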