BNPdensity: Bayesian nonparametric mixture modeling in R

Robust statistical data modelling under potential model mis-specification often requires leaving the parametric world for the nonparametric. In the latter, parameters are infinite-dimensional objects such as functions, probability distributions or infinite vectors. In the Bayesian nonparametric approach, prior distributions are designed for these parameters, which provides a handle for managing the complexity of nonparametric models in practice. However, most modern Bayesian nonparametric models often seem out of reach of practitioners, as inference algorithms need careful design to deal with the infinite number of parameters. The aim of this work is to facilitate the journey by providing computational tools for Bayesian nonparametric inference. The article describes a set of functions available in the R package BNPdensity for carrying out density estimation with an infinite mixture model, including all types of censored data. The package provides access to a large class of such models based on normalized random measures, which generalize the popular Dirichlet process mixture. One striking advantage of this generalization is that it offers much more robust priors on the number of clusters than the Dirichlet process. Another crucial advantage is the complete flexibility in specifying the prior for the scale and location parameters of the clusters, because conjugacy is not required. Inference is performed using a theoretically grounded approximate sampling methodology known as the Ferguson & Klass algorithm. The package also offers several goodness-of-fit diagnostics, such as QQ-plots and a cross-validation criterion, the conditional predictive ordinate. The proposed methodology is illustrated on a classical ecological risk assessment method called the Species Sensitivity Distribution (SSD) problem, showcasing the benefits of the Bayesian nonparametric framework.
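As a rough illustration of the conditional predictive ordinate (CPO) named in the abstract, the sketch below uses the standard Monte Carlo estimate: for observation x_i, the CPO is the harmonic mean, over posterior draws, of the model density evaluated at x_i. This is a minimal Python sketch and not BNPdensity's implementation. The function names, the finite normal mixtures standing in for posterior draws, and the toy data are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a finite normal mixture at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def cpo(x_i, posterior_draws):
    """Harmonic-mean estimate of the CPO for a single observation x_i.

    posterior_draws is a list of (weights, means, sds) tuples, one per
    MCMC draw (a hypothetical stand-in for real sampler output).
    """
    inverse_densities = [
        1.0 / mixture_pdf(x_i, w, m, s) for (w, m, s) in posterior_draws
    ]
    return len(inverse_densities) / sum(inverse_densities)


# Hypothetical posterior draws: two-component normal mixtures.
draws = [
    ([0.5, 0.5], [0.0, 3.0], [1.0, 1.0]),
    ([0.6, 0.4], [0.1, 2.9], [1.1, 0.9]),
]

# Toy data: two points near the mixture components and one far away.
data = [0.2, 2.8, 10.0]
cpos = [cpo(x, draws) for x in data]

for x, value in zip(data, cpos):
    print(f"x = {x:5.1f}  CPO = {value:.3e}")
```

Observations with a low CPO are poorly predicted by the model fitted to the rest of the data, so in this toy example the point at 10.0 receives a much smaller value than the two points near the mixture components.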

