Motivated by the problem of accurately predicting gap times between successive blood donations, we present a general class of Bayesian nonparametric models for clustering. These models allow prediction of new recurrences and accommodate covariate information describing the personal characteristics of the sample individuals. We introduce a prior on the random partition of the sample individuals that encourages two individuals to be co-clustered when they have similar covariate values. Our prior generalizes the PPMx models in the literature, which are defined in terms of cohesion and similarity functions. We assume cohesion functions that yield mixtures of PPMx models, while our similarity functions represent the compactness of a cluster. In the blood donation application, we show that including covariate information in the prior specification improves posterior predictive performance and helps interpret the estimated clusters in terms of the covariates.
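In a PPMx model, the prior on a partition $\rho_n = \{S_1, \dots, S_k\}$ given covariates $x$ factorizes as $p(\rho_n \mid x) \propto \prod_{j=1}^{k} c(S_j)\, g(x^*_j)$, where $c$ is the cohesion function and $g$ is the similarity function evaluated on the covariates $x^*_j$ of cluster $S_j$. The following minimal sketch evaluates this unnormalized log prior for scalar covariates. It assumes a Dirichlet-process cohesion $c(S) = M\,(|S|-1)!$ and a hypothetical compactness similarity $g(x^*) = \exp(-\lambda \sum_i (x_i - \bar{x})^2)$. Both choices, and the names `log_cohesion`, `log_similarity`, `log_ppmx_prior`, `mass`, and `lam`, are illustrative assumptions, not the specific functions proposed here.

```python
import math
from typing import Dict, List, Sequence


def log_cohesion(size: int, mass: float = 1.0) -> float:
    # Assumed Dirichlet-process cohesion: log c(S) = log M + log((|S| - 1)!)
    return math.log(mass) + math.lgamma(size)


def log_similarity(xs: Sequence[float], lam: float = 1.0) -> float:
    # Hypothetical compactness similarity: log g(x*) = -lam * sum of squared
    # deviations from the cluster mean, so tighter clusters score higher.
    mean = sum(xs) / len(xs)
    return -lam * sum((x - mean) ** 2 for x in xs)


def log_ppmx_prior(labels: Sequence[int], x: Sequence[float],
                   mass: float = 1.0, lam: float = 1.0) -> float:
    # Unnormalized log p(rho | x) = sum over clusters of log c(S_j) + log g(x*_j)
    clusters: Dict[int, List[float]] = {}
    for label, xi in zip(labels, x):
        clusters.setdefault(label, []).append(xi)
    return sum(log_cohesion(len(xs), mass) + log_similarity(xs, lam)
               for xs in clusters.values())


x = [0.1, 0.2, 5.0, 5.1]
similar_together = log_ppmx_prior([0, 0, 1, 1], x)   # groups similar covariates
mixed = log_ppmx_prior([0, 1, 0, 1], x)              # groups dissimilar covariates
print(similar_together, mixed)
```

With these assumed choices, the partition that co-clusters individuals with similar covariate values receives a much higher prior weight than the one mixing dissimilar values, which is the behaviour the covariate-dependent prior is meant to encourage.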