Several applications involving counts exhibit a large proportion of zeros (excess-of-zeros data). A popular model for such data is the Hurdle model, which explicitly models the probability of a zero count while assuming a sampling distribution on the positive integers. We consider data from multiple count processes. In this context, it is of interest to study the patterns of counts and cluster the subjects accordingly. We introduce a novel Bayesian nonparametric approach to cluster multiple, possibly related, zero-inflated processes. We propose a joint model for zero-inflated counts, specifying a Hurdle model for each process with a shifted Negative Binomial sampling distribution. Conditional on the model parameters, the different processes are assumed independent, leading to a substantial reduction in the number of parameters compared to traditional multivariate approaches. The subject-specific probabilities of zero-inflation and the parameters of the sampling distribution are flexibly modelled via an enriched finite mixture with a random number of components. This induces a two-level clustering of the subjects, based on the zero/non-zero patterns (outer clustering) and on the sampling distribution (inner clustering). Posterior inference is performed through tailored MCMC schemes. We demonstrate the proposed approach on an application involving the use of the messaging service WhatsApp.
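To make the per-process likelihood concrete, here is a minimal sketch, not the authors' implementation. It assumes that `pi` is the probability of a zero count and that the shifted Negative Binomial places `Y - 1` on a NB(`r`, `p`) distribution (failures before the `r`-th success, success probability `p`), so positive counts start at 1. All function names are hypothetical. The joint term for one subject is a plain sum of per-process log-probabilities, reflecting the conditional-independence assumption; the mixture and clustering layers are not shown.

```python
import math
from typing import Sequence, Tuple


def shifted_nb_logpmf(y: int, r: float, p: float) -> float:
    """Log-pmf of a Negative Binomial shifted to support {1, 2, ...}.

    Assumed parameterization (illustrative): Y - 1 ~ NB(r, p), counting
    failures before the r-th success with success probability p.
    """
    if y < 1:
        return -math.inf
    k = y - 1
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def hurdle_logpmf(y: int, pi: float, r: float, p: float) -> float:
    """Hurdle model: zero with probability pi (0 < pi < 1),
    otherwise a draw from the shifted Negative Binomial."""
    if y == 0:
        return math.log(pi)
    return math.log1p(-pi) + shifted_nb_logpmf(y, r, p)


def joint_loglik(ys: Sequence[int],
                 params: Sequence[Tuple[float, float, float]]) -> float:
    """Log-likelihood of one subject's counts across d processes.

    Processes are treated as independent given their (pi, r, p)
    parameters, so the joint log-likelihood is a sum of hurdle terms.
    """
    if len(ys) != len(params):
        raise ValueError("need one (pi, r, p) triple per process")
    return sum(hurdle_logpmf(y, *theta) for y, theta in zip(ys, params))
```

Because the zero probability is modelled separately from the positive part, the hurdle pmf sums to one for any `pi` in (0, 1): the mass `pi` at zero plus `(1 - pi)` times a full shifted NB distribution.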