Several applications involving counts exhibit a large proportion of zeros (excess-of-zeros data). A popular model for such data is the Hurdle model, which explicitly models the probability of a zero count while assuming a sampling distribution on the positive integers. We consider data from multiple count processes. In this context, it is of interest to study the patterns of counts and cluster the subjects accordingly. We introduce a novel Bayesian nonparametric approach to cluster multiple, possibly related, zero-inflated processes. We propose a joint model for zero-inflated counts, specifying a Hurdle model for each process with a shifted Negative Binomial sampling distribution. Conditionally on the model parameters, the different processes are assumed independent, leading to a substantial reduction in the number of parameters compared with traditional multivariate approaches. The subject-specific probabilities of zero-inflation and the parameters of the sampling distribution are flexibly modelled via an enriched finite mixture with a random number of components. This induces a two-level clustering of the subjects based on the zero/non-zero patterns (outer clustering) and on the sampling distribution (inner clustering). Posterior inference is performed through tailored MCMC schemes. We demonstrate the proposed approach on an application involving the use of the messaging service WhatsApp.
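As a rough illustration of the likelihood described above, the following minimal sketch evaluates a Hurdle probability mass function with a shifted Negative Binomial on the positive integers, and multiplies it across processes under conditional independence. The parameterisation is an assumption made here for illustration (p_zero is the probability of a zero, and y - 1 follows a Negative Binomial with size r and success probability q); the function names are hypothetical, and the paper's exact parameterisation, priors, and enriched mixture are not reproduced.

```python
import math


def nb_pmf(k, r, q):
    """Negative Binomial pmf on k = 0, 1, 2, ... (assumed parameterisation:
    size r > 0, success probability 0 < q < 1)."""
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + r * math.log(q) + k * math.log(1.0 - q))
    return math.exp(log_p)


def hurdle_pmf(y, p_zero, r, q):
    """Hurdle pmf: a point mass p_zero at zero, and a Negative Binomial
    shifted by one on the positive integers 1, 2, 3, ..."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(y - 1, r, q)


def joint_pmf(ys, params):
    """Joint pmf of several processes for one subject, assuming the
    processes are independent given their parameters."""
    prob = 1.0
    for y, (p_zero, r, q) in zip(ys, params):
        prob *= hurdle_pmf(y, p_zero, r, q)
    return prob


# Hypothetical parameters for two count processes of a single subject.
example_params = [(0.6, 2.0, 0.5), (0.3, 1.5, 0.4)]
example_prob = joint_pmf([0, 3], example_params)
```

Because the zero probability and the positive-count distribution are specified separately, the zero/non-zero pattern and the size of the positive counts can be clustered at different levels, which is the intuition behind the outer and inner clustering mentioned in the abstract.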