We consider the analysis of count data in which the observed frequency of zero counts is unusually large, typically relative to the Poisson distribution. We focus on two alternative modelling approaches: Over-Dispersion (OD) models and Zero-Inflation (ZI) models, both of which can be seen as generalisations of the Poisson distribution; we refer to these as Implicit and Explicit ZI models, respectively. Although sometimes seen as competing approaches, they can be complementary: OD is a consequence of ZI modelling, and ZI is a by-product of OD modelling. The central objective in such analyses is often inference on the effect of covariates on the mean, in light of the apparent excess of zeros in the counts. Typically, modelling the excess zeros per se is a secondary objective, and there are choices to be made between, and within, the OD and ZI approaches. The contribution of this paper is primarily conceptual. We contrast, descriptively, the impact of the two approaches on zeros. We further offer a novel descriptive characterisation of alternative ZI models, including the classic hurdle and mixture models, by providing a unifying theoretical framework for their comparison. This in turn leads to a novel and technically simpler ZI model. We develop the underlying theory for univariate counts and touch on its implications for multivariate count data.
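The complementarity of OD and ZI can be illustrated numerically. The following is a minimal sketch, not the paper's own construction: it assumes the negative binomial as a representative OD model and the standard Poisson mixture (ZIP) as a representative ZI model, and the function names and parameter values are purely illustrative. With the mean held fixed, the ZIP model has variance above the mean (ZI implies OD), and the negative binomial has a zero probability above the Poisson one (OD implies ZI).

```python
import math

def poisson_p0(mean):
    """P(Y = 0) under a Poisson distribution with the given mean."""
    return math.exp(-mean)

def zip_p0(mean, pi):
    """P(Y = 0) under a zero-inflated Poisson mixture with overall mean `mean`.

    With probability pi the count is a structural zero; otherwise it is
    Poisson(lam), where lam = mean / (1 - pi) keeps the overall mean fixed.
    """
    lam = mean / (1.0 - pi)
    return pi + (1.0 - pi) * math.exp(-lam)

def zip_variance(mean, pi):
    """Variance of the same mixture: mean * (1 + pi * lam)."""
    lam = mean / (1.0 - pi)
    return mean * (1.0 + pi * lam)

def negbin_p0(mean, size):
    """P(Y = 0) under a negative binomial with mean `mean` and dispersion `size`."""
    return (size / (size + mean)) ** size

def negbin_variance(mean, size):
    """Negative binomial variance: mean + mean^2 / size."""
    return mean + mean ** 2 / size

# Illustrative (hypothetical) parameter values, not taken from the paper.
mean, pi, size = 2.0, 0.3, 1.5

print("Poisson  P(0):", poisson_p0(mean), " variance:", mean)
print("ZIP      P(0):", zip_p0(mean, pi), " variance:", zip_variance(mean, pi))
print("NegBin   P(0):", negbin_p0(mean, size), " variance:", negbin_variance(mean, size))
```

Both alternatives carry more zeros and more variance than a Poisson distribution with the same mean, which is why an apparent excess of zeros alone does not identify which of the two approaches is appropriate.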