We propose a novel statistical inference methodology for multiway count data corrupted by false zeros that are indistinguishable from true zero counts. Our approach zero-truncates the Poisson distribution so that all zero values are ignored. This simple truncation removes the need to distinguish between true and false zero counts and reduces the amount of data to be processed. Inference is accomplished via tensor completion that imposes low-rank tensor structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $\boldsymbol{\mathscr{M}}\in(0,\infty)^{I\times \cdots\times I}$ generating Poisson observations can be accurately estimated by zero-truncated Poisson regression from approximately $IR^2\log_2^2(I)$ non-zero counts under the nonnegative canonical polyadic decomposition. Our result also quantifies the error incurred by zero-truncating the Poisson distribution when the parameter is uniformly bounded from below. Therefore, under a low-rank multiparameter model, we propose an implementable approach that is guaranteed to achieve accurate regression in under-determined scenarios with substantial corruption by false zeros. Several numerical experiments are presented to explore the theoretical results.
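To illustrate why discarding zeros sidesteps false-zero corruption, the following minimal sketch treats the scalar case only; it is not the tensor-completion procedure analyzed in the paper. It relies on the standard zero-truncated Poisson law $P(X=k\mid X>0)=\lambda^k e^{-\lambda}/\big(k!\,(1-e^{-\lambda})\big)$ for $k\ge 1$, whose mean is $\lambda/(1-e^{-\lambda})$. The function names (`ztp_log_pmf`, `ztp_mle`, `demo`), the rate $\lambda=2$, and the 50% false-zero rate are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def ztp_log_pmf(k, lam):
    """Log-probability of k >= 1 under a zero-truncated Poisson with rate lam."""
    if k < 1:
        raise ValueError("zero-truncated Poisson is supported on k >= 1")
    # log(1 - exp(-lam)) computed stably as log(-expm1(-lam)).
    return k * math.log(lam) - lam - math.lgamma(k + 1) - math.log(-math.expm1(-lam))


def ztp_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)


def ztp_mle(nonzero_counts, iters=100):
    """Maximum-likelihood rate from non-zero counts only.

    The likelihood equation is ztp_mean(lam) = sample mean. Because
    ztp_mean is increasing and ztp_mean(lam) >= lam, bisection on
    (0, sample mean] brackets the root.
    """
    target = sum(nonzero_counts) / len(nonzero_counts)
    if target <= 1.0:
        return 0.0  # all counts equal 1: the likelihood is maximized at the boundary
    lo, hi = 1e-12, target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ztp_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def demo(lam=2.0, n=20000, false_zero_rate=0.5, seed=0):
    """Compare the naive mean (zeros kept) with the zero-truncated MLE."""
    rng = random.Random(seed)
    counts = [sample_poisson(lam, rng) for _ in range(n)]
    # Hypothetical corruption model: each entry is independently replaced by a false zero.
    corrupted = [0 if rng.random() < false_zero_rate else c for c in counts]
    naive = sum(corrupted) / len(corrupted)
    truncated = ztp_mle([c for c in corrupted if c > 0])
    return naive, truncated


if __name__ == "__main__":
    naive, truncated = demo()
    print(f"naive mean with zeros: {naive:.3f}, zero-truncated MLE: {truncated:.3f}")
```

In this toy setting the naive mean is biased downward by the false zeros, while the truncated estimator uses only the non-zero counts, whose conditional distribution is unaffected by the corruption as long as the corruption is independent of the count values.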