Under-reporting of count data poses a major roadblock for prediction and inference. In this paper, we focus on the Pogit model, which deconvolves the generating Poisson process from the censoring process that controls under-reporting, using a generalized linear modeling framework. We highlight the limitations of the Pogit model and address them by adding constraints to the estimation framework. We also develop uncertainty quantification techniques that are robust to model mis-specification. Our approach is evaluated using synthetic data and applied to real healthcare datasets, where we treat in-patient data as `reported' counts and use held-out total injuries to validate the results. The methods make it possible to separate the Poisson process from the under-reporting process, given sufficient expert information. Code to implement the approach is available via an open-source Python package.
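To make the model structure concrete, the following is a minimal sketch of a Pogit-style data-generating process, assuming the usual formulation in which the true counts are Poisson with a log-linear rate and each event is reported independently with a logistic probability. It is not taken from the authors' package; the function names (`simulate_pogit`, `pogit_mean`), the covariate layout, and the parameter values are all hypothetical and chosen only for illustration. The last lines show why extra constraints or expert information are needed: two different (rate, reporting probability) pairs can imply the same mean for the reported counts.

```python
import numpy as np


def simulate_pogit(n, beta, alpha, seed=0):
    """Simulate true and reported counts (illustrative assumption, not the paper's code).

    True counts:     N_i ~ Poisson(lambda_i),  lambda_i = exp(x_i . beta)
    Reported counts: Y_i ~ Binomial(N_i, p_i), p_i = sigmoid(z_i . alpha)
    Marginally, Y_i ~ Poisson(lambda_i * p_i) by Poisson thinning.
    """
    rng = np.random.default_rng(seed)
    # Design matrices with an intercept column and one random covariate each.
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    z = np.column_stack([np.ones(n), rng.normal(size=n)])
    lam = np.exp(x @ beta)                      # Poisson rate
    p = 1.0 / (1.0 + np.exp(-(z @ alpha)))      # reporting probability
    true_counts = rng.poisson(lam)
    reported = rng.binomial(true_counts, p)     # under-reported observations
    return x, z, true_counts, reported


def pogit_mean(x, z, beta, alpha):
    """Mean of the reported counts: exp(x . beta) * sigmoid(z . alpha)."""
    return np.exp(x @ beta) / (1.0 + np.exp(-(z @ alpha)))


# Simulated reported counts never exceed the true counts.
x, z, true_counts, reported = simulate_pogit(
    n=1000, beta=np.array([1.0, 0.3]), alpha=np.array([0.5, -0.4])
)

# Intercept-only illustration of the identifiability problem:
# rate 10 with reporting probability 0.5, versus rate 20 with probability 0.25.
ones = np.ones((1, 1))
mean_a = pogit_mean(ones, ones, np.array([np.log(10.0)]), np.array([0.0]))
mean_b = pogit_mean(ones, ones, np.array([np.log(20.0)]), np.array([np.log(1.0 / 3.0)]))
```

Both `mean_a` and `mean_b` describe the same distribution of reported counts, so the data alone cannot distinguish them in this intercept-only case.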