In the last decade, the secondary use of large-scale data from health systems, such as electronic health records, has shown great promise for advancing biomedical discovery and improving clinical decision making. However, there is growing concern about bias in association studies caused by misclassification of binary outcomes derived from electronic health records. We revisit the classical logistic regression model with misclassified outcomes. Although local identification conditions have been established in some related settings, the global identification of such models remains largely unknown and is an important open question. We derive necessary and sufficient conditions for the global identifiability of logistic regression models with misclassified outcomes. Our approach combines a novel technique termed submodel analysis with a technique adapted from the Picard-Lindelöf existence theorem for ordinary differential equations. In particular, our results apply to logistic models with discrete covariates, a common situation in biomedical studies, and the conditions are easy to verify in practice. Beyond model identifiability, we propose a hypothesis testing procedure for the regression coefficients of the misclassified logistic regression model when the model is not identifiable under the null hypothesis.
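To make the identifiability question concrete, here is a minimal sketch. It assumes a common parameterization that may not match the notation used in this work. The true outcome Y satisfies logit P(Y=1 | x) = b0 + b1·x, and the observed outcome Y* is misclassified with a constant false-positive rate `fp` and false-negative rate `fn`. Under these assumptions, P(Y*=1 | x) = fp + (1 - fp - fn)·expit(b0 + b1·x).

With a single binary covariate, the data determine only two probabilities, P(Y*=1 | x=0) and P(Y*=1 | x=1), but the model has four unknowns. The sketch constructs two different parameter sets that produce identical observed-outcome probabilities. The helper names (`observed_prob`, `matching_coefficients`) and all numeric values are hypothetical and chosen only for illustration. The sketch does not reproduce the identifiability conditions derived here.

```python
import math


def expit(t):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def observed_prob(x, b0, b1, fp, fn):
    """P(Y* = 1 | X = x), assuming logit P(Y = 1 | x) = b0 + b1 * x and
    constant false-positive (fp) / false-negative (fn) misclassification rates."""
    return fp + (1.0 - fp - fn) * expit(b0 + b1 * x)


def matching_coefficients(p0, p1, fp, fn):
    """Hypothetical helper: find (b0, b1) that reproduce the observed
    probabilities p0 (at x = 0) and p1 (at x = 1) under rates fp, fn."""
    scale = 1.0 - fp - fn
    q0 = (p0 - fp) / scale  # implied true-outcome probability at x = 0
    q1 = (p1 - fp) / scale  # implied true-outcome probability at x = 1
    if not (0.0 < q0 < 1.0 and 0.0 < q1 < 1.0):
        raise ValueError("rates are incompatible with the observed probabilities")
    b0 = logit(q0)
    return b0, logit(q1) - b0


# Parameter set A (illustrative values only).
A = dict(b0=-1.0, b1=1.0, fp=0.10, fn=0.20)
p0 = observed_prob(0, **A)
p1 = observed_prob(1, **A)

# Parameter set B: different misclassification rates, coefficients solved
# so that the observed-outcome probabilities match those of set A.
fp_b, fn_b = 0.05, 0.10
b0_b, b1_b = matching_coefficients(p0, p1, fp_b, fn_b)
B = dict(b0=b0_b, b1=b1_b, fp=fp_b, fn=fn_b)

for x in (0, 1):
    print(f"x={x}: P_A(Y*=1)={observed_prob(x, **A):.6f}, "
          f"P_B(Y*=1)={observed_prob(x, **B):.6f}")
print(f"slope under A: {A['b1']:.3f}, slope under B: {B['b1']:.3f}")
```

Both parameter sets yield the same distribution of Y* given x, yet their slopes on the logit scale differ. No amount of data on (x, Y*) alone can distinguish them. This is the kind of non-identifiability that motivates both the conditions above and a testing procedure valid when the model is not identifiable under the null.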