In biomedical and public health association studies, binary outcome variables may be subject to misclassification, which can substantially bias effect estimates. Attempts to address binary outcome misclassification in regression models are often hindered by model identifiability issues. In this paper, we characterize the identifiability problems in this class of models as a specific case of "label switching" and leverage a pattern in the resulting parameter estimates to resolve the permutation invariance of the complete data log-likelihood. Our proposed algorithm for binary outcome misclassification models does not require gold standard labels and relies only on the assumption that outcomes are correctly classified at least 50% of the time. A label switching correction is applied within estimation methods to recover unbiased effect estimates and to estimate misclassification rates in cases with one or more sequential observed outcomes. Open source software is provided to implement the proposed methods for single- and two-stage models. We present a detailed simulation study of the proposed methodology and apply the methods in two settings: single-stage modeling of data from the 2020 Medical Expenditure Panel Survey (MEPS) and two-stage modeling of data from the Virginia Department of Criminal Justice Services.
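The label switching idea can be illustrated with a minimal sketch. This is not the authors' software or their exact correction rule. It assumes a single-stage model with constant misclassification rates, a hypothetical coefficient vector `beta` for the logit of the true outcome, and a hypothetical 2x2 matrix `pi` with `pi[j][k]` = P(observed = j | true = k). Under those assumptions, the two permutation modes of the likelihood are (`beta`, `pi`) and (`-beta`, `pi` with its columns swapped), and the "correct at least 50% of the time" assumption is encoded here as requiring the diagonal of `pi` to sum to at least 1.

```python
def correct_label_switching(beta, pi):
    """Return the permutation of (beta, pi) that is consistent with the
    assumption that outcomes are classified correctly at least half the time.

    beta : list of floats, coefficients for logit P(true = 1 | x)  (hypothetical)
    pi   : 2x2 nested list, pi[j][k] = P(observed = j | true = k)  (hypothetical)
    """
    # Sensitivity + specificity - 1; negative values indicate that the
    # estimates landed in the label-switched mode (an illustrative criterion).
    agreement = pi[0][0] + pi[1][1] - 1.0

    if agreement >= 0:
        # Already in the mode that matches the assumption; return copies.
        return list(beta), [row[:] for row in pi]

    # Swapping the true-outcome labels negates the logistic coefficients
    # and swaps the columns of the misclassification matrix.
    flipped_beta = [-b for b in beta]
    flipped_pi = [[row[1], row[0]] for row in pi]
    return flipped_beta, flipped_pi


# Estimates that imply outcomes are mostly misclassified get flipped.
beta_hat = [0.5, -1.2]
pi_hat = [[0.2, 0.9],
          [0.8, 0.1]]
beta_fixed, pi_fixed = correct_label_switching(beta_hat, pi_hat)
```

Because the swap maps the diagonal sum `s` to `2 - s`, exactly one of the two modes satisfies the criterion whenever `s` differs from 1, which is why a single check after estimation is enough in this simplified setting.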