A joint Bayesian framework for missing data and measurement error using integrated nested Laplace approximations

Measurement error (ME) and missing values in covariates are often unavoidable in disciplines that deal with data, and both problems have separately received considerable attention during the past decades. However, while most researchers are familiar with methods for treating missing data, accounting for ME in covariates of regression models is less common. In addition, ME and missing data are typically treated as two separate problems, despite practical and theoretical similarities. Here, we exploit the fact that missing data in a continuous covariate is an extreme case of classical ME, allowing us to use existing methodology that accounts for ME via a Bayesian framework that employs integrated nested Laplace approximations (INLA), and thus to simultaneously account for both ME and missing data in the same covariate. As a useful by-product, we present an approach to handle missing data in INLA, since this corresponds to the special case when no ME is present. In addition, we show how to account for Berkson ME in the same framework. In its broadest generality, the proposed joint Bayesian framework can thus account for Berkson ME, classical ME, and missing data, or for any combination of these in the same or different continuous covariates of the family of regression models that are feasible with INLA. The approach is exemplified using both simulated and real data. We provide extensive and fully reproducible Supplementary Material with thoroughly documented examples using R-INLA and inlabru.

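Summary: Measurement error (ME) and missing values in covariates are often unavoidable, and both have received considerable attention over the past decades. Most researchers know how to handle missing data, but far fewer account for ME in the covariates of a regression model, and the two problems are usually treated separately despite their practical and theoretical similarities. The authors exploit the fact that a missing value in a continuous covariate is an extreme case of classical ME. This lets an existing Bayesian method for ME, built on integrated nested Laplace approximations (INLA), handle ME and missing data in the same covariate simultaneously. When no ME is present, the same approach reduces to a way of handling missing covariate data in INLA. The framework also covers Berkson ME, so it can account for Berkson ME, classical ME, missing data, or any combination of these, in the same or in different continuous covariates, for the family of regression models that INLA can fit. The approach is illustrated with both simulated and real data, and the authors provide fully reproducible supplementary material with documented examples using R-INLA and inlabru.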
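The claim that a missing value is an extreme case of classical ME can be illustrated with a small conjugate-Gaussian calculation. The sketch below is not the paper's INLA implementation. It assumes a single true covariate value x with a Gaussian prior (standing in for an imputation model) and an observed proxy w = x + u with Gaussian error u. The function name `posterior_of_true_covariate` and all numbers are hypothetical, chosen only for illustration. As the error precision shrinks toward zero, the proxy carries less and less information, and at precision zero the posterior of x is exactly the prior, which is how a missing value behaves.

```python
def posterior_of_true_covariate(w, prior_mean, prior_var, error_precision):
    """Posterior mean and variance of x given a proxy w = x + u.

    Assumptions (illustrative only, not taken from the paper):
      x ~ Normal(prior_mean, prior_var)        # stand-in imputation model
      u ~ Normal(0, 1 / error_precision)       # classical measurement error
    error_precision = 0 means w carries no information, i.e. x is missing.
    """
    prior_precision = 1.0 / prior_var
    post_precision = prior_precision + error_precision
    post_mean = (prior_precision * prior_mean + error_precision * w) / post_precision
    return post_mean, 1.0 / post_precision


if __name__ == "__main__":
    prior_mean, prior_var = 2.0, 4.0   # hypothetical prior for x
    w = 10.0                           # hypothetical observed proxy

    # Decreasing precision = increasing error variance; 0.0 is the missing-data limit.
    for error_precision in (100.0, 1.0, 0.01, 0.0):
        mean, var = posterior_of_true_covariate(w, prior_mean, prior_var, error_precision)
        print(f"error precision {error_precision:>6}: mean={mean:.3f}, var={var:.3f}")
```

Parameterising the error by its precision rather than its variance means the missing-data case needs no special branch: a precision of zero simply removes the proxy's contribution from the posterior.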