Modern datasets commonly feature both substantial missingness and variables of mixed data types, which present significant challenges for estimation and inference. Complete case analysis, which uses only the observations in which every variable is observed, is often severely biased, while model-based imputation of missing values is limited by the ability of the model to capture complex dependencies and accommodate mixed data types. To address these challenges, we develop a novel Bayesian mixture copula for joint and nonparametric modeling of count, continuous, ordinal, and unordered categorical variables, and deploy this model for inference, prediction, and imputation of missing data. Most notably, we introduce a new and efficient strategy for marginal distribution estimation, which eliminates the need to specify any marginal models yet delivers strong posterior consistency for both the marginal distributions and the copula parameters, even in the presence of informative missingness (i.e., missingness-at-random). Extensive simulation studies demonstrate exceptional modeling and imputation capabilities relative to competing methods, especially with mixed data types, complex missingness mechanisms, and nonlinear dependencies. We conclude with a data analysis that highlights how improper treatment of missing data can distort a statistical analysis, and how the proposed approach offers a resolution.