Many clinical and epidemiological studies encode participant-level information as a collection of continuous, truncated, ordinal, and binary variables. To gain novel insights into the complex interactions among these variables, there is a critical need for flexible frameworks for the joint modeling of variables of mixed data types. We propose Semiparametric Gaussian Copula Regression modeling (SGCRM), which models the joint dependence structure among observed continuous, truncated, ordinal, and binary variables and constructs conditional models with these four data types as outcomes, with a guarantee that the derived conditional models are mutually consistent. The Semiparametric Gaussian Copula (SGC) mechanism assumes that the observed SGC variables are generated by (i) monotonically transforming the marginals of a latent multivariate normal random variable and (ii) dichotomizing or truncating these transformed marginals. SGCRM estimates the correlation matrix of the latent normal variables by inverting "bridges" between the Kendall's Tau rank correlations of the observed mixed-data-type variables and the latent Gaussian correlations. We derive a novel bridging result that handles a general ordinal variable. In addition to the previously established asymptotic consistency, we establish the asymptotic normality of the latent correlation estimators. We also establish the asymptotic normality of the SGCRM regression estimators and provide a computationally efficient way to calculate their asymptotic covariances. We propose computationally efficient methods for predicting SGC latent variables and for imputing missing data. Using data from the National Health and Nutrition Examination Survey (NHANES), we illustrate SGCRM and compare it with traditional conditional regression models, including truncated Gaussian regression, ordinal probit, and probit models.
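To make the "bridge" idea concrete, the sketch below covers only the simplest case, a pair of continuous variables, where the standard Gaussian copula relation between Kendall's Tau and the latent correlation is r = sin(πτ/2). It does not implement the truncated, binary, or general ordinal bridges described in the abstract. The function names `latent_corr_from_tau` and `simulate_pair`, the particular monotone transforms, and the simulation settings are assumptions chosen for illustration, not part of SGCRM itself.

```python
import numpy as np
from scipy.stats import kendalltau


def latent_corr_from_tau(x, y):
    """Invert the continuous-continuous bridge r = sin(pi * tau / 2).

    Hypothetical helper: handles only two continuous variables.
    """
    tau, _ = kendalltau(x, y)
    return np.sin(np.pi * tau / 2.0)


def simulate_pair(rho, n, seed=0):
    """Draw a latent bivariate normal pair and apply monotone transforms.

    The transforms (exp and cube) are arbitrary illustrative choices;
    any strictly increasing functions leave Kendall's Tau unchanged.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    x = np.exp(z[:, 0])   # strictly increasing transform of the first marginal
    y = z[:, 1] ** 3      # strictly increasing transform of the second marginal
    return x, y


# Observed variables are non-Gaussian, but the rank-based estimate
# still targets the latent correlation (0.6 in this simulation).
x, y = simulate_pair(rho=0.6, n=2000)
rho_hat = latent_corr_from_tau(x, y)
pearson = np.corrcoef(x, y)[0, 1]
print(f"rank-based latent correlation estimate: {rho_hat:.3f}")
print(f"Pearson correlation of observed data:   {pearson:.3f}")
```

Because Kendall's Tau depends only on ranks, the estimate is invariant to the unknown monotone transformations, which is why a rank-based bridge is a natural fit for the semiparametric setting. The Pearson correlation of the transformed data is printed only for contrast.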