Regression Analysis of Correlations for Correlated Data

Correlated data are ubiquitous in today's data-driven society. A fundamental task in analyzing such data is to understand, characterize and exploit the correlations in them so that inference is valid. Yet explicit regression analysis of correlations has so far been limited to longitudinal data, a special form of correlated data, while implicit analysis via mixed-effects models lacks generality as a full inferential tool. This paper proposes a novel regression approach for modelling the correlation structure, leveraging a new generalized z-transformation. This transformation maps correlation matrices, which are constrained to be positive definite, to vectors with unrestricted support, and it is order-invariant. Building on these two properties, we develop a regression model that relates the transformed parameters to any covariates. We show that, coupled with a mean and a variance regression model, maximum likelihood leads to asymptotically normal parameter estimates and, crucially, enables statistical inference for all the parameters. The performance of our framework is demonstrated in extensive simulations. More importantly, we illustrate the use of our model with an analysis of the classroom data, a highly unbalanced multilevel clustered data set with within-class and within-school correlations, and an analysis of the malaria immune response data from Benin, a longitudinal data set with time-dependent covariates in addition to time. Our analyses reveal new insights.
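
The two properties named above (unrestricted support and order-invariance) can be illustrated with a small sketch. The abstract does not spell out the transformation, so the code below assumes one known map with these properties: the off-diagonal entries of the matrix logarithm of the correlation matrix, which reduces to Fisher's z-transformation for a single correlation. Whether this matches the paper's definition exactly is an assumption, and the function names `corr_to_vec` and `vec_to_corr` are hypothetical.

```python
import numpy as np


def _sym_fun(mat, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fun(vals)) @ vecs.T


def corr_to_vec(corr):
    """Assumed forward map: strictly lower-triangular entries of log(corr)."""
    log_corr = _sym_fun(np.asarray(corr, dtype=float), np.log)
    rows, cols = np.tril_indices(log_corr.shape[0], k=-1)
    return log_corr[rows, cols]


def vec_to_corr(gamma, dim, tol=1e-12, max_iter=500):
    """Assumed inverse map: any real vector yields a valid correlation matrix.

    The off-diagonal entries of a symmetric matrix A are fixed to gamma and
    its diagonal x is adjusted by fixed-point iteration until exp(A) has a
    unit diagonal.
    """
    a = np.zeros((dim, dim))
    rows, cols = np.tril_indices(dim, k=-1)
    a[rows, cols] = gamma
    a = a + a.T
    x = np.zeros(dim)
    for _ in range(max_iter):
        np.fill_diagonal(a, x)
        step = np.log(np.diag(_sym_fun(a, np.exp)))
        x = x - step
        if np.max(np.abs(step)) < tol:
            break
    np.fill_diagonal(a, x)
    return _sym_fun(a, np.exp)
```

Under this assumption, a regression model for correlations would place a linear predictor on each element of the output of `corr_to_vec`, and `vec_to_corr` would turn any fitted vector back into a positive definite correlation matrix without further constraints. Reordering the variables only permutes the elements of the vector, which is the sense of order-invariance illustrated here.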
