Many analyses of multivariate data are focused on evaluating the dependence between two sets of variables, rather than the dependence among individual variables within each set. Canonical correlation analysis (CCA) is a classical data analysis technique that estimates parameters describing the dependence between such sets. However, inference procedures based on traditional CCA rely on the assumption that all variables are jointly normally distributed. We present a semiparametric approach to CCA in which the multivariate margins of each variable set may be arbitrary, but the dependence between variable sets is described by a parametric model that provides a low-dimensional summary of dependence. While maximum likelihood estimation in the proposed model is intractable, we develop a novel MCMC algorithm called cyclically monotone Monte Carlo (CMMC) that provides estimates and confidence regions for the between-set dependence parameters. This algorithm is based on a multirank likelihood function, which uses only part of the information in the observed data in exchange for being free of assumptions about the multivariate margins. We illustrate the proposed inference procedure on nutrient data from the USDA.
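For orientation, the classical (Gaussian-theory) CCA that the abstract takes as its starting point can be sketched in a few lines. This is only an illustration of ordinary sample canonical correlations, not the semiparametric model or the CMMC algorithm proposed here. The function name `canonical_correlations`, the simulated data, and the QR-plus-SVD computation are assumptions made for the example, and the sketch presumes that both centered data matrices have full column rank.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X (n x p) and Y (n x q).

    Illustrative helper (hypothetical name); assumes full column rank after centering.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each variable so that cross-products correspond to covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered variable sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the sample canonical correlations,
    # returned in decreasing order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Simulated example: one shared latent factor links the first variable of each set.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)
X = np.column_stack([z + rng.standard_normal(n), rng.standard_normal(n)])
Y = np.column_stack([z + rng.standard_normal(n),
                     rng.standard_normal(n),
                     rng.standard_normal(n)])

rho = canonical_correlations(X, Y)
print(rho)
```

In this simulated setup only one pair of variables shares a latent factor, so the first canonical correlation should sit well above the second. The values depend only on the subspaces spanned by each variable set, so they are unchanged by invertible linear transformations within a set. Invariance to arbitrary monotone transformations of the margins, which is what the rank-based approach in the abstract targets, is not provided by this classical estimator.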