Many analyses of multivariate data are focused on evaluating the dependence between two sets of variables, rather than the dependence among individual variables within each set. Canonical correlation analysis (CCA) is a classical data analysis technique that estimates parameters describing the dependence between such sets. However, inference procedures based on traditional CCA rely on the assumption that all variables are jointly normally distributed. We present a semiparametric approach to CCA in which the multivariate margins of each variable set may be arbitrary, but the dependence between variable sets is described by a parametric model that provides low-dimensional summaries of dependence. While maximum likelihood estimation in the proposed model is intractable, we develop a novel MCMC algorithm called cyclically monotone Monte Carlo (CMMC) that provides estimates and confidence regions for the between-set dependence parameters. This algorithm is based on a multirank likelihood function, which uses only part of the information in the observed data in exchange for being free of assumptions about the multivariate margins. We apply the proposed inference procedure to Brazilian climate data and monthly stock returns from the materials and communications market sectors.
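For orientation, the classical CCA estimates mentioned above can be sketched in a few lines. This is a minimal illustration of traditional CCA only, not the proposed semiparametric model or the CMMC algorithm. It assumes the standard result that the sample canonical correlations are the singular values of the whitened cross-covariance matrix. The function names `inv_sqrt` and `canonical_correlations` and the simulated data are hypothetical and chosen purely for illustration.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T


def canonical_correlations(Y1, Y2):
    """Sample canonical correlations between the columns of Y1 and Y2.

    Y1 is n x p1 and Y2 is n x p2, with rows as paired observations.
    Returns min(p1, p2) values in decreasing order.
    """
    n = Y1.shape[0]
    Y1c = Y1 - Y1.mean(axis=0)  # center each variable set
    Y2c = Y2 - Y2.mean(axis=0)
    S11 = Y1c.T @ Y1c / (n - 1)  # within-set covariance, set 1
    S22 = Y2c.T @ Y2c / (n - 1)  # within-set covariance, set 2
    S12 = Y1c.T @ Y2c / (n - 1)  # between-set cross-covariance
    # Whitening removes within-set dependence, leaving only between-set structure.
    M = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(M, compute_uv=False)


if __name__ == "__main__":
    # Simulated data (an assumption for illustration, not data from the paper).
    rng = np.random.default_rng(0)
    Y1 = rng.normal(size=(200, 2))
    Y2 = Y1 @ np.array([[1.0, 0.5], [0.0, 1.0]]) + rng.normal(size=(200, 2))
    print(canonical_correlations(Y1, Y2))
```

These estimates are unchanged by invertible linear transformations within each variable set, but inference built on them relies on the joint normality assumption that the semiparametric approach is designed to relax.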