Many analyses of multivariate data focus on evaluating the dependence between two sets of variables, rather than the dependence among individual variables within each set. Canonical correlation analysis (CCA) is a classical data analysis technique that estimates parameters describing the dependence between such sets. However, inference procedures based on traditional CCA rely on the assumption that all variables are jointly normally distributed. We present a semiparametric approach to CCA in which the multivariate margins of each variable set may be arbitrary, but the dependence between variable sets is described by a parametric model that provides low-dimensional summaries of dependence. While maximum likelihood estimation in the proposed model is intractable, we propose two estimation strategies: one using a pseudolikelihood for the model and one using a Markov chain Monte Carlo (MCMC) algorithm that provides Bayesian estimates and confidence regions for the between-set dependence parameters. The MCMC algorithm is derived from a multirank likelihood function, which uses only part of the information in the observed data in exchange for being free of assumptions about the multivariate margins. We apply the proposed Bayesian inference procedure to Brazilian climate data and monthly stock returns from the materials and communications market sectors.
翻译:暂无翻译