Correlation matrices are omnipresent in multivariate data analysis. When the number d of variables is large, sample estimates of correlation matrices are typically noisy and obscure the underlying dependence patterns. We consider the case in which the variables can be grouped into K clusters with exchangeable dependence; this assumption is common in applications, e.g., in finance and econometrics. Under this partial exchangeability condition, the corresponding correlation matrix has a block structure, and the number of unknown parameters is reduced from d(d-1)/2 to at most K(K+1)/2. We propose a robust algorithm based on Kendall's rank correlation that identifies the clusters without assuming that K is known a priori and without any assumption on the margins other than continuity. The corresponding block-structured estimator performs considerably better than the sample Kendall rank correlation matrix when K < d. Even in the unstructured case K = d, the new estimator can be much more efficient in finite samples, although there is no gain asymptotically. When the data are elliptically distributed, the results extend to linear correlation matrices and their inverses. The procedure is illustrated on financial stock returns.
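The block-structured idea can be illustrated with a minimal sketch. It assumes the cluster partition is already known (the `labels` argument is hypothetical; the abstract's algorithm for finding the clusters is not reproduced here) and simply pools the sample Kendall taus within each block, so that the d(d-1)/2 pairwise estimates collapse to at most K(K+1)/2 block means. The function names are illustrative, not the authors' API. For the elliptical case, the sketch uses the standard identity rho = sin(pi * tau / 2) linking Kendall's tau to linear correlation.

```python
import numpy as np


def kendall_tau_matrix(X):
    """Sample Kendall tau matrix (tau-a) of an n x d data array.

    Assumes continuous margins, so ties occur with probability zero.
    Memory is O(n^2 d): fine for a sketch, not for large n.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # S[k, l, i] = sign(X[k, i] - X[l, i]) for every ordered pair (k, l)
    S = np.sign(X[:, None, :] - X[None, :, :])
    # Each unordered pair is counted twice, as (k, l) and (l, k), and the
    # diagonal k == l contributes 0, so dividing by n(n-1) yields
    # (concordant - discordant) / (n choose 2).
    return np.einsum("kli,klj->ij", S, S) / (n * (n - 1))


def block_average(T, labels):
    """Average the off-diagonal entries of T within each cluster block.

    `labels[i]` is the (assumed known, hypothetical) cluster of variable i.
    Entries of block (a, b) with i != j are replaced by their mean;
    the diagonal is set to 1.
    """
    labels = np.asarray(labels)
    B = np.eye(T.shape[0])
    clusters = np.unique(labels)
    for a in clusters:
        for b in clusters:
            mask = np.logical_and.outer(labels == a, labels == b)
            np.fill_diagonal(mask, False)  # exclude i == j inside diagonal blocks
            if mask.any():  # singleton clusters have no off-diagonal pairs
                B[mask] = T[mask].mean()
    return B


def tau_to_pearson(T):
    """Map Kendall's tau to linear correlation via rho = sin(pi * tau / 2),
    which holds when the data are elliptically distributed."""
    return np.sin(np.pi * T / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    labels = [0, 0, 0, 1, 1, 2]  # hypothetical partition into K = 3 clusters
    T_block = block_average(kendall_tau_matrix(X), labels)
    R_block = tau_to_pearson(T_block)
    print(np.round(R_block, 3))
```

Averaging within blocks is what reduces estimator variance: each block mean pools many noisy pairwise taus that, under partial exchangeability, estimate the same quantity. With K = d every block is a single entry (or empty), and the sketch returns the sample Kendall matrix unchanged.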