Many applications benefit from theory for identifying variables that have large correlations or partial correlations in high dimensions. Recently there has been progress in the ultra-high dimensional setting, where the sample size $n$ is fixed and the dimension $p$ tends to infinity. Despite these advances, the correlation screening framework suffers from practical, methodological, and theoretical deficiencies. For instance, previous correlation screening theory requires that the population covariance matrix be sparse and block diagonal. This block sparsity assumption is, however, restrictive in practical applications. As a second example, correlation and partial correlation screening require the estimation of dependence measures, which can be computationally prohibitive. In this paper, we propose a unifying approach to correlation and partial correlation mining that is not restricted to block diagonal correlation structure, thus yielding a methodology that is suitable for modern applications. By making connections to random geometric graphs, we show that the number of highly correlated or partially correlated variables has compound Poisson finite-sample characterizations, which hold both for finite $p$ and as $p$ tends to infinity. The unifying framework also demonstrates a duality between correlation and partial correlation screening, with theoretical and practical consequences.