An important problem in large-scale inference is the identification of variables that have large correlations or partial correlations. Recent work has yielded breakthroughs in the ultra-high-dimensional setting, where the sample size $n$ is fixed and the dimension $p \rightarrow \infty$ ([Hero, Rajaratnam 2011, 2012]). Despite these advances, the correlation screening framework suffers from some serious practical, methodological and theoretical deficiencies. For instance, theoretical safeguards for partial correlation screening require that the population covariance matrix be block diagonal. This block-sparsity assumption is, however, highly restrictive in numerous practical applications. As a second example, results for the correlation and partial correlation screening framework require the estimation of dependence measures or functionals, which can be computationally prohibitive. In this paper, we propose a unifying approach to correlation and partial correlation mining that specifically goes beyond the block-diagonal correlation structure, thus yielding a methodology that is suitable for modern applications. By making connections to random geometric graphs, the number of highly correlated or partially correlated variables is shown to have novel compound Poisson finite-sample characterizations, which hold both for finite $p$ and when $p \rightarrow \infty$. The unifying framework also demonstrates an important duality between correlation and partial correlation screening, with significant theoretical and practical consequences.
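To make the screening quantity concrete, the sketch below counts "highly correlated" variables by thresholding the sample correlation matrix of an $n \times p$ data matrix at a level $\rho$ and keeping every variable with at least $\delta$ neighbours in the resulting graph. It is a minimal illustration under stated assumptions, not the paper's procedure: the function names and the degree parameter `delta` are hypothetical, and the sketch covers only plain correlation screening. It does not compute partial correlations or the compound Poisson approximation discussed above.

```python
import math
from typing import List


def sample_correlation_matrix(X: List[List[float]]) -> List[List[float]]:
    """Pearson sample correlations between the p columns of an n-by-p data matrix X."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        centered = [x - mean for x in col]
        norm = math.sqrt(sum(c * c for c in centered))
        # Assumption: constant columns are rejected, since their correlation is undefined.
        if norm == 0.0:
            raise ValueError(f"column {j} is constant; correlation undefined")
        # Center each column and scale it to unit Euclidean norm.
        cols.append([c / norm for c in centered])
    # The correlation of columns j and k is the inner product of their normalized vectors.
    return [[sum(a * b for a, b in zip(cols[j], cols[k])) for k in range(p)]
            for j in range(p)]


def screen_correlations(X: List[List[float]], rho: float, delta: int = 1) -> List[int]:
    """Indices of variables with at least `delta` other variables at |correlation| >= rho.

    `delta` is a hypothetical degree threshold used here for illustration.
    """
    R = sample_correlation_matrix(X)
    p = len(R)
    hubs = []
    for j in range(p):
        # Degree of vertex j in the graph obtained by thresholding |R| at rho.
        degree = sum(1 for k in range(p) if k != j and abs(R[j][k]) >= rho)
        if degree >= delta:
            hubs.append(j)
    return hubs
```

For example, with `X = [[1, 2, 1], [2, 4, -1], [3, 6, 1], [4, 8, -1]]`, columns 0 and 1 are perfectly correlated while column 2 has correlation $-1/\sqrt{5} \approx -0.447$ with each of them. So `screen_correlations(X, 0.9)` keeps only variables 0 and 1, and lowering the threshold to 0.4 keeps all three. The count of screened variables, `len(hubs)`, is the random quantity whose distribution the abstract refers to.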