We introduce sparse principal loading analysis, a new method that reduces the dimensionality of cross-sectional data and identifies the underlying covariance structure. For dimensionality reduction, the method selects a subset of the existing variables and discards variables that have only a small distorting effect on the covariance matrix. To this end, we show how to detect these variables and provide methods to assess the magnitude of their distortion. Sparse principal loading analysis also serves a second purpose: it can identify the underlying block-diagonal covariance structure using sparse loadings. This approach is new in this context, and we provide the criterion required to evaluate whether the detected block structure fits the sample. Because the method decomposes the covariance matrix using sparse loadings rather than eigenvectors, a large loss of information can result if the chosen loadings are too sparse. However, we show that this is not a concern for our method because sparseness is controlled by the aforementioned evaluation criterion. Further, we show the advantages of sparse principal loading analysis in both variable selection and covariance structure detection, and illustrate the performance of the method with simulations and on real datasets. Supplementary material for this article is available online.
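As a rough illustration of how sparse loadings can reveal a block-diagonal covariance structure, the following is a minimal sketch and not the procedure from the article. It assumes that sparse loadings are obtained by hard-thresholding the eigenvectors of the covariance matrix, and that two variables belong to the same block whenever some loading vector is nonzero on both. The function names `sparse_loadings` and `blocks_from_loadings` and the `threshold` parameter are hypothetical, and the article's evaluation criterion for the detected structure is not implemented here.

```python
import numpy as np


def sparse_loadings(cov, threshold=0.1):
    """Hard-threshold the eigenvectors of a covariance matrix.

    Assumption: thresholding stands in for whichever sparse loading
    estimator is actually used; columns of the result are loadings.
    """
    _, vecs = np.linalg.eigh(cov)  # eigh handles symmetric matrices
    loadings = vecs.copy()
    loadings[np.abs(loadings) < threshold] = 0.0
    return loadings


def blocks_from_loadings(loadings):
    """Group variables into blocks from the zero pattern of the loadings.

    Variables i and j are linked if at least one loading vector is
    nonzero on both; blocks are the connected components of that graph.
    """
    p = loadings.shape[0]
    support = (loadings != 0).astype(int)
    adjacency = (support @ support.T) > 0
    labels = -np.ones(p, dtype=int)
    n_blocks = 0
    for start in range(p):
        if labels[start] >= 0:
            continue  # variable already assigned to a block
        labels[start] = n_blocks
        stack = [start]
        while stack:  # depth-first search over linked variables
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                if labels[j] < 0:
                    labels[j] = n_blocks
                    stack.append(j)
        n_blocks += 1
    return [np.flatnonzero(labels == k).tolist() for k in range(n_blocks)]


# Toy covariance with two uncorrelated groups of variables.
cov = np.array([
    [1.0, 0.8, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6, 0.6],
    [0.0, 0.0, 0.6, 1.0, 0.6],
    [0.0, 0.0, 0.6, 0.6, 1.0],
])
blocks = blocks_from_loadings(sparse_loadings(cov, threshold=0.1))
print(blocks)
```

On a sample covariance matrix the off-block entries are not exactly zero, so the recovered blocks depend on the chosen threshold. This is the kind of sensitivity that a fit criterion for the detected block structure is meant to control.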