We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms. We first develop a set of fast novel algorithms for rank estimation, column subset selection and least squares preconditioning. We then describe the design and implementation of leverage score estimators based on these primitives. These estimators are also effective for rank deficient input, which is frequently the case in data analytics applications. We provide detailed complexity analyses for all algorithms as well as meaningful approximation bounds and comparisons with the state-of-the-art. We conduct extensive numerical experiments to evaluate our algorithms and to illustrate their properties and performance using synthetic and real world data sets.
翻译:我们研究统计杠杆算法,以估计任意等级的矩形稠密或稀少矩阵的统计杠杆分数。我们的方法是,将分级披露方法与密集和分散随机的维度减少变异的构成相结合。我们首先开发一套快速的新算法,用于估计等级、分列子选择和最低方形先决条件。然后我们描述基于这些原始的杠杆估计算法的设计和实施。这些估计算法对于等级不完善的投入也是有效的,在数据分析应用中经常如此。我们为所有算法提供详细的复杂分析,并提供有意义的近似界限以及与最新技术的比较。我们进行了广泛的数字实验,以评价我们的算法,并利用合成和真实的世界数据集来说明其特性和性能。