It is often desirable to reduce the dimensionality of a large dataset by projecting it onto a low-dimensional subspace. Matrix sketching has emerged as a powerful technique for performing such dimensionality reduction very efficiently. Even though there is an extensive literature on the worst-case performance of sketching, existing guarantees are typically very different from what is observed in practice. We exploit recent developments in the spectral analysis of random matrices to develop novel techniques that provide provably accurate expressions for the expected value of random projection matrices obtained via sketching. These expressions can be used to characterize the performance of dimensionality reduction in a variety of common machine learning tasks, ranging from low-rank approximation to iterative stochastic optimization. Our results apply to several popular sketching methods, including Gaussian and Rademacher sketches, and they enable precise analysis of these methods in terms of spectral properties of the data. Empirical results show that the expressions we derive reflect the practical performance of these sketching methods, down to lower-order effects and even constant factors.
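To make the setup concrete, here is a minimal sketch (an illustrative assumption, not the paper's actual construction) of the two sketching methods named above, applied to low-rank approximation: a data matrix is compressed by a Gaussian or Rademacher sketching matrix, and the data is then projected onto the row space of the sketch. All matrix sizes and the synthetic spectrum are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: n points in d dimensions, with a
# rapidly decaying spectrum so a low-dimensional subspace captures most mass.
n, d = 500, 100
A = rng.standard_normal((n, d)) * np.exp(-np.arange(d) / 10.0)

m = 20  # sketch size (number of rows of the sketching matrix)

# Gaussian sketch: i.i.d. N(0, 1/m) entries.
S_gauss = rng.standard_normal((m, n)) / np.sqrt(m)

# Rademacher sketch: i.i.d. +/-1 entries, scaled by 1/sqrt(m).
S_rad = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

def sketched_low_rank_error(A, S):
    """Project A onto the row space of the sketch S @ A and return the
    relative Frobenius-norm error of the resulting rank-<=m approximation."""
    SA = S @ A                    # (m, d): the sketched matrix
    Q = np.linalg.qr(SA.T)[0]     # orthonormal basis for the row space of SA
    A_approx = A @ Q @ Q.T        # random projection of A onto that subspace
    return np.linalg.norm(A - A_approx) / np.linalg.norm(A)

err_gauss = sketched_low_rank_error(A, S_gauss)
err_rad = sketched_low_rank_error(A, S_rad)
```

On data with a fast-decaying spectrum, both sketches yield a small relative error with a sketch size far below the ambient dimension; the expressions derived in the paper characterize the expected value of such random projections in terms of the data's spectral properties.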