We present sketched linear discriminant analysis, an iterative, randomized approach to binary-class Gaussian-model linear discriminant analysis (LDA) for very large data. We harness a least squares formulation and leverage the stochastic gradient descent framework, thereby obtaining a randomized classifier whose performance is comparable to that of full-data LDA while requiring access to only one row of the training data at a time. We present convergence guarantees for the sketched predictions on new data within a fixed number of iterations; these guarantees account for both the Gaussian modeling assumptions on the data and the algorithmic randomness from the sketching procedure. Finally, we demonstrate performance with varying step sizes and numbers of iterations. Our numerical experiments show that sketched LDA offers a viable alternative to full-data LDA when the data are too large for full-data analysis.
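To make the idea concrete, the following is a minimal, illustrative sketch (not the paper's exact algorithm or guarantees) of binary LDA recast as a least squares problem and fit with row-wise stochastic gradient descent; the function names, step-size default, and stopping rule are placeholder assumptions introduced here.

```python
# Minimal sketch, assuming binary labels y in {0, 1}: LDA via a least squares
# encoding of the labels, optimized with one-row-at-a-time SGD.
import numpy as np

def sketched_lda_sgd(X, y, step_size=1e-3, n_iters=10_000, rng=None):
    """X: (n, p) training data; y: (n,) binary labels.
    Returns a weight vector and intercept for sign-based prediction."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # Class-balanced +/- targets so the least squares direction mimics LDA.
    n1 = y.sum()
    n0 = n - n1
    t = np.where(y == 1, n / n1, -n / n0)
    beta = np.zeros(p)
    for _ in range(n_iters):
        i = rng.integers(n)                    # access a single training row
        residual = X[i] @ beta - t[i]
        beta -= step_size * residual * X[i]    # SGD step on squared-error loss
    # Threshold at the midpoint of the projected class means.
    mu1 = X[y == 1].mean(axis=0) @ beta
    mu0 = X[y == 0].mean(axis=0) @ beta
    intercept = -(mu1 + mu0) / 2
    return beta, intercept

def predict(X_new, beta, intercept):
    # Classify new rows by the sign of the sketched discriminant score.
    return (X_new @ beta + intercept > 0).astype(int)
```

In this sketch the fixed iteration count and constant step size stand in for the paper's step-size and iteration analysis; only one row of `X` is touched per update, which is the access pattern the abstract describes.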