We describe a general framework -- compressive statistical learning -- for resource-efficient large-scale learning: the training collection is compressed in one pass into a low-dimensional sketch (a vector of random empirical generalized moments) that captures the information relevant to the learning task at hand. A near-minimizer of the risk is then computed from the sketch alone by solving a nonlinear least squares problem. We investigate sufficient sketch sizes to control the generalization error of this procedure. The framework is illustrated on compressive PCA, compressive clustering, and compressive Gaussian mixture modeling with fixed known variance. The latter two are further developed in a companion paper.
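The sketch-and-recover pipeline above can be illustrated on a toy compressive clustering problem. The following is an illustrative sketch only, not the paper's algorithm: the choice of random Fourier features as the generalized moments, the frequency scale, the toy data, and the off-the-shelf least-squares solver are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Toy training collection: two well-separated 1-D clusters.
X = np.concatenate([rng.normal(-4.0, 0.3, 500), rng.normal(4.0, 0.3, 500)])

# Sketching operator: m random generalized moments, here random Fourier
# features phi_w(x) = exp(i*w*x) with an assumed frequency scale of 0.3.
m = 20
w = rng.normal(0.0, 0.3, m)

# One pass over the data: the sketch is the empirical mean of the features.
sketch = np.exp(1j * np.outer(w, X)).mean(axis=1)

# Recovery: fit k = 2 cluster centers c by nonlinear least squares,
# matching the sketch of a mixture of Diracs to the empirical sketch.
def residual(c):
    model = np.exp(1j * np.outer(w, c)).mean(axis=1)
    r = sketch - model
    return np.concatenate([r.real, r.imag])  # real-valued residual for the solver

res = least_squares(residual, x0=np.array([-3.0, 3.0]))
centers = np.sort(res.x)  # estimated cluster centers
```

Note that once the sketch is computed, learning touches only the 2m real numbers in the sketch, never the dataset itself; this is the source of the resource efficiency, since the sketch size depends on the task and not on the number of training samples.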