This article considers "compressive learning," an approach to large-scale machine learning where datasets are massively compressed before learning (e.g., clustering, classification, or regression) is performed. In particular, a "sketch" is first constructed by computing carefully chosen nonlinear random features (e.g., random Fourier features) and averaging them over the whole dataset. Parameters are then learned from the sketch, without access to the original dataset. This article surveys the current state of the art in compressive learning, including the main concepts and algorithms, their connections with established signal-processing methods, existing theoretical guarantees (covering both information preservation and privacy preservation), and important open problems.
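To make the sketching step concrete, here is a minimal illustrative example of a random-Fourier-feature sketch, under assumed choices (a Gaussian frequency matrix, a toy dataset, and hypothetical sketch size `m` and scale `sigma`); it is a sketch of the idea, not the authors' reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 10_000, 2   # dataset size and dimension (toy values)
m = 100            # sketch size: number of random features (assumed)
sigma = 1.0        # frequency scale, a tuning parameter (assumed)

X = rng.normal(size=(n, d))                         # toy dataset, one sample per row
Omega = rng.normal(scale=1.0 / sigma, size=(d, m))  # random frequency matrix

# Sketch: the empirical average of the complex exponentials exp(i * Omega^T x)
# over all samples -- a single vector of m numbers summarizing the whole dataset.
sketch = np.exp(1j * X @ Omega).mean(axis=0)

print(sketch.shape)  # (m,) -- learning then proceeds from this vector alone
```

Because the sketch is an average, it can be accumulated in a single streaming pass over the data (or in parallel over data chunks), which is what enables the massive compression described above.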