An increasing number of systems are designed by gathering large amounts of data and then optimizing the system parameters directly on the data obtained, often without analyzing the structure of the dataset. As task complexity grows and dataset sizes and parameter counts reach millions or even billions, data summarization is becoming a major challenge. In this work, we investigate data summarization via dictionary learning (DL), leveraging the properties of recently introduced non-negative kernel regression (NNK) graphs. Unlike previous DL techniques such as kSVD, our proposed NNK-Means learns geometric dictionaries whose atoms are representative of the input data space. Experiments show that summarization using NNK-Means can provide better class separation than linear and kernel versions of kMeans and kSVD. Moreover, NNK-Means is scalable, with runtime complexity similar to that of kMeans.
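To make the summarization setting concrete, the kMeans baseline the abstract compares against can be sketched as follows. This is an illustrative sketch only, not the paper's NNK-Means method; the function name and interface are our own assumptions.

```python
import numpy as np

def kmeans_summarize(X, k, n_iter=20, seed=0):
    """Summarize dataset X (n x d) by k centroid 'atoms' via Lloyd's kMeans.

    Illustrative baseline only: the centroids play the role of a small
    'dictionary' summarizing the data, as in the kMeans comparison above.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # initialize centroids by sampling k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assign each point to its nearest centroid (squared Euclidean distance)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    return centroids, labels
```

The dictionary here is the set of `k` centroids; NNK-Means instead constrains atom updates using NNK graph neighborhoods so that atoms remain geometrically representative of the data, at a comparable runtime cost.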