In this article, we introduce the novel concept of samplets by transferring the construction of Tausch-White wavelets to the realm of data. In this way, we obtain a multilevel representation of discrete data that directly enables data compression, detection of singularities, and adaptivity. Applying samplets to represent kernel matrices, as they arise in kernel-based learning or Gaussian process regression, we end up with quasi-sparse matrices. By thresholding small entries, these matrices can be compressed to O(N log N) relevant entries, where N is the number of data points. This feature allows for the use of fill-in reducing reorderings to obtain a sparse factorization of the compressed matrices. Besides a comprehensive introduction to samplets and their properties, we present extensive numerical studies to benchmark the approach. Our results demonstrate that samplets mark a considerable step toward making large data sets accessible for analysis.
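The compression-by-thresholding idea can be illustrated with a small sketch. It does not implement samplets: as a stand-in, it assumes one-dimensional sorted points, a discrete orthonormal Haar basis (which only annihilates constant vectors), and an exponential kernel. The function names, the length scale, and the threshold rule are all hypothetical choices made for illustration.

```python
import numpy as np


def haar_matrix(n):
    """Orthonormal discrete Haar transform for n = 2**k points (rows are basis vectors)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    coarse = np.kron(h, np.array([[1.0, 1.0]]))                 # averages of neighbours
    detail = np.kron(np.eye(n // 2), np.array([[1.0, -1.0]]))   # differences of neighbours
    return np.vstack([coarse, detail]) / np.sqrt(2.0)


def exponential_kernel_matrix(x, length_scale=0.5):
    """Dense kernel matrix k(x_i, x_j) = exp(-|x_i - x_j| / length_scale) (assumed kernel)."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / length_scale)


def compress(K, T, tau=1e-4):
    """Transform K into the multilevel basis and zero out small entries.

    The cutoff tau * ||K||_F / N is an illustrative rule; it guarantees a
    relative Frobenius error of at most tau, because at most N**2 entries,
    each smaller than the cutoff, are dropped.
    """
    K_t = T @ K @ T.T
    cutoff = tau * np.linalg.norm(K_t) / K.shape[0]
    K_c = np.where(np.abs(K_t) >= cutoff, K_t, 0.0)
    return K_t, K_c


def demo(n=256, tau=1e-4, seed=0):
    """Return (number of retained entries, relative Frobenius error)."""
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 1.0, n))   # sorted points, so index neighbours are near
    K = exponential_kernel_matrix(x)
    K_t, K_c = compress(K, haar_matrix(n), tau)
    nnz = int(np.count_nonzero(K_c))
    rel_err = float(np.linalg.norm(K_t - K_c) / np.linalg.norm(K_t))
    return nnz, rel_err


if __name__ == "__main__":
    nnz, rel_err = demo()
    print(f"retained entries: {nnz} of {256 * 256}, relative error: {rel_err:.2e}")
```

Because the transform is orthonormal, the error introduced by thresholding in the transformed basis equals the error in the original kernel matrix. The sparse pattern that remains is what a fill-in reducing reordering would then act on before factorization.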