Kernel mean embeddings are a powerful tool to represent probability distributions over arbitrary spaces as single points in a Hilbert space. Yet, the cost of computing and storing such embeddings prohibits their direct use in large-scale settings. We propose an efficient approximation procedure based on the Nystr\"om method, which exploits a small random subset of the dataset. Our main result is an upper bound on the approximation error of this procedure. It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate while reducing computational costs. We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules, and illustrate our theoretical findings with numerical experiments.
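As a rough illustration of the idea described above (not the paper's exact procedure), the following NumPy sketch approximates the empirical kernel mean embedding $\hat\mu = \frac{1}{n}\sum_i k(\cdot, x_i)$ by projecting it onto the span of $m \ll n$ uniformly subsampled landmark points; the Gaussian kernel, the sample sizes, and the data distribution are all illustrative assumptions.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # Gaussian kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n, m, d = 500, 50, 5          # illustrative sizes: n samples, m landmarks
X = rng.normal(size=(n, d))

# Nystrom subsample: m landmark points drawn uniformly without replacement
idx = rng.choice(n, size=m, replace=False)
Xm = X[idx]

Kmm = rbf(Xm, Xm)             # m x m kernel matrix on the landmarks
Kmn = rbf(Xm, X)              # m x n cross-kernel matrix

# Coefficients of the approximate embedding mu_hat = sum_j alpha_j k(., x_j):
# RKHS projection of the full empirical embedding onto span{k(., x_j)}
alpha = np.linalg.pinv(Kmm) @ Kmn.mean(axis=1)

# Squared RKHS approximation error, expanded via kernel evaluations.
# The full n x n kernel matrix is computed here only to check the error;
# the Nystrom embedding itself never needs it.
full_term = rbf(X, X).mean()          # <mu, mu>
cross = alpha @ Kmn.mean(axis=1)      # <mu_hat, mu>
approx_term = alpha @ Kmm @ alpha     # <mu_hat, mu_hat>
err2 = full_term - 2 * cross + approx_term
```

Since `alpha` realizes an orthogonal projection in the RKHS, `err2` is nonnegative up to numerical error and shrinks as the number of landmarks `m` grows, which is the trade-off the bound in the abstract quantifies.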