We study approaches for compressing the empirical measure in the context of finite dimensional reproducing kernel Hilbert spaces (RKHSs). In this context, the empirical measure is contained within a natural convex set and can be approximated using convex optimization methods. Under certain conditions, such an approximation gives rise to a coreset of data points. A key quantity controlling how large such a coreset has to be is the size of the largest ball around the empirical measure that is contained within the empirical convex set. The bulk of our work is concerned with deriving high probability lower bounds on the size of such a ball under various conditions. We complement this derivation of the lower bound by developing techniques that allow us to apply the compression approach to concrete inference problems such as kernel ridge regression. We conclude with a construction of an infinite dimensional RKHS for which the compression is poor, highlighting some of the difficulties one faces when trying to move to infinite dimensional RKHSs.
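To make the compression idea concrete, the following is a minimal sketch, not the paper's algorithm: it approximates the empirical kernel mean embedding by a sparse convex combination of feature maps using a Frank-Wolfe-type greedy update over the simplex of weights, and then crudely plugs the resulting weighted coreset into kernel ridge regression. All function names, the Gaussian kernel choice, and the toy data are illustrative assumptions.

```python
# Sketch: compress the empirical measure (mean embedding) in an RKHS via a
# Frank-Wolfe-style greedy method, producing a sparse convex weight vector
# whose support acts as a coreset. Illustrative only; not the paper's method.
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def frank_wolfe_coreset(K, n_steps):
    """Build convex weights w approximating the uniform empirical weights.

    K is the n x n kernel matrix; the target is mu = (1/n, ..., 1/n), i.e. the
    empirical mean embedding. Each step moves toward the single data point
    (simplex vertex) that best reduces the RKHS error
    || sum_i w_i k(x_i, .) - (1/n) sum_i k(x_i, .) ||.
    """
    n = K.shape[0]
    mu = np.full(n, 1.0 / n)
    w = np.zeros(n)
    w[np.argmax(K @ mu)] = 1.0          # start at the best single vertex
    for t in range(1, n_steps):
        grad = K @ (w - mu)             # gradient of 0.5 * ||Phi(w - mu)||^2
        j = np.argmin(grad)             # Frank-Wolfe vertex: best data point
        step = 2.0 / (t + 2.0)          # standard open-loop step size
        w = (1 - step) * w
        w[j] += step
    return w                            # sparse convex weights: the coreset

# Toy usage: run kernel ridge regression on the coreset instead of all points.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
K = rbf_kernel(X, X)
w = frank_wolfe_coreset(K, n_steps=50)
idx = np.flatnonzero(w > 1e-8)          # support of the compressed measure
K_ss = K[np.ix_(idx, idx)]
alpha = np.linalg.solve(K_ss + 1e-3 * np.eye(len(idx)), y[idx])
print(f"coreset size: {len(idx)} of {len(X)} points")
```

How quickly such an iteration can shrink the approximation error depends, as discussed in the abstract, on the size of the largest ball around the empirical measure contained in the empirical convex set, which is what motivates the lower bounds derived in this work.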