Random Fourier Features (RFF) is among the most popular and broadly applicable approaches for scaling up kernel methods. In essence, RFF allows the user to avoid costly computations on a large kernel matrix via a fast randomized approximation. However, a pervasive difficulty in applying RFF is that the user does not know the actual error of the approximation, or how this error will propagate into downstream learning tasks. Up to now, the RFF literature has primarily dealt with these uncertainties using theoretical error bounds, but from a user's standpoint, such results are typically impractical -- either because they are highly conservative or involve unknown quantities. To tackle these general issues in a data-driven way, this paper develops a bootstrap approach to numerically estimate the errors of RFF approximations. Three key advantages of this approach are: (1) The error estimates are specific to the problem at hand, avoiding the pessimism of worst-case bounds. (2) The approach is flexible with respect to different uses of RFF, and can even estimate errors in downstream learning tasks. (3) The approach enables adaptive computation, so that the user can quickly inspect the error of a rough initial kernel approximation and then predict how much extra work is needed. Lastly, in exchange for all of these benefits, the error estimates can be obtained at a modest computational cost.
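To make the idea concrete, the following is a minimal sketch (not the paper's reference implementation) of how an RFF approximation and a bootstrap error estimate could look in NumPy, assuming a Gaussian (RBF) kernel. The function names rff_features and bootstrap_error_estimate, and all parameter values, are illustrative assumptions; the bootstrap here resamples the random features with replacement and uses the spread of the resampled approximations around the original one as a proxy for the unknown approximation error.

# Illustrative sketch: RFF approximation of an RBF kernel matrix, plus a
# bootstrap estimate of the approximation error (assumed setup, not the
# paper's own code).
import numpy as np

def rff_features(X, num_features, sigma, rng):
    # Map rows of X to random Fourier features for the Gaussian kernel
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def bootstrap_error_estimate(Z, num_boot=30, quantile=0.9, rng=None):
    # Estimate the max-norm error of the approximate kernel K_hat = Z Z^T by
    # resampling the columns (random features) of Z with replacement and
    # measuring how far each resampled approximation falls from K_hat.
    rng = rng or np.random.default_rng()
    s = Z.shape[1]
    K_hat = Z @ Z.T
    deviations = []
    for _ in range(num_boot):
        idx = rng.integers(0, s, size=s)         # resample features
        K_star = Z[:, idx] @ Z[:, idx].T
        deviations.append(np.max(np.abs(K_star - K_hat)))
    return np.quantile(deviations, quantile)     # data-driven error estimate

# Usage: on a small problem, compare the bootstrap estimate with the actual
# error against the exact kernel matrix (only computable here because n is small).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
sigma = 1.0
Z = rff_features(X, num_features=500, sigma=sigma, rng=rng)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-sq_dists / (2.0 * sigma ** 2))
print("actual max error:    ", np.max(np.abs(Z @ Z.T - K_exact)))
print("bootstrap estimate:  ", bootstrap_error_estimate(Z, rng=rng))

The design choice of resampling features (rather than data points) reflects that the randomness in the RFF approximation comes entirely from the sampled frequencies, so the bootstrap fluctuations of the resampled approximation around K_hat can stand in for the fluctuations of K_hat around the exact kernel matrix.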