Kernel methods are learning algorithms that enjoy solid theoretical foundations but suffer from severe computational limitations. Sketching, which consists in searching for solutions within a subspace of reduced dimension, is a well-studied approach to alleviate this computational burden. However, fast sketching strategies, such as non-adaptive subsampling, significantly degrade the guarantees of the algorithms, while theoretically accurate sketches, such as the Gaussian one, remain relatively slow in practice. In this paper, we introduce the $p$-sparsified sketches, which combine the benefits of both approaches to achieve a good trade-off between statistical accuracy and computational efficiency. To support our method, we derive excess risk bounds for both single- and multiple-output problems with generic Lipschitz losses, providing new guarantees for a wide range of applications, from robust regression to multiple quantile regression. We also provide empirical evidence of the superiority of our sketches over recent state-of-the-art (SOTA) approaches.
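To make the construction concrete, below is a minimal NumPy sketch of a $p$-sparsified sketch matrix, assuming entries of the form $S_{ij} = B_{ij} G_{ij} / \sqrt{sp}$, where $B_{ij} \sim \mathrm{Bernoulli}(p)$ masks a standard Gaussian (or Rademacher) variable $G_{ij}$ and the rescaling keeps $\mathbb{E}[S^\top S] = I_n$. The function name and scaling are illustrative assumptions, not verbatim from the paper; small $p$ yields a sparse, fast-to-apply matrix, while $p = 1$ recovers the dense Gaussian (or Rademacher) sketch.

```python
import numpy as np

def p_sparsified_sketch(s, n, p, rng=None, rademacher=False):
    """Draw an s x n p-sparsified sketch matrix (illustrative construction).

    Each entry is the product of a Bernoulli(p) mask and a standard
    Gaussian (or Rademacher) variable, rescaled by 1/sqrt(s*p) so that
    E[S.T @ S] = I_n. Small p gives a sparse, cheap-to-apply sketch;
    p = 1 recovers the dense Gaussian (or Rademacher) sketch.
    """
    rng = np.random.default_rng(rng)
    mask = rng.random((s, n)) < p                      # Bernoulli(p) support
    if rademacher:
        values = rng.choice([-1.0, 1.0], size=(s, n))  # Rademacher entries
    else:
        values = rng.standard_normal((s, n))           # Gaussian entries
    return mask * values / np.sqrt(s * p)

# Toy usage: compress an n x n Gram matrix K down to n x s.
n, s, p = 1000, 100, 0.05
rng = np.random.default_rng(0)
X = rng.standard_normal((n, n))
K = X @ X.T / n                                        # toy PSD Gram matrix
S = p_sparsified_sketch(s, n, p, rng=rng)
KS = K @ S.T                                           # sketched kernel columns
```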