Recent studies indicate that kernel machines can often perform comparably to or better than deep neural networks (DNNs) on small datasets. Interest in kernel machines has been further bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, a generalization of kernel machines that decouples the model from the data, allowing training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD, and demonstrate scaling to model and data sizes that have not been possible with existing kernel methods.
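For concreteness, the decoupling can be sketched as follows (the notation $n$, $p$, $z_j$, and $\alpha_j$ is illustrative and not fixed by the abstract). A classical kernel machine fitted to training data $\{(x_i, y_i)\}_{i=1}^{n}$ takes the form
$$f(x) = \sum_{i=1}^{n} \alpha_i \, K(x, x_i),$$
so the number of coefficients is tied to the number of training points $n$. A general kernel model instead uses $p$ centers $\{z_j\}_{j=1}^{p}$ that need not coincide with the training data,
$$f(x) = \sum_{j=1}^{p} \alpha_j \, K(x, z_j),$$
so the model size $p$ and the data size $n$ can be scaled independently.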