Neural kernels have drastically increased performance on diverse and nonstandard data modalities but require significantly more compute, which previously limited their application to smaller datasets. In this work, we address this by massively parallelizing their computation across many GPUs. We combine this with a distributed, preconditioned conjugate gradients algorithm to enable kernel regression at a large scale (i.e. up to five million examples). Using this approach, we study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset. Using data augmentation to expand the original CIFAR-10 training dataset by a factor of 20, we obtain a test accuracy of 91.2\% (SotA for a pure kernel method). Moreover, we explore neural kernels on other data modalities, obtaining results on protein and small molecule prediction tasks that are competitive with SotA methods.
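To make the solver concrete, below is a minimal single-machine sketch of preconditioned conjugate gradients applied to the kernel regression system $(K + \lambda I)\alpha = y$. The Jacobi (diagonal) preconditioner, the function name, and all parameters are illustrative assumptions; the paper's actual implementation is distributed across many GPUs and is not reproduced here.

```python
import numpy as np

def solve_kernel_regression_pcg(K, y, ridge=1e-6, tol=1e-8, max_iter=1000):
    """Solve (K + ridge * I) alpha = y with preconditioned conjugate gradients.

    Sketch only: uses a simple Jacobi (diagonal) preconditioner as an
    illustrative choice; the distributed, multi-GPU solver described in the
    paper partitions the kernel matrix rather than materializing it here.
    """
    n = K.shape[0]
    A = K + ridge * np.eye(n)          # regularized kernel matrix
    M_inv = 1.0 / np.diag(A)           # Jacobi preconditioner: inverse of the diagonal

    alpha = np.zeros(n)
    r = y - A @ alpha                  # initial residual
    z = M_inv * r                      # preconditioned residual
    p = z.copy()
    rz = r @ z

    for _ in range(max_iter):
        Ap = A @ p
        step = rz / (p @ Ap)
        alpha += step * p
        r -= step * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(y):
            break                      # converged to relative tolerance
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # new search direction
        rz = rz_new
    return alpha


# Usage example with a small RBF kernel on synthetic data (hypothetical setup).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = rng.normal(size=200)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq_dists)
    alpha = solve_kernel_regression_pcg(K, y, ridge=1e-3)
    print("residual norm:", np.linalg.norm((K + 1e-3 * np.eye(200)) @ alpha - y))
```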