While large training datasets generally improve model performance, the training process becomes computationally expensive and time-consuming. Distributed learning is a common strategy to reduce the overall training time by exploiting multiple computing devices. Recently, it has been observed in the single-machine setting that overparametrization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows us to analyze both the finite- and infinite-dimensional settings. We numerically demonstrate the effects of different model parameters.
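To make the data-splitting scheme concrete, the following is a minimal sketch in a finite-dimensional, overparametrized linear model: each worker fits the minimum-norm ("ridgeless") interpolator on its own shard of the data, and the global estimate averages the local ones. The dimensions, noise level, and helper names (`min_norm_fit`, `split_fit`) are illustrative assumptions, not the paper's exact estimator or parameters.

```python
# Sketch: distributed ridgeless regression via data splitting.
# Assumed setup: d >> n (overparametrized), Gaussian design and noise.
import numpy as np

rng = np.random.default_rng(0)
n, d, noise = 200, 1000, 0.5               # illustrative sizes
w_star = rng.normal(size=d) / np.sqrt(d)   # ground-truth parameter

X = rng.normal(size=(n, d))
y = X @ w_star + noise * rng.normal(size=n)

def min_norm_fit(X, y):
    """Minimum-norm least-squares ('ridgeless') solution via pseudoinverse."""
    return np.linalg.pinv(X) @ y

def split_fit(X, y, m):
    """Average of minimum-norm interpolators over m disjoint data shards."""
    shards = zip(np.array_split(X, m), np.array_split(y, m))
    return np.mean([min_norm_fit(Xj, yj) for Xj, yj in shards], axis=0)

# Compare the single-machine interpolator (m=1) against data splitting:
# averaging local interpolators acts like implicit regularization.
X_test = rng.normal(size=(5000, d))
y_test = X_test @ w_star
for m in (1, 2, 5, 10):
    w_hat = split_fit(X, y, m)
    err = np.mean((X_test @ w_hat - y_test) ** 2)
    print(f"m={m:2d} workers: test MSE = {err:.4f}")
```

Since each shard solves a smaller pseudoinverse problem, the scheme also reduces per-worker computation, consistent with the abstract's claim of simultaneous statistical and computational gains.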