Recent studies of the generalization properties of distributed learning with random features assumed that the target concept lies in the hypothesis space. However, this strict condition does not hold in the more common non-attainable case. In this paper, using refined proof techniques, we first extend the optimal rates for distributed learning with random features to the non-attainable case. Then, we reduce the number of required random features via a data-dependent generating strategy, and increase the allowed number of partitions by exploiting additional unlabeled data. Theoretical analysis shows that these techniques substantially reduce the computational cost while preserving the optimal generalization accuracy under standard assumptions. Finally, we conduct several experiments on both simulated and real-world datasets, and the empirical results validate our theoretical findings.
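To make the setting concrete, the following is a minimal sketch (not the paper's algorithm or its data-dependent feature strategy) of divide-and-conquer regression with random Fourier features: the data are split into partitions, each partition fits a ridge estimator in a shared random-feature space, and the local estimators are averaged. The partition count, feature dimension M, bandwidth gamma, and regularization lam are illustrative choices, not values from the paper.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map X of shape (n, d) to cosine features approximating a Gaussian kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def fit_local_ridge(Phi, y, lam):
    """Solve (Phi^T Phi + n*lam*I) w = Phi^T y on one partition."""
    n, M = Phi.shape
    A = Phi.T @ Phi + n * lam * np.eye(M)
    return np.linalg.solve(A, Phi.T @ y)

def distributed_rff_regression(X, y, n_partitions=10, M=200, gamma=1.0, lam=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Shared random features so that local estimators live in the same space.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, M))
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)
    idx = rng.permutation(len(X))
    local_weights = []
    for part in np.array_split(idx, n_partitions):
        Phi = random_fourier_features(X[part], W, b)
        local_weights.append(fit_local_ridge(Phi, y[part], lam))
    w_avg = np.mean(local_weights, axis=0)  # divide-and-conquer averaging
    return lambda X_new: random_fourier_features(X_new, W, b) @ w_avg

# Toy usage on a simulated regression task.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(2000, 3))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=2000)
    predict = distributed_rff_regression(X, y)
    X_test = rng.uniform(-1, 1, size=(200, 3))
    mse = np.mean((predict(X_test) - np.sin(3 * X_test[:, 0])) ** 2)
    print(f"test MSE: {mse:.4f}")
```

Each partition only factors an M-by-M matrix, so the per-machine cost scales with the number of random features rather than the full sample size, which is the computational saving the paper's analysis quantifies.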