Machine learning models trained by different optimization algorithms under different data distributions can exhibit distinct generalization behaviors. In this paper, we analyze the generalization of models trained by noisy iterative algorithms. We derive distribution-dependent generalization bounds by connecting noisy iterative algorithms to additive noise channels found in communication and information theory. Our generalization bounds shed light on several applications, including differentially private stochastic gradient descent (DP-SGD), federated learning, and stochastic gradient Langevin dynamics (SGLD). We demonstrate our bounds through numerical experiments, showing that they can help understand recent empirical observations of the generalization phenomena of neural networks.
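To make the "noisy iterative algorithm" setting concrete, the following is a minimal sketch (not taken from the paper) of a single SGLD-style training loop on a toy least-squares problem. The step size eta, inverse temperature beta, batch size, and the toy loss are all illustrative assumptions; the point is only that each iterate equals a gradient step plus injected Gaussian noise, which is what connects the update to an additive noise channel.

```python
# Illustrative sketch of a noisy iterative algorithm (SGLD-style update).
# Hyperparameters and the toy loss are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a least-squares loss L(w) = ||Xw - y||^2 / (2n)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, idx):
    """Mini-batch gradient of the least-squares loss."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

eta, beta, batch = 0.05, 100.0, 32  # step size, inverse temperature, batch size
w = np.zeros(d)
for t in range(500):
    idx = rng.choice(n, size=batch, replace=False)
    # Gradient step plus Gaussian noise scaled by eta and beta; the injected
    # noise is what lets each iterate be viewed as the output of an additive
    # (Gaussian) noise channel.
    w = w - eta * grad(w, idx) + np.sqrt(2 * eta / beta) * rng.normal(size=d)

print("final parameter estimate:", w)
```

DP-SGD fits the same template, with the gradient clipped per example before the Gaussian noise is added.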