We adopt an information-theoretic framework to analyze the generalization behavior of the class of iterative, noisy learning algorithms. This class is particularly suitable for study under information-theoretic metrics as the algorithms are inherently randomized, and it includes commonly used algorithms such as Stochastic Gradient Langevin Dynamics (SGLD). Herein, we use the maximal leakage (equivalently, the Sibson mutual information of order infinity) metric, as it is simple to analyze, and it implies bounds both on the probability of a large generalization error and on its expected value. We show that, if the update function (e.g., gradient) is bounded in $L_2$-norm, then adding isotropic Gaussian noise leads to optimal generalization bounds: indeed, the input and output of the learning algorithm in this case are asymptotically statistically independent. Furthermore, we demonstrate how the assumptions on the update function affect the optimal (in the sense of minimizing the induced maximal leakage) choice of the noise. Finally, we compute explicit tight upper bounds on the induced maximal leakage for several scenarios of interest.
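As a brief recap of the metric (the standard definition from the maximal-leakage literature, not specific to this work): for the training sample $S$ and the algorithm's output $W$ taking values in a discrete set $\mathcal{W}$,
$$
\mathcal{L}(S \!\to\! W) \;=\; \log \sum_{w \in \mathcal{W}} \, \max_{s:\, P_S(s) > 0} P_{W \mid S}(w \mid s),
$$
which coincides with the Sibson mutual information of order infinity, $I_\infty(S; W)$; in the continuous case the sum is replaced by an integral over conditional densities.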
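The following is a minimal, purely illustrative sketch (not taken from the paper) of the kind of noisy iterative update considered here: an SGLD-style step in which the $L_2$-norm bound on the update function is enforced by clipping and isotropic Gaussian noise is added. All function names, constants, and the toy objective are assumptions made for illustration.

```python
# Sketch of a bounded-update, noise-injected iterative learning rule.
# Assumptions: the update direction is a mini-batch gradient, the L2 bound
# is enforced by clipping, and the injected noise is isotropic Gaussian.
import numpy as np

rng = np.random.default_rng(0)

def clip_l2(update, bound):
    """Rescale `update` so that its L2-norm does not exceed `bound`."""
    norm = np.linalg.norm(update)
    return update if norm <= bound else update * (bound / norm)

def noisy_iterative_step(w, grad_fn, batch, step_size, l2_bound, noise_std):
    """One iteration: bounded update plus isotropic Gaussian noise."""
    g = clip_l2(grad_fn(w, batch), l2_bound)          # bounded update function
    noise = noise_std * rng.standard_normal(w.shape)  # isotropic Gaussian noise
    return w - step_size * g + noise

# Toy usage: noisy iterations on a synthetic least-squares objective.
X = rng.standard_normal((128, 5))
y = X @ np.ones(5) + 0.1 * rng.standard_normal(128)

def grad_fn(w, idx):
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(5)
for t in range(200):
    batch = rng.choice(len(X), size=16, replace=False)
    w = noisy_iterative_step(w, grad_fn, batch, step_size=0.05,
                             l2_bound=1.0, noise_std=0.01)
```

In SGLD proper, the noise standard deviation is tied to the step size and an inverse-temperature parameter rather than being a free constant as in this sketch; the fixed `noise_std` above is only meant to expose the "bounded update plus Gaussian noise" structure that the analysis applies to.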