Having similar behavior at training time and test time, what we call a "What You See Is What You Get" (WYSIWYG) property, is desirable in machine learning. Models trained with standard stochastic gradient descent (SGD), however, do not necessarily have this property, as complex behaviors such as robustness or subgroup performance can differ drastically between training and test time. In contrast, we show that Differentially Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization. Applying this connection, we introduce new conceptual tools for designing deep-learning methods by reducing generalization concerns to optimization ones: to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the training data. By applying this novel design principle, which bypasses "pathologies" of SGD, we construct simple algorithms that are competitive with the state of the art (SOTA) in several distributional-robustness applications, significantly improve the privacy vs. disparate impact trade-off of DP-SGD, and mitigate robust overfitting in adversarial training. Finally, we improve theoretical bounds relating DP, stability, and distributional generalization.