Improving the adversarial robustness of neural networks remains a major challenge. Fundamentally, training a neural network via gradient descent is a parameter estimation problem. In adaptive control, maintaining persistency of excitation (PoE) is integral to ensuring that parameter estimates in dynamical systems converge to their true values. We show that parameter estimation with gradient descent can be modeled as sampling of an adaptive linear time-varying continuous-time system. Leveraging this model, and with inspiration from Model-Reference Adaptive Control (MRAC), we prove a sufficient condition constraining gradient descent updates to reference persistently excited trajectories that converge to the true parameters. This condition is satisfied when the learning rate is less than the inverse of the Lipschitz constant of the gradient of the loss function. We provide an efficient technique for estimating the corresponding Lipschitz constant in practice using extreme value theory. Our experimental results in both standard and adversarial training illustrate that networks trained with the PoE-motivated learning rate schedule achieve comparable clean accuracy but are significantly more robust to adversarial attacks than models trained using current state-of-the-art heuristics.
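A minimal sketch of the extreme-value-theory estimate described above, assuming a callable `grad_fn(theta)` that returns the loss gradient at parameter vector `theta` (this interface, the sampling radius, and the safety factor are illustrative assumptions, not the paper's released implementation). We sample gradient difference quotients, keep the maximum of each batch, and fit a reverse Weibull distribution to those maxima; its location parameter bounds the samples from above and serves as the Lipschitz estimate, from which the PoE-motivated learning rate bound follows.

```python
import numpy as np
from scipy.stats import weibull_max  # reverse Weibull distribution


def estimate_grad_lipschitz(grad_fn, theta, radius=1e-2,
                            n_batches=50, batch_size=100, rng=None):
    """Estimate the local Lipschitz constant of grad_fn near theta.

    Samples difference quotients ||g(a) - g(b)|| / ||a - b|| for random
    perturbations a, b of theta, collects per-batch maxima, and fits a
    reverse Weibull distribution to the maxima (extreme value theory).
    """
    rng = np.random.default_rng() if rng is None else rng
    batch_maxima = []
    for _ in range(n_batches):
        ratios = []
        for _ in range(batch_size):
            a = theta + radius * rng.standard_normal(theta.shape)
            b = theta + radius * rng.standard_normal(theta.shape)
            ratios.append(np.linalg.norm(grad_fn(a) - grad_fn(b))
                          / np.linalg.norm(a - b))
        batch_maxima.append(max(ratios))
    # The reverse Weibull's support is bounded above by its location
    # parameter, which we take as the estimated Lipschitz constant.
    _, loc, _ = weibull_max.fit(batch_maxima)
    return loc


# Learning-rate bound from the abstract: eta < 1 / L.
# L_hat = estimate_grad_lipschitz(grad_fn, theta0)
# learning_rate = 0.9 / L_hat  # safety factor 0.9 is an assumption
```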