The highly non-linear nature of deep neural networks makes them susceptible to adversarial examples and gives them unstable gradients, which hinders interpretability. Existing methods that address these issues, such as adversarial training, are expensive and often sacrifice predictive accuracy. In this work, we consider curvature, a mathematical quantity that encodes the degree of non-linearity. Using it, we demonstrate low-curvature neural networks (LCNNs) that attain drastically lower curvature than standard models while exhibiting comparable predictive performance, which leads to improved robustness and stable gradients at only a marginal increase in training time. To achieve this, we minimize a data-independent upper bound on the curvature of a neural network that decomposes overall curvature in terms of the curvatures and slopes of its constituent layers. To minimize this bound efficiently, we introduce two novel architectural components: first, a non-linearity called centered-softplus, a stable variant of the softplus non-linearity, and second, a Lipschitz-constrained batch normalization layer. Our experiments show that LCNNs have lower curvature, more stable gradients, and increased off-the-shelf adversarial robustness compared to their standard high-curvature counterparts, all without degrading predictive performance. Our approach is easy to use and can be readily incorporated into existing neural network models.
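For intuition on why such a bound can decompose layerwise, consider a one-dimensional sketch (the notation below, with C for a curvature bound and L for a slope/Lipschitz bound, is ours; this illustrates the general principle rather than reproducing the paper's exact bound). The second-derivative chain rule for a composition of two functions directly relates the curvature of the whole to the curvatures and slopes of the parts:

```latex
% One-dimensional illustration (our notation, not necessarily the paper's bound).
% Suppose |f'| <= L_f, |g'| <= L_g, |f''| <= C_f, |g''| <= C_g. Then
(g \circ f)''(x) = g''\big(f(x)\big)\, f'(x)^2 + g'\big(f(x)\big)\, f''(x)
\quad \Longrightarrow \quad
C_{g \circ f} \;\le\; C_g\, L_f^2 \;+\; L_g\, C_f .
```

Applying this recursively across a network's layers bounds the overall curvature in terms of per-layer curvatures and slopes, with no reference to the data, which is why controlling each layer's non-linearity and Lipschitz constant controls the curvature of the whole model.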
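A minimal PyTorch sketch of what the two architectural components could look like follows. The class names, the beta parameter, and the enforcement strategy for the Lipschitz constraint are assumptions made here for illustration, not the authors' reference implementation: "centered" is read as shifting softplus so its output is zero at zero, and the batch normalization layer is rescaled only when its largest per-channel gain would exceed the target Lipschitz constant.

```python
# Hypothetical sketch of the two components named above; definitions are
# plausible readings of the abstract, not the authors' reference code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenteredSoftplus(nn.Module):
    """Softplus shifted so that phi(0) = 0 (assumed meaning of 'centered').
    beta trades smoothness for curvature: large beta approaches ReLU,
    small beta gives a flatter, lower-curvature non-linearity."""
    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # softplus_beta(x) - softplus_beta(0) = softplus_beta(x) - log(2)/beta
        return F.softplus(x, beta=self.beta) - math.log(2.0) / self.beta

class LipschitzBatchNorm1d(nn.BatchNorm1d):
    """Batch norm whose per-channel gain |gamma_i| / sigma_i is rescaled so the
    layer's Lipschitz constant stays below `lip` (assumed enforcement: divide
    the output by the max gain whenever it exceeds `lip`)."""
    def __init__(self, num_features: int, lip: float = 1.0, **kw):
        super().__init__(num_features, **kw)
        self.lip = lip

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = super().forward(x)
        # Gain estimated from running statistics; this is approximate during
        # training, where batch statistics are used for normalization.
        sigma = torch.sqrt(self.running_var + self.eps)
        gain = (self.weight.abs() / sigma).max()
        # Shrink only if the Lipschitz bound is violated (note: this simplified
        # sketch also scales the bias term, which the slope bound does not need).
        scale = torch.clamp(gain / self.lip, min=1.0)
        return y / scale

# Drop-in usage in an existing architecture:
model = nn.Sequential(
    nn.Linear(784, 256),
    LipschitzBatchNorm1d(256),
    CenteredSoftplus(beta=5.0),
    nn.Linear(256, 10),
)
```

Because both modules are drop-in replacements for a standard non-linearity and batch normalization layer, a sketch like this is consistent with the abstract's claim that the approach can be readily incorporated into existing neural network models.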