We provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. Prominent examples include the well-known ReLU and convolutional neural networks together with their standard loss functions. Using the recent notion of conservative gradients, we then establish a "nonsmooth cheap gradient principle" for backpropagation that encompasses most concrete applications. The cheapness of nonsmooth backpropagation contrasts with competing forward approaches, which to date only have dimension-dependent worst-case estimates. To understand this class of methods, we relate the complexity of computing a large number of directional derivatives to that of matrix multiplication, which reveals a fundamental limitation on improving forward AD for that task. Finally, while the fastest known algorithms for computing a Clarke subgradient are linear in the dimension, computing two distinct Clarke (resp. lexicographic) subgradients of simple neural networks turns out to be NP-hard.
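To illustrate the contrast between the two modes, here is a minimal sketch (the choice of JAX, the toy two-layer ReLU network, and the squared loss are all assumptions made for illustration, not part of the paper): reverse mode returns a (conservative) gradient in a single backward pass, whereas forward mode needs one Jacobian-vector product per input coordinate, hence a dimension-dependent cost.

```python
import jax
import jax.numpy as jnp

def relu_net(params, x):
    """Toy two-layer ReLU network followed by a squared loss (illustrative only)."""
    W1, b1, W2, b2 = params
    h = jax.nn.relu(W1 @ x + b1)      # nonsmooth activation
    out = W2 @ h + b2
    return jnp.sum(out ** 2)

d = 8                                  # input dimension
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = (
    jax.random.normal(k1, (16, d)),
    jnp.zeros(16),
    jax.random.normal(k2, (1, 16)),
    jnp.zeros(1),
)
x = jax.random.normal(k3, (d,))

# Reverse mode (backpropagation): one pass yields the full gradient w.r.t. x.
g_reverse = jax.grad(relu_net, argnums=1)(params, x)

# Forward mode: d directional derivatives (JVPs), one per basis vector,
# are needed to recover the same gradient -- a dimension-dependent cost.
basis = jnp.eye(d)
g_forward = jnp.stack(
    [jax.jvp(lambda z: relu_net(params, z), (x,), (e,))[1] for e in basis]
)

print(jnp.allclose(g_reverse, g_forward))  # both modes agree here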