Deep learning (DL) has had unprecedented success and is now entering scientific computing with full force. However, DL suffers from a universal phenomenon: instability, despite universal approximation properties that often guarantee the existence of stable neural networks (NNs). We show the following paradox. There are basic well-conditioned problems in scientific computing where one can prove the existence of NNs with great approximation qualities; however, there does not exist any algorithm, even randomised, that can train (or compute) such a NN. Indeed, for any positive integers $K > 2$ and $L$, there are cases where simultaneously: (a) no randomised algorithm can compute a NN correct to $K$ digits with probability greater than $1/2$; (b) there exists a deterministic algorithm that computes a NN with $K-1$ correct digits, but any such (even randomised) algorithm needs arbitrarily many training data; (c) there exists a deterministic algorithm that computes a NN with $K-2$ correct digits using no more than $L$ training samples. These results provide basic foundations for Smale's 18th problem and imply a potentially vast, and crucial, classification theory describing conditions under which (stable) NNs with a given accuracy can be computed by an algorithm. We begin this theory by initiating a unified theory for compressed sensing and DL, leading to sufficient conditions for the existence of algorithms that compute stable NNs in inverse problems. We introduce Fast Iterative REstarted NETworks (FIRENETs), which we prove and numerically verify are stable. Moreover, we prove that only $\mathcal{O}(|\log(\epsilon)|)$ layers are needed for an $\epsilon$-accurate solution to the inverse problem (exponential convergence), and that the inner dimensions of the layers do not exceed the dimension of the inverse problem. Thus, FIRENETs are computationally very efficient.
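For intuition on the $\mathcal{O}(|\log(\epsilon)|)$ layer count, the following sketch spells out the standard counting argument behind exponential convergence, under the illustrative assumption that each layer (or restart) of such a network contracts the reconstruction error by a fixed factor $\nu \in (0,1)$; the symbols $\nu$, $C$ and $e_0$ are not from the abstract and serve only this illustration, not a statement of the precise FIRENET convergence theorem:
\[
e_n \le C\,\nu^{\,n}\, e_0 \le \epsilon
\quad\Longleftarrow\quad
n \ge \frac{\log(C e_0/\epsilon)}{\log(1/\nu)} = \mathcal{O}\big(|\log(\epsilon)|\big),
\]
where $e_0$ denotes the initial error and $e_n$ the error after $n$ layers.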