The remarkable successes of neural networks in a wide variety of inverse problems have fueled their adoption in disciplines ranging from medical imaging to seismic analysis over the past decade. However, the high dimensionality of such inverse problems has simultaneously left current theory, which predicts that network complexity should scale exponentially in the dimension of the problem, unable to explain why the seemingly small networks used in these settings work as well as they do in practice. To reduce this gap between theory and practice, a general method is provided herein for bounding the complexity required for a neural network to approximate a Lipschitz function on a high-dimensional set with low-complexity structure. The approach is based on the observation that the existence of a linear Johnson-Lindenstrauss embedding $\mathbf{A} \in \mathbb{R}^{d \times D}$ of a given high-dimensional set $\mathcal{S} \subset \mathbb{R}^D$ into a low-dimensional cube $[-M,M]^d$ implies that for any Lipschitz function $f : \mathcal{S}\to \mathbb{R}^p$, there exists a Lipschitz function $g : [-M,M]^d \to \mathbb{R}^p$ such that $g(\mathbf{A}\mathbf{x}) = f(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{S}$. Hence, if one has a neural network which approximates $g : [-M,M]^d \to \mathbb{R}^p$, then a layer implementing the JL embedding $\mathbf{A}$ can be added to obtain a neural network which approximates $f : \mathcal{S} \to \mathbb{R}^p$. By pairing JL embedding results with results on the approximation of Lipschitz functions by neural networks, one then obtains bounds on the complexity required for a neural network to approximate Lipschitz functions on high-dimensional sets. The end result is a general theoretical framework which can then be used to better explain the observed empirical successes of smaller networks in a wider variety of inverse problems than current theory allows.
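To make the construction concrete, the following is a minimal PyTorch sketch of the architecture suggested by the argument above: a frozen linear first layer implementing a random Gaussian matrix $\mathbf{A}$ (which, for a low-complexity set $\mathcal{S}$, acts as a JL embedding with high probability), followed by a trainable subnetwork approximating the low-dimensional function $g$. The class name `JLNet`, the Gaussian scaling, and the particular width and depth of the subnetwork are illustrative assumptions, not specifications taken from the paper.

```python
import torch
import torch.nn as nn


class JLNet(nn.Module):
    """Compose a fixed JL embedding A : R^D -> R^d with a small trainable network
    approximating the low-dimensional Lipschitz function g, so that the overall
    network computes g(Ax) ≈ f(x) for x in the low-complexity set S."""

    def __init__(self, D: int, d: int, p: int, width: int = 128):
        super().__init__()
        # Fixed (non-trainable) random Gaussian matrix; for a low-complexity set S,
        # such a matrix embeds S into R^d with high probability (illustrative scaling).
        self.register_buffer("A", torch.randn(d, D) / d ** 0.5)
        # Trainable subnetwork approximating g : [-M, M]^d -> R^p (illustrative architecture).
        self.g = nn.Sequential(
            nn.Linear(d, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, p),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, D); the frozen first layer computes Ax, and the
        # remaining layers approximate g, so the output is g(Ax) ≈ f(x) on S.
        return self.g(x @ self.A.T)
```

In this sketch only the parameters of the low-dimensional subnetwork are trained, so the trainable complexity is governed by the embedding dimension $d$ rather than the ambient dimension $D$, mirroring the complexity bounds described above.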