The remarkable successes of neural networks in a huge variety of inverse problems have fueled their adoption in disciplines ranging from medical imaging to seismic analysis over the past decade. However, current theory predicts that network complexity should scale exponentially in the dimension of the problem, and therefore cannot explain why the seemingly small networks used in these high-dimensional settings work as well as they do in practice. To reduce this gap between theory and practice, we provide a general method for bounding the complexity required for a neural network to approximate a H\"older (or uniformly) continuous function defined on a high-dimensional set with a low-complexity structure. The approach is based on the observation that the existence of a Johnson-Lindenstrauss embedding $A\in\mathbb{R}^{d\times D}$ of a given high-dimensional set $S\subset\mathbb{R}^D$ into a low-dimensional cube $[-M,M]^d$ implies that for any H\"older (or uniformly) continuous function $f:S\to\mathbb{R}^p$, there exists a H\"older (or uniformly) continuous function $g:[-M,M]^d\to\mathbb{R}^p$ such that $g(Ax)=f(x)$ for all $x\in S$. Hence, if one has a neural network which approximates $g:[-M,M]^d\to\mathbb{R}^p$, then a layer implementing the JL embedding $A$ can be prepended to obtain a neural network that approximates $f:S\to\mathbb{R}^p$. By pairing JL embedding results with results on the approximation of H\"older (or uniformly) continuous functions by neural networks, one then obtains bounds on the complexity required for a neural network to approximate H\"older (or uniformly) continuous functions on high-dimensional sets. The end result is a general theoretical framework that can then be used to better explain the observed empirical successes of smaller networks in a wider variety of inverse problems than current theory allows.
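To make the composition step concrete, here is a brief sketch (illustrative only; $N_g$, $N_f$, and the accuracy $\varepsilon$ are placeholder notation, not objects defined above). Suppose a network $N_g$ approximates $g$ uniformly on the cube. Since $g(Ax)=f(x)$ and $Ax\in[-M,M]^d$ for all $x\in S$,
\[
  \sup_{z\in[-M,M]^d}\|N_g(z)-g(z)\|\le\varepsilon
  \quad\Longrightarrow\quad
  \sup_{x\in S}\|N_g(Ax)-f(x)\| = \sup_{x\in S}\|N_g(Ax)-g(Ax)\|\le\varepsilon,
\]
so the network $N_f:=N_g\circ A$, obtained by prepending a single linear layer implementing $A\in\mathbb{R}^{d\times D}$, approximates $f$ on $S$ to the same accuracy $\varepsilon$.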