We solve an open question from Lu et al. (2017) by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a network of width $O(d)$ (independent of the target network's architecture), whose number of parameters is essentially larger only by a linear factor. In light of previous depth separation theorems, which imply that a similar result cannot hold when the roles of width and depth are interchanged, it follows that depth plays a more significant role than width in the expressive power of neural networks. We extend our results to constructing networks with bounded weights, and to constructing networks with width at most $d+2$, which is close to the minimal possible width due to previous lower bounds. Both of these constructions incur an extra polynomial factor in the number of parameters over the target network. We also show an exact representation of wide and shallow networks using deep and narrow networks, which in certain cases does not increase the number of parameters over the target network.