A recurrent neural network (RNN) is a widely used deep-learning architecture for processing sequential data. Because it imitates a dynamical system, an infinite-width RNN can approximate any open dynamical system on a compact domain. In practice, however, deep networks with bounded width are generally more effective than wide networks, yet the universal approximation theorem for deep narrow architectures has not been extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound on the minimum width required for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or any $L^p$ function with widths $d_x+d_y+2$ and $\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also compute the additional width required when the activation function is $\tanh$ or a more general one. In addition, we prove the universality of other recurrent networks, such as bidirectional RNNs. By bridging multi-layer perceptrons and RNNs, our theory and proof technique can serve as an initial step toward further research on deep RNNs.
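To make the setting of the main result concrete, the following is a minimal sketch, not the construction used in the proof, of the network class the theorem covers: a deep RNN with ReLU activation whose hidden layers all have width $d_x+d_y+2$, mapping a length-$T$ sequence in $\mathbb{R}^{d_x}$ to a length-$T$ sequence in $\mathbb{R}^{d_y}$. All weights below are random placeholders, and the function name and depth are illustrative assumptions.

    # Sketch of a deep narrow RNN with ReLU activation and hidden width d_x + d_y + 2.
    # Weights are random placeholders; this only illustrates the architecture's shape.
    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def deep_narrow_rnn(xs, d_y, depth=4, seed=0):
        """xs: array of shape (T, d_x); returns an array of shape (T, d_y)."""
        rng = np.random.default_rng(seed)
        T, d_x = xs.shape
        width = d_x + d_y + 2                      # hidden width from the theorem
        layers, in_dim = [], d_x
        for _ in range(depth):                     # stacked recurrent layers
            U = rng.normal(size=(width, in_dim))   # input-to-hidden weights
            W = rng.normal(size=(width, width))    # hidden-to-hidden (recurrent) weights
            b = rng.normal(size=width)
            layers.append((U, W, b))
            in_dim = width
        V = rng.normal(size=(d_y, width))          # linear read-out to R^{d_y}

        seq = xs
        for U, W, b in layers:
            h, outputs = np.zeros(width), []
            for t in range(T):
                h = relu(U @ seq[t] + W @ h + b)   # ReLU recurrent update
                outputs.append(h)
            seq = np.stack(outputs)                # pass the whole sequence to the next layer
        return seq @ V.T                           # per-time-step output in R^{d_y}

    # Example: a sequence of 5 vectors in R^3 mapped to a sequence in R^2.
    ys = deep_narrow_rnn(np.random.randn(5, 3), d_y=2)
    print(ys.shape)  # (5, 2)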