We consider a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer. The network structure depends on two structural parameters, which determine the numbers of convolutional layers and the width of the fully connected layer. We establish an approximation theory with explicit approximation rates when the function to be approximated takes the composite form $f\circ Q$ with a feature polynomial $Q$ and a univariate function $f$. In particular, we prove that such a network can outperform fully connected shallow networks in approximating radial functions with $Q(x) = |x|^2$ when the dimension $d$ of the data in $\mathbb{R}^d$ is large. This gives the first rigorous proof of the superiority of deep convolutional neural networks in approximating functions with special structures. We then carry out a generalization analysis for empirical risk minimization with such a deep network in a regression framework where the regression function takes the form $f\circ Q$. Our network structure, which does not use any composite information or the functions $Q$ and $f$, can automatically extract features and exploit the composite nature of the regression function by tuning the structural parameters. Our analysis provides an error bound that decreases with the network depth to a minimum and then increases, theoretically verifying a trade-off phenomenon observed for network depths in many practical applications.
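As a concrete illustration (not taken from the paper), the following NumPy sketch implements one plausible instance of this network family. The names, the ReLU activation, the zero-padded full convolution, and the stride-2 subsampling are our assumptions for illustration; the paper's precise conventions may differ.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def conv_layer(x, w, b):
    # Zero-padded 1-D convolution plus bias, followed by ReLU;
    # np.convolve(..., mode="full") maps length n to length n + len(w) - 1.
    return relu(np.convolve(x, w, mode="full") + b)

def forward(x, group1, group2, stride, W_fc, b_fc):
    """Two groups of convolutional layers, a downsampling operator, and a
    fully connected layer. The two structural parameters correspond to the
    numbers of convolutional layers (len(group1), len(group2)) and the
    width of W_fc."""
    for w, b in group1:        # first group of convolutional layers
        x = conv_layer(x, w, b)
    x = x[::stride]            # downsampling operator
    for w, b in group2:        # second group of convolutional layers
        x = conv_layer(x, w, b)
    return W_fc @ x + b_fc     # fully connected output layer

# Hypothetical usage on data from R^d, with a radial target f(|x|^2):
rng = np.random.default_rng(0)
d, s, stride = 8, 3, 2
x = rng.standard_normal(d)
target = np.cos(np.sum(x**2))  # f(Q(x)) with f = cos and Q(x) = |x|^2
group1 = [(rng.standard_normal(s), 0.1) for _ in range(2)]
group2 = [(rng.standard_normal(s), 0.1) for _ in range(2)]
# Lengths: 8 -> 12 after group1, -> 6 after downsampling, -> 10 after group2.
W_fc, b_fc = rng.standard_normal((1, 10)), 0.0
y = forward(x, group1, group2, stride, W_fc, b_fc)
```

The sketch only fixes the forward architecture; in the paper's regression setting the weights would be learned by empirical risk minimization, with the numbers of convolutional layers and the fully connected width tuned as the structural parameters.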