We consider a sparse deep ReLU network (SDRN) estimator obtained from empirical risk minimization with a Lipschitz loss function in the presence of a large number of features. Our framework can be applied to a variety of regression and classification problems. The unknown target function is assumed to lie in a Sobolev space with mixed derivatives. Functions in this space need only satisfy a smoothness condition; they are not required to have a compositional structure. We develop non-asymptotic excess risk bounds for our SDRN estimator. We further show that the SDRN estimator achieves the same minimax rate of estimation (up to logarithmic factors) as one-dimensional nonparametric regression when the feature dimension is fixed, and attains a suboptimal rate when the dimension grows with the sample size. We show that the depth and the total number of nodes and weights of the ReLU network must grow with the sample size to ensure good performance, and we investigate how fast they should increase. These results provide important theoretical guidance and a basis for empirical studies with deep neural networks.
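For illustration only, below is a minimal sketch of the kind of estimator the abstract describes: empirical risk minimization over a deep ReLU network with a Lipschitz loss, with hard thresholding of the weights standing in for the sparsity constraint. Everything in this sketch (the absolute-error loss, the layer widths, the `sgd_step` and `sparsify` helpers, the sparsity level `s`) is our assumption, not the paper's construction; the paper's theory prescribes how depth, nodes, and weights must scale with the sample size.

```python
# Illustrative only: ERM for a deep ReLU network with a 1-Lipschitz loss
# (absolute error), plus hard thresholding of weights as a crude stand-in
# for the sparsity constraint. Architecture sizes are arbitrary choices.
import jax
import jax.numpy as jnp

def forward(params, X):
    # params: list of (W, b) layer pairs; ReLU on all hidden layers.
    h = X
    for W, b in params[:-1]:
        h = jax.nn.relu(h @ W + b)
    W, b = params[-1]
    return (h @ W + b).ravel()

def risk(params, X, y):
    # Absolute-error loss: 1-Lipschitz in the network output.
    return jnp.mean(jnp.abs(forward(params, X) - y))

def sparsify(params, s):
    # Keep the s largest-magnitude weights overall, zero the rest
    # (a crude proxy for a sparsity constraint on the weights).
    flat = jnp.concatenate([W.ravel() for W, _ in params])
    tau = jnp.sort(jnp.abs(flat))[-s]
    return [(jnp.where(jnp.abs(W) >= tau, W, 0.0), b) for W, b in params]

@jax.jit
def sgd_step(params, X, y, lr=0.05):
    grads = jax.grad(risk)(params, X, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# Toy data: n samples, d features, a smooth (non-compositional) target.
key = jax.random.PRNGKey(0)
kx, ky, k1, k2, k3 = jax.random.split(key, 5)
n, d = 512, 5
X = jax.random.uniform(kx, (n, d), minval=-1.0, maxval=1.0)
y = jnp.sin(X.sum(axis=1)) + 0.1 * jax.random.normal(ky, (n,))

widths = [d, 32, 32, 1]  # depth/width would scale with n in the theory
params = [(jax.random.normal(k, (m, w)) / jnp.sqrt(m), jnp.zeros(w))
          for k, m, w in zip((k1, k2, k3), widths[:-1], widths[1:])]

for _ in range(300):
    params = sgd_step(params, X, y)
params = sparsify(params, s=200)
print("empirical risk of sparsified network:", float(risk(params, X, y)))
```

The absolute-error loss is used here because it is 1-Lipschitz in the network output, matching the abstract's Lipschitz-loss setting; squared error would not be Lipschitz on an unbounded range.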