This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness, and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by ordinary least squares, but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a strategy different from the traditional route: we construct a deep ReLU network estimator that has a better empirical loss than the true function, and the difference between these two functions furnishes a lower bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
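For reference, the Huber loss with truncation parameter \(\tau > 0\) takes the standard textbook form displayed below; the adaptive choice of \(\tau = \tau(n)\), which diverges with the sample size \(n\) at a rate depending on the moment parameter \(p\), the smoothness, and the intrinsic dimension, is developed in the paper itself. The notation \(\ell_\tau\), \(\widehat{f}_n\), and \(\mathcal{F}_{\mathrm{NN}}\) (the class of deep ReLU networks) is introduced here only for illustration and is not taken from the abstract.

\[
  \ell_\tau(x) =
  \begin{cases}
    \dfrac{x^2}{2}, & |x| \le \tau, \\[4pt]
    \tau |x| - \dfrac{\tau^2}{2}, & |x| > \tau,
  \end{cases}
  \qquad
  \widehat{f}_n \in \operatorname*{arg\,min}_{f \in \mathcal{F}_{\mathrm{NN}}}
  \frac{1}{n} \sum_{i=1}^{n} \ell_\tau\bigl(Y_i - f(X_i)\bigr).
\]

The quadratic regime near zero retains the efficiency of least squares for small residuals, while the linear tails cap the influence of heavy-tailed noise; letting \(\tau\) grow with \(n\) trades off this robustness against the Huberization bias discussed above.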