Existing global convergence guarantees for (stochastic) gradient descent do not apply to practical deep networks trained in the practical regime of deep learning, beyond the neural tangent kernel (NTK) regime. This paper proposes an algorithm that is guaranteed to converge globally in this practical regime, under a verifiable condition called the expressivity condition. The expressivity condition is both data-dependent and architecture-dependent, which is the key property that makes our results applicable to practical settings beyond the NTK regime. On the one hand, the expressivity condition is theoretically proven to hold, independently of the data, for fully-connected deep neural networks with narrow hidden layers and a single wide layer. On the other hand, it is numerically shown to hold, in a data-dependent manner, for deep (convolutional) ResNets with batch normalization on various standard image datasets. We also show that the proposed algorithm achieves generalization performance comparable to that of the heuristic algorithm, with the same hyperparameters and total number of iterations. The proposed algorithm can therefore be viewed as a step towards providing theoretical guarantees for deep learning in the practical regime.
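To illustrate what a verifiable, data- and architecture-dependent condition of this kind can look like in practice, the following is a minimal, hypothetical sketch; it is not the paper's actual expressivity condition. It merely probes whether the post-activation features of a chosen wide hidden layer, evaluated on the training set, are expressive enough for a linear readout to fit any target vector (checked via the rank and smallest singular value of the feature matrix). The network architecture, dataset, and the specific rank check are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's exact condition): numerically probing an
# expressivity-style condition at a single wide hidden layer.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d, width = 64, 10, 256  # number of training points, input dim, wide-layer width

# Assumed architecture: narrow hidden layer followed by one wide hidden layer.
model = nn.Sequential(
    nn.Linear(d, 32), nn.ReLU(),
    nn.Linear(32, width), nn.ReLU(),  # the "single wide layer"
    nn.Linear(width, 1),
)

x = torch.randn(n, d)  # stand-in for a real training set

# Collect post-activation features of the wide layer over all training points.
with torch.no_grad():
    features = model[:4](x)  # shape (n, width)

# Illustrative check: the feature matrix has full row rank (rank n), i.e. any
# target vector over the n training points is expressible by a linear readout
# of these features. Also report the smallest singular value as a margin.
svals = torch.linalg.svdvals(features)
print(f"rank = {torch.linalg.matrix_rank(features).item()} (need {n})")
print(f"smallest singular value = {svals.min().item():.3e}")
```

Because such a check depends only on the training data and the features produced by the given architecture, it can be run before or during training, which is the sense in which a condition of this kind is "verifiable".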