Existing global convergence guarantees of (stochastic) gradient descent do not apply to practical deep networks in the practical regime of deep learning beyond the neural tangent kernel (NTK) regime. This paper proposes an algorithm that is guaranteed to converge globally in the practical regime beyond the NTK regime, under a verifiable condition called the expressivity condition. The expressivity condition is defined to be both data-dependent and architecture-dependent, which is the key property that makes our results applicable to practical settings beyond the NTK regime. On the one hand, the expressivity condition is theoretically proven to hold data-independently for fully-connected deep neural networks with narrow hidden layers and a single wide layer. On the other hand, the expressivity condition is numerically shown to hold data-dependently for deep (convolutional) ResNets with batch normalization on various standard image datasets. We also show that the proposed algorithm achieves generalization performance comparable to that of the heuristic algorithm, with the same hyper-parameters and total number of iterations. Therefore, the proposed algorithm can be viewed as a step towards providing theoretical guarantees for deep learning in the practical regime.
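To make the architecture family named above concrete, the following is a minimal, hypothetical sketch of a fully-connected network with narrow hidden layers and a single wide layer. It is not the paper's algorithm or exact experimental setup; the layer widths, the number of narrow layers, and the placement of the wide layer are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's setup): a fully-connected network whose
# hidden layers are narrow except for a single wide layer. All widths and the
# position of the wide layer below are illustrative assumptions.
import torch
import torch.nn as nn


def narrow_net_with_one_wide_layer(
    in_dim: int = 784,        # e.g., flattened 28x28 images (assumption)
    narrow_width: int = 32,   # width of the narrow hidden layers (assumption)
    wide_width: int = 4096,   # width of the single wide layer (assumption)
    num_narrow_layers: int = 3,
    out_dim: int = 10,
) -> nn.Sequential:
    """Build a fully-connected net with narrow hidden layers and one wide layer."""
    layers = [nn.Linear(in_dim, narrow_width), nn.ReLU()]
    for _ in range(num_narrow_layers - 1):
        layers += [nn.Linear(narrow_width, narrow_width), nn.ReLU()]
    # The single wide layer, placed just before the output (placement is an assumption).
    layers += [nn.Linear(narrow_width, wide_width), nn.ReLU()]
    layers += [nn.Linear(wide_width, out_dim)]
    return nn.Sequential(*layers)


if __name__ == "__main__":
    model = narrow_net_with_one_wide_layer()
    x = torch.randn(8, 784)   # toy batch
    print(model(x).shape)     # torch.Size([8, 10])
```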