In this paper we provide a constructive estimate of the convergence rate for training a well-known class of neural networks: multi-class logistic regression. Despite several decades of successful use, our rigorous results appear to be new, reflecting the gap between the practice and theory of machine learning. Training a neural network is typically done via variations of the gradient descent method. If a minimum of the loss function exists and gradient descent is used as the training method, we provide an expression that relates the learning rate to the rate of convergence to that minimum. The method involves an estimate of the condition number of the Hessian of the loss function. We also discuss the existence of a minimum, as it is not automatic that one exists. One method of ensuring convergence is to assign positive probability to every class in the training dataset.
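For context, the following is a minimal sketch of the standard relationship between the learning rate and the convergence rate of gradient descent on a loss that is smooth and strongly convex near its minimizer; the constants $\mu$, $L$, and the condition number $\kappa = L/\mu$ are illustrative assumptions and are not the exact quantities derived in the paper.
\[
  \theta_{t+1} = \theta_t - \eta\, \nabla \ell(\theta_t), \qquad \eta = \frac{1}{L},
\]
\[
  \ell(\theta_t) - \ell(\theta^*) \;\le\; \Bigl(1 - \tfrac{1}{\kappa}\Bigr)^{t} \bigl(\ell(\theta_0) - \ell(\theta^*)\bigr),
\]
where $\ell$ is the loss, $\theta^*$ its minimizer, $\mu$ and $L$ bound the smallest and largest eigenvalues of the Hessian on the relevant region, and $\kappa$ is the condition number whose estimation drives the bound. The paper's contribution is an estimate of this kind specialized to the multi-class logistic (cross-entropy) loss.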