This work studies the behavior of neural networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general, and the (optimal) Bayes risk is not necessarily zero. In this setting, it is shown that gradient descent with early stopping achieves population risk arbitrarily close to optimal in terms of not just logistic and misclassification losses, but also in terms of calibration, meaning the sigmoid mapping of its outputs approximates the true underlying conditional distribution arbitrarily finely. Moreover, the necessary iteration, sample, and architectural complexities of this analysis all scale naturally with a certain complexity measure of the true conditional model. Lastly, while it is not shown that early stopping is necessary, it is shown that any univariate classifier satisfying a local interpolation property is necessarily inconsistent.
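To make the loss and calibration claims concrete, here is one standard formalization; the notation below is ours for illustration and may differ from the paper's own symbols. Writing f_t for the network after t gradient steps:

```latex
% Notation assumed for illustration; the paper's own symbols may differ.
% Logistic loss and population risk, with labels y in {-1,+1}:
\ell(z) = \ln\!\left(1 + e^{-z}\right), \qquad
\mathcal{R}(f) = \mathbb{E}_{(x,y)}\!\left[\ell\bigl(y\, f(x)\bigr)\right].
% Calibration: the sigmoid of the early-stopped iterate f_t tracks the
% true conditional probability \eta(x) = \Pr[y = +1 \mid x], i.e. the gap
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\mathbb{E}_{x}\,\bigl|\,\sigma\bigl(f_t(x)\bigr) - \eta(x)\,\bigr|
% can be made arbitrarily small by stopping at a suitable t.
```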
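The following minimal numpy sketch (ours, not the paper's code) illustrates the setting end to end: a small ReLU network trained by gradient descent on the logistic loss, on synthetic one-dimensional data whose Bayes risk is nonzero, with the early-stopping iterate chosen by held-out logistic risk and then checked for calibration against the known conditional probability. All names, constants, and the validation-based stopping rule are illustrative assumptions, not the paper's procedure.

```python
# Minimal sketch (illustrative, not the paper's method or code).
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    # True conditional probability P(y = +1 | x); bounded away from {0, 1},
    # so the Bayes risk E[min(eta, 1 - eta)] = 0.25 is nonzero.
    return 0.25 + 0.5 * (x > 0)

def sample(n):
    x = rng.uniform(-1, 1, size=n)
    y = np.where(rng.uniform(size=n) < eta(x), 1.0, -1.0)
    return x, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-hidden-layer ReLU network f(x) = v . relu(w * x + b).
m = 64
w = rng.normal(size=m); b = rng.normal(size=m)
v = rng.normal(size=m) / np.sqrt(m)

def forward(x):
    h = np.maximum(0.0, np.outer(x, w) + b)   # hidden activations, (n, m)
    return h, h @ v                            # outputs f(x), (n,)

def logistic_risk(x, y):
    _, f = forward(x)
    return np.mean(np.logaddexp(0.0, -y * f))  # mean ln(1 + exp(-y f))

x_tr, y_tr = sample(2000)
x_va, y_va = sample(2000)

lr = 0.1
best = (np.inf, v.copy(), w.copy(), b.copy())
for t in range(2000):
    h, f = forward(x_tr)
    g = -y_tr * sigmoid(-y_tr * f) / len(x_tr)  # d(mean loss)/d f
    # Full-batch gradient descent on all parameters.
    v -= lr * (h.T @ g)
    dh = np.outer(g, v) * (h > 0)               # backprop through ReLU
    w -= lr * (dh.T @ x_tr)
    b -= lr * dh.sum(axis=0)
    r = logistic_risk(x_va, y_va)
    if r < best[0]:                             # early stopping: keep the
        best = (r, v.copy(), w.copy(), b.copy())  # best validation iterate
v, w, b = best[1], best[2], best[3]

# Calibration check: sigmoid of the early-stopped outputs vs. the true eta.
x_te = rng.uniform(-1, 1, size=5000)
_, f_te = forward(x_te)
print("mean |sigmoid(f) - eta|:", np.mean(np.abs(sigmoid(f_te) - eta(x_te))))
```

On this synthetic task the reported calibration gap shrinks toward zero, while training far past the validation-chosen iterate lets the network drift toward interpolating the noisy labels, which is the failure mode the abstract's final inconsistency result formalizes for univariate interpolating classifiers.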