Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss. However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross entropy loss results in better performance. By focusing on two-layer ReLU networks, which can be fully characterized by measures over their feature space, we explore how the implicit bias induced by gradient-based optimization could partly explain the above phenomenon. For one-dimensional data, we provide theoretical evidence that the regression formulation yields a measure whose support can differ greatly from that obtained for classification. Our proposed optimal supports correspond directly to the features learned by the input layer of the network. The different nature of these supports sheds light on possible optimization difficulties the square loss could encounter during training, and we present empirical results illustrating this phenomenon.
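To make the reformulation concrete, the following is a minimal sketch (not the authors' code), assuming PyTorch and a small two-layer ReLU network on one-dimensional data: the same scalar targets are either fit directly with the square loss or binned into discrete labels and fit with the cross entropy loss. The bin count `n_bins` and the network width are hypothetical choices for illustration.

```python
# Minimal sketch: regression vs. classification formulation for the same 1-D task.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(256, 1)                      # one-dimensional inputs
y = torch.sin(2 * math.pi * x)              # scalar regression targets

n_bins = 20                                 # hypothetical number of target bins
edges = torch.linspace(y.min().item(), y.max().item(), n_bins + 1)[1:-1]
y_cls = torch.bucketize(y.squeeze(1), edges)  # regression recast as classification

def two_layer_relu(out_dim, width=128):
    # Two-layer ReLU network: one hidden layer of ReLU features, one linear output.
    return torch.nn.Sequential(
        torch.nn.Linear(1, width), torch.nn.ReLU(), torch.nn.Linear(width, out_dim)
    )

reg_net, cls_net = two_layer_relu(1), two_layer_relu(n_bins)
reg_opt = torch.optim.SGD(reg_net.parameters(), lr=1e-2)
cls_opt = torch.optim.SGD(cls_net.parameters(), lr=1e-2)

for _ in range(2000):
    reg_opt.zero_grad()
    F.mse_loss(reg_net(x), y).backward()            # square loss on raw targets
    reg_opt.step()

    cls_opt.zero_grad()
    F.cross_entropy(cls_net(x), y_cls).backward()   # cross entropy on binned targets
    cls_opt.step()
```

Under such a setup, the hidden-layer weights of the two networks play the role of the features whose learned supports the abstract contrasts between the two formulations.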