The underlying loss landscapes of deep neural networks strongly influence their training, yet they have mainly been studied theoretically because of the computational cost of mapping them empirically. This work vastly reduces the time required to compute such loss landscapes and uses them to study winning lottery tickets found via iterative magnitude pruning. We also present results that contradict previously claimed correlations between certain loss landscape projection methods and model trainability and generalization error.