We present multi-point optimization: an optimization technique that makes it possible to train several models simultaneously without storing the parameters of each one individually. We use the proposed method to conduct a thorough empirical analysis of the loss landscape of neural networks. Through extensive experiments on the FashionMNIST and CIFAR10 datasets, we demonstrate two things: 1) the loss surface is surprisingly diverse and intricate in terms of the landscape patterns it contains, and 2) adding batch normalization makes it smoother. Source code to reproduce all reported results is available on GitHub: https://github.com/universome/loss-patterns.
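The abstract does not spell out the mechanics of the weight sharing, but one way to train many models with only a few stored parameter tensors is to place the models on a 2D plane in weight space spanned by shared tensors. The sketch below is an illustrative assumption, not the repository's implementation: the linear model, grid size, and all variable names are hypothetical.

```python
# Minimal sketch of a multi-point setup: 9 models on a 2D grid in weight
# space, all expressed as affine combinations of 3 shared tensors, so memory
# cost is ~3x one model rather than 9x. (Illustrative; not the repo's code.)
import torch
import torch.nn.functional as F

torch.manual_seed(0)
in_dim, out_dim = 784, 10  # e.g. flattened FashionMNIST inputs, 10 classes
origin = torch.randn(out_dim, in_dim, requires_grad=True)  # shared origin
dir_u = torch.randn(out_dim, in_dim, requires_grad=True)   # spanning direction 1
dir_v = torch.randn(out_dim, in_dim, requires_grad=True)   # spanning direction 2
grid = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]

opt = torch.optim.Adam([origin, dir_u, dir_v], lr=1e-3)
x = torch.randn(32, in_dim)                    # dummy batch for illustration
y = torch.randint(0, out_dim, (32,))

for step in range(100):
    loss = 0.0
    for a, b in grid:
        w = origin + a * dir_u + b * dir_v     # weights of model (a, b) on the plane
        loss = loss + F.cross_entropy(x @ w.t(), y)
    opt.zero_grad()
    loss.backward()                            # one backward pass updates all 9 models
    opt.step()
```

Because every model's weights are a differentiable function of the three shared tensors, a single optimizer step moves all points on the grid at once; evaluating the loss at each grid point then probes a whole 2D slice of the loss surface for the price of a few extra forward passes.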