Recent work on mode connectivity in the loss landscape of deep neural networks has demonstrated that the locus of (sub-)optimal weight vectors lies on continuous paths. In this work, we train a neural network that serves as a hypernetwork, mapping a latent vector into high-performance (low-loss) weight vectors, generalizing recent findings of mode connectivity to higher-dimensional manifolds. We formulate the training objective as a compromise between accuracy and diversity, where the diversity takes into account trivial symmetry transformations of the target network. We demonstrate how to reduce the number of parameters in the hypernetwork by parameter sharing. Once learned, the hypernetwork allows for computationally efficient, ancestral sampling of neural network weights, which we recruit to form large ensembles. The improvement in classification accuracy obtained by this ensembling indicates that the generated manifold extends in dimensions other than the directions implied by trivial symmetries. For computational efficiency, we distill an ensemble into a single classifier while retaining its generalization performance.
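
To make the sampling-and-ensembling step concrete, the sketch below shows how a trained hypernetwork of this kind could be used at inference time: latent vectors are drawn from a simple prior, mapped to flat target-network weight vectors, and the predictions of the resulting classifiers are averaged. All module names, shapes, the toy MLP target, and the standard-normal prior are illustrative assumptions rather than the paper's actual architecture, which additionally employs parameter sharing and the diversity-aware training objective described above.

```python
# Hypothetical sketch: ancestral sampling of target-network weights from a
# trained hypernetwork, followed by ensembling of the sampled classifiers.
# Shapes, module names, the toy MLP target, and the N(0, I) prior are
# illustrative assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 32                            # dimensionality of the latent vector z (assumed)
IN_DIM, HIDDEN, OUT_DIM = 784, 128, 10     # toy MLP target network (assumed)
N_TARGET_PARAMS = IN_DIM * HIDDEN + HIDDEN + HIDDEN * OUT_DIM + OUT_DIM

class HyperNetwork(nn.Module):
    """Maps a latent vector z to a flat vector of target-network weights."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, N_TARGET_PARAMS),
        )

    def forward(self, z):
        return self.net(z)

def target_forward(x, flat_w):
    """Runs the toy MLP using weights unpacked from one flat weight vector."""
    i = 0
    w1 = flat_w[i:i + IN_DIM * HIDDEN].view(HIDDEN, IN_DIM); i += IN_DIM * HIDDEN
    b1 = flat_w[i:i + HIDDEN]; i += HIDDEN
    w2 = flat_w[i:i + HIDDEN * OUT_DIM].view(OUT_DIM, HIDDEN); i += HIDDEN * OUT_DIM
    b2 = flat_w[i:i + OUT_DIM]
    h = F.relu(F.linear(x, w1, b1))
    return F.linear(h, w2, b2)

@torch.no_grad()
def ensemble_predict(hyper, x, n_members=16):
    """Ancestral sampling: draw z ~ N(0, I), generate weights, average softmax outputs."""
    probs = 0.0
    for _ in range(n_members):
        z = torch.randn(1, LATENT_DIM)
        flat_w = hyper(z).squeeze(0)
        probs = probs + F.softmax(target_forward(x, flat_w), dim=-1)
    return probs / n_members

hyper = HyperNetwork()       # stands in for the trained hypernetwork
x = torch.randn(4, IN_DIM)   # a small batch of dummy inputs
print(ensemble_predict(hyper, x).argmax(dim=-1))
```

Under these assumptions, each additional ensemble member costs only one forward pass through the hypernetwork plus one pass through the generated target network, which is why sampling large ensembles remains computationally cheap compared with training independent networks.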