Automated machine learning has been widely explored to reduce the human effort of designing neural architectures and searching for proper hyperparameters. In the domain of neural initialization, however, similar automated techniques have rarely been studied. Most existing initialization methods are handcrafted and highly dependent on specific architectures. In this paper, we propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network. Specifically, GradCosine is the cosine similarity of sample-wise gradients with respect to the initialized parameters. By analyzing the sample-wise optimization landscape, we show that both the training and test performance of a network can be improved by maximizing GradCosine under a gradient norm constraint. Based on this observation, we further propose the Neural Initialization Optimization (NIO) algorithm. Generalizing the sample-wise analysis to the real batch setting, NIO automatically searches for a better initialization at negligible cost relative to training time. With NIO, we improve the classification performance of a variety of neural architectures on CIFAR-10, CIFAR-100, and ImageNet. Moreover, we find that our method can even help to train a large vision Transformer without warmup.
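For intuition, the sketch below shows one way the sample-wise GradCosine could be computed in PyTorch: the average pairwise cosine similarity of per-sample gradients taken at the initialized parameters. This is a minimal illustration under our own assumptions; the function name gradcosine, its arguments, and the per-sample gradient loop are ours, not the paper's batch-wise implementation.

```python
import torch
import torch.nn.functional as F

def gradcosine(model, loss_fn, inputs, targets):
    # Hypothetical helper (names are ours): average pairwise cosine
    # similarity of per-sample gradients at the current parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for x, y in zip(inputs, targets):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        g = torch.autograd.grad(loss, params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    G = F.normalize(torch.stack(grads), dim=1)  # unit-norm each sample gradient
    sim = G @ G.t()                             # pairwise cosine similarities
    n = sim.shape[0]
    # average over off-diagonal pairs (exclude each gradient's self-similarity)
    return (sim.sum() - sim.diagonal().sum()) / (n * (n - 1))

# usage sketch (assumed shapes): inputs (n, C, H, W), targets (n,)
# score = gradcosine(model, F.cross_entropy, inputs, targets)
```

Per the abstract, NIO then searches over the initialization to maximize this quantity while constraining the gradient norm, which keeps the score from being trivially inflated by shrinking or rescaling the gradients.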