Several papers argue that wide minima generalize better than narrow minima. In this paper, through detailed experiments, we not only corroborate the generalization properties of wide minima but also provide empirical evidence for a new hypothesis: the density of wide minima is likely lower than the density of narrow minima. Motivated by this hypothesis, we design a novel explore-exploit learning rate schedule. On a variety of image and natural language datasets, compared to their original hand-tuned learning rate baselines, we show that our explore-exploit schedule can yield either up to 0.84% higher absolute accuracy within the original training budget or up to 57% reduced training time while matching the originally reported accuracy. For example, we achieve state-of-the-art (SOTA) accuracy on the IWSLT'14 (DE-EN) dataset by modifying only the learning rate schedule of a high-performing model.
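The abstract describes the explore-exploit schedule only at a high level. The sketch below illustrates one plausible reading under stated assumptions: a constant high learning rate during an "explore" phase, followed by a decaying "exploit" phase. The function name `explore_exploit_lr`, the 50-epoch explore budget, and the linear-decay choice are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch of a two-phase explore-exploit learning rate schedule,
# based only on the abstract's description. All parameter names and the
# linear-decay choice are illustrative assumptions.
import torch

def explore_exploit_lr(epoch, total_epochs, explore_epochs, peak_scale):
    """Return an LR scale factor for the given epoch (assumed schedule)."""
    if epoch < explore_epochs:
        return peak_scale  # explore: hold the high LR to seek a wide minimum
    # exploit: linearly decay the LR to zero over the remaining epochs
    remaining = max(1, total_epochs - explore_epochs)
    progress = (epoch - explore_epochs) / remaining
    return peak_scale * (1.0 - progress)

# Usage with a PyTorch optimizer via LambdaLR; the lambda returns a factor
# that multiplies the optimizer's base lr (here 0.1 acts as the peak LR).
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda epoch: explore_exploit_lr(epoch, 100, 50, 1.0),
)
for epoch in range(100):
    # ... one epoch of training would run here ...
    optimizer.step()   # placeholder; real code steps per batch with gradients
    scheduler.step()   # advance the LR schedule once per epoch
```

One design consideration under this reading: the explore budget must be long enough for SGD to escape narrow minima at the high learning rate before the exploit phase begins its descent.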