Amid mounting concern about the reliability and credibility of machine learning research, we present a principled framework for making robust and generalizable claims: the Multiverse Analysis. Our framework builds on the Multiverse Analysis of Steegen et al. (2016), introduced in response to psychology's own reproducibility crisis. To efficiently explore the high-dimensional, often continuous search spaces of machine learning, we model the multiverse with a Gaussian Process surrogate and apply Bayesian experimental design. Because our framework is designed to facilitate drawing robust scientific conclusions about model performance, our approach focuses on exploration rather than conventional optimization. In the first of two case studies, we investigate disputed claims about the relative merit of adaptive optimizers; in the second, we synthesize conflicting research on the effect of the learning rate on the generalization gap in large-batch training. For the machine learning community, the Multiverse Analysis is a simple and effective technique for identifying robust claims and increasing transparency, and a step toward improved reproducibility.
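To make the surrogate-driven exploration concrete, the following is a minimal sketch of the loop the abstract describes, assuming a scikit-learn Gaussian Process surrogate and a variance-maximizing acquisition rule. The `evaluate_universe` function, the 1-D search dimension, and all numeric settings are illustrative placeholders, not the paper's actual search space or method details.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def evaluate_universe(x):
    # Placeholder for training a model under one configuration (a "universe")
    # and returning its test performance; here a synthetic response surface.
    return np.sin(3 * x[0]) + 0.1 * rng.normal()

# Candidate configurations over one continuous search dimension
# (e.g., log learning rate), normalized to [0, 1].
candidates = np.linspace(0, 1, 200).reshape(-1, 1)

# Seed the surrogate with a few randomly chosen universes.
X = rng.uniform(0, 1, size=(5, 1))
y = np.array([evaluate_universe(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):
    gp.fit(X, y)
    # Bayesian experimental design for exploration: query where the
    # surrogate's predictive uncertainty is largest, rather than where
    # performance is predicted to be best (conventional optimization).
    _, std = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(std)]
    X = np.vstack([X, x_next])
    y = np.append(y, evaluate_universe(x_next))

# The fitted surrogate now describes performance across the whole
# multiverse of configurations, not just a neighborhood of an optimum.
mean, std = gp.predict(candidates, return_std=True)
```

The key design choice this sketch illustrates is the acquisition function: selecting the point of maximum posterior standard deviation spreads evaluations across the multiverse, which supports robustness claims, whereas an optimizer-style acquisition such as expected improvement would concentrate evaluations near the best-performing configuration.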