Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers

Choosing the optimizer is considered to be among the most crucial design decisions in deep learning, and it is not an easy one. The growing literature now lists hundreds of optimization methods. In the absence of clear theoretical guidance and conclusive empirical evidence, the decision is often made based on anecdotes. In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed heuristics. To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers while giving a concise overview of the wide range of possible choices. Analyzing more than $50,000$ individual runs, we contribute the following three points: (i) Optimizer performance varies greatly across tasks. (ii) We observe that evaluating multiple optimizers with default parameters works approximately as well as tuning the hyperparameters of a single, fixed optimizer. (iii) While we cannot discern an optimization method clearly dominating across all tested tasks, we identify a significantly reduced subset of specific optimizers and parameter choices that generally lead to competitive results in our experiments: Adam remains a strong contender, with newer methods failing to significantly and consistently outperform it. Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods without requiring any further computational efforts.
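Finding (ii) contrasts two ways of spending a compute budget: one run each for several optimizers at their default settings, or several runs of a single optimizer with tuned hyperparameters. The following is a minimal sketch of those two protocols on a toy quadratic, not the benchmark's actual code. The test problem, the `DEFAULT_LR` values, the random-search range, and all function names are assumptions made for illustration only.

```python
import math
import random

# Hypothetical default learning rates, chosen for illustration only.
DEFAULT_LR = {"sgd": 0.01, "momentum": 0.01, "adam": 0.001}


def loss_and_grad(w):
    """Toy ill-conditioned quadratic: 0.5 * (w0^2 + 25 * w1^2)."""
    scales = (1.0, 25.0)
    loss = 0.5 * sum(s * x * x for s, x in zip(scales, w))
    grad = [s * x for s, x in zip(scales, w)]
    return loss, grad


def sgd(w, g, lr, state, t):
    return [wi - lr * gi for wi, gi in zip(w, g)]


def momentum(w, g, lr, state, t, beta=0.9):
    v = state.setdefault("v", [0.0] * len(w))
    for i, gi in enumerate(g):
        v[i] = beta * v[i] + gi
    return [wi - lr * vi for wi, vi in zip(w, v)]


def adam(w, g, lr, state, t, b1=0.9, b2=0.999, eps=1e-8):
    m = state.setdefault("m", [0.0] * len(w))
    v = state.setdefault("v", [0.0] * len(w))
    for i, gi in enumerate(g):
        m[i] = b1 * m[i] + (1 - b1) * gi
        v[i] = b2 * v[i] + (1 - b2) * gi * gi
    # Bias-corrected first and second moment estimates.
    return [
        wi - lr * (mi / (1 - b1 ** t)) / (math.sqrt(vi / (1 - b2 ** t)) + eps)
        for wi, mi, vi in zip(w, m, v)
    ]


OPTIMIZERS = {"sgd": sgd, "momentum": momentum, "adam": adam}


def run(name, lr, steps=200):
    """Train from a fixed start and return the final loss (inf if diverged)."""
    step = OPTIMIZERS[name]
    w, state = [1.0, 1.0], {}
    for t in range(1, steps + 1):
        _, g = loss_and_grad(w)
        w = step(w, g, lr, state, t)
    final, _ = loss_and_grad(w)
    return final if math.isfinite(final) else float("inf")


def best_of_defaults():
    """Protocol A: one run per optimizer, each at its default learning rate."""
    results = {name: run(name, lr) for name, lr in DEFAULT_LR.items()}
    best = min(results, key=results.get)
    return best, results[best]


def tune_single(name, budget=10, seed=0):
    """Protocol B: random search over the learning rate of one optimizer."""
    rng = random.Random(seed)
    trials = []
    for _ in range(budget):
        lr = 10 ** rng.uniform(-4, 0)  # log-uniform sample below 1
        trials.append((run(name, lr), lr))
    best_loss, best_lr = min(trials)
    return best_lr, best_loss


if __name__ == "__main__":
    print("best of defaults:", best_of_defaults())
    print("tuned adam (lr, loss):", tune_single("adam"))
```

A single convex toy problem says nothing about which protocol wins in general. The sketch only makes the comparison concrete: both functions return a final loss obtained under a fixed number of training runs, which is the quantity the abstract compares across real deep learning tasks.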
