Hyperparameter Loss Surfaces Are Simple Near their Optima

Hyperparameters greatly impact models' capabilities; however, modern models are too large for extensive search. Instead, researchers design recipes that train well across scales based on their understanding of the hyperparameters. Despite this importance, few tools exist for understanding the hyperparameter loss surface. We discover novel structure in it and propose a new theory yielding such tools. The loss surface is complex, but as you approach the optimum, simple structure emerges. It becomes characterized by a few basic features, like its effective dimension and the best possible loss. To uncover this asymptotic regime, we develop a novel technique based on random search. Within this regime, the best scores from random search take on a new distribution we discover. Its parameters are exactly the features defining the loss surface in the asymptotic regime. From these features, we derive a new asymptotic law for random search that can explain and extrapolate its convergence. These new tools enable new analyses, such as confidence intervals for the best possible performance or determining the effective number of hyperparameters. We make these tools available at https://github.com/nicholaslourie/opda.
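
As a rough illustration of how random search can expose features such as the effective dimension and the best possible loss, the sketch below runs random search on a toy loss. Everything in it is an assumption made for illustration: the toy loss (a pure quadratic, `y_min + ||x||^2`, sampled uniformly over the unit ball), the function names, and the threshold-based estimator are hypothetical and are not taken from the paper or from the opda library's API. For this toy loss, the probability of a loss within `t` of the minimum is exactly `t^(dim/2)`, so the lower tail of the observed losses carries the dimension.

```python
import numpy as np


def sample_losses(n_samples, dim, y_min=0.1, seed=None):
    """Random search on a toy quadratic loss (an assumption, not the paper's setup).

    Hyperparameters x are drawn uniformly from the unit ball in `dim`
    dimensions, and the loss is y_min + ||x||^2.
    """
    rng = np.random.default_rng(seed)
    # Uniform direction: normalize a standard Gaussian vector.
    direction = rng.normal(size=(n_samples, dim))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    # Uniform-in-ball radius: U^(1/dim) for U uniform on [0, 1).
    radius = rng.uniform(size=(n_samples, 1)) ** (1.0 / dim)
    x = direction * radius
    return y_min + np.sum(x**2, axis=1)


def best_so_far(losses):
    """Best (lowest) loss found after each iteration of random search."""
    return np.minimum.accumulate(losses)


def estimate_effective_dim(losses, y_min, threshold):
    """Estimate the dimension from losses within `threshold` of y_min.

    Assumes the CDF of the excess loss behaves like t^(dim/2) near zero,
    which holds exactly for the toy loss above. This is the maximum
    likelihood estimate of that power-law exponent, doubled.
    """
    excess = losses - y_min
    tail = excess[(excess > 0) & (excess <= threshold)]
    exponent = len(tail) / np.sum(np.log(threshold / tail))
    return 2.0 * exponent


if __name__ == "__main__":
    losses = sample_losses(20_000, dim=3, y_min=0.1, seed=0)
    curve = best_so_far(losses)
    print("best loss after 10, 100, 1000 iterations:", curve[[9, 99, 999]])
    print("estimated effective dimension:",
          estimate_effective_dim(losses, y_min=0.1, threshold=0.25))
```

In this toy setting the best possible loss `y_min` is known in advance; in practice it would have to be estimated along with the dimension, and the power-law form would only be expected to hold for the best-performing configurations rather than for the whole search space.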
