Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm, but it remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show that this is not the case for a class of large-scale problems trained with first-order methods, including random least-squares problems and one-hidden-layer neural networks with random weights. In fact, the halting time exhibits a universality property: it is independent of the probability distribution. With this barrier to average-case analysis removed, we provide the first explicit average-case convergence rates, showing a tighter complexity not captured by traditional worst-case analysis. Finally, numerical simulations suggest that this universality property holds for a more general class of algorithms and problems.
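The universality claim can be illustrated with a small numerical experiment in the spirit of the simulations mentioned above. The sketch below (not the paper's code; problem sizes, step size, stopping tolerance, and the two entry distributions are illustrative assumptions) runs gradient descent on random least-squares problems and records the halting time, i.e., the number of iterations until the gradient norm falls below a tolerance. Under the universality property, the recorded halting times should be nearly identical whether the data matrix has Gaussian or Rademacher entries.

```python
# Minimal sketch (assumed setup, not the paper's experiments): halting time of
# gradient descent on random least squares under two input distributions.
import numpy as np

def halting_time(A, b, tol=1e-6, max_iter=10_000):
    """Iterations of gradient descent on 0.5*||Ax - b||^2 until ||grad|| < tol."""
    n, d = A.shape
    x = np.zeros(d)
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    step = 1.0 / L
    for k in range(1, max_iter + 1):
        grad = A.T @ (A @ x - b)
        if np.linalg.norm(grad) < tol:
            return k
        x -= step * grad
    return max_iter

rng = np.random.default_rng(0)
n, d, trials = 800, 400, 5                  # illustrative problem sizes

samplers = {
    "gaussian":   lambda: rng.standard_normal((n, d)),
    "rademacher": lambda: rng.choice([-1.0, 1.0], size=(n, d)),
}

for name, sample in samplers.items():
    times = []
    for _ in range(trials):
        A = sample() / np.sqrt(n)           # variance-normalized entries
        b = A @ rng.standard_normal(d)      # planted signal, consistent system
        times.append(halting_time(A, b))
    print(f"{name:>10}: halting times {times}")
```

If universality holds in this regime, the printed iteration counts concentrate around the same value for both entry distributions, which is the behavior the average-case rates in the paper make precise.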