Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them with off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage local geometry and update iteratively. Even though minimizing non-convex functions is NP-hard in the worst case, optimization quality is rarely an issue in practice: optimizers are widely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most local minima of the objectives used in practice are approximately global minima. We rigorously formalize this hypothesis for concrete instances of machine learning problems.
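To make the iterative, local-geometry-driven updates concrete, here is a minimal sketch (not taken from the paper) of stochastic gradient descent on a toy non-convex objective f(x) = (x^2 - 1)^2, whose two local minima at x = ±1 are both global; the noise model, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(x, noise_scale=0.1):
    # Exact gradient of f(x) = (x^2 - 1)^2 plus zero-mean noise,
    # standing in for mini-batch gradient estimation.
    return 4 * x * (x**2 - 1) + noise_scale * rng.standard_normal()

x = 2.0      # arbitrary initialization
lr = 0.01    # hypothetical step size
for _ in range(5000):
    # Each update uses only local (first-order) information about f.
    x -= lr * stochastic_grad(x)

print(f"converged near x = {x:.3f}, f(x) = {(x**2 - 1)**2:.4f}")
```

On this toy objective every local minimum is global, so SGD ends up near an optimal point regardless of initialization; the paper's claim is that analogous structural properties hold for the objectives arising in the machine learning problems it studies.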