Generalization performance of stochastic optimization occupies a central place in learning theory. In this paper, we investigate the excess risk performance of, and work towards improved learning rates for, two popular approaches to stochastic optimization: empirical risk minimization (ERM) and stochastic gradient descent (SGD). Although there is a plentiful body of generalization analysis of ERM and SGD for supervised learning, the current theoretical understanding either relies on strong assumptions in convex learning, e.g., strong convexity, or yields slow rates and remains less studied in nonconvex learning. Motivated by these problems, we aim to provide improved rates under milder assumptions in convex learning and to derive faster rates in nonconvex learning. Notably, our analysis spans two popular theoretical viewpoints: \emph{stability} and \emph{uniform convergence}. Specifically, in the stability regime, we present high-probability learning rates of order $\mathcal{O}(1/n)$ w.r.t. the sample size $n$ for ERM and SGD under milder assumptions in convex learning, and high-probability (rather than in-expectation) rates of the same order $\mathcal{O}(1/n)$ in nonconvex learning. Furthermore, this type of learning rate is improved to the faster order $\mathcal{O}(1/n^2)$ in the uniform convergence regime. To the best of our knowledge, the learning rates presented in this paper for ERM and SGD are all state-of-the-art.
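For concreteness, here is a minimal sketch of the two procedures and of the excess risk in standard notation (the symbols $f$, $F$, $F_S$, $\mathcal{W}$, $\eta_t$, and $z_i$ are standard-usage assumptions, not drawn from this abstract). Given a sample $S = \{z_1, \dots, z_n\}$ and a loss $f(w; z)$ over a parameter set $\mathcal{W}$, ERM returns a minimizer of the empirical risk, while SGD iterates over randomly sampled points:
$$
\hat{w} \in \operatorname*{arg\,min}_{w \in \mathcal{W}} F_S(w), \qquad F_S(w) = \frac{1}{n} \sum_{i=1}^{n} f(w; z_i),
$$
$$
w_{t+1} = w_t - \eta_t \nabla f(w_t; z_{i_t}), \qquad i_t \sim \mathrm{Unif}\{1, \dots, n\}.
$$
The excess risk of an output $w$ is $F(w) - \min_{w' \in \mathcal{W}} F(w')$, where $F(w) = \mathbb{E}_{z}\left[ f(w; z) \right]$ is the population risk; the rates quoted above bound this quantity as a function of $n$.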