The generalization performance of stochastic optimization occupies a central place in machine learning. In this paper, we investigate the excess risk performance of, and work towards improved learning rates for, two popular approaches to stochastic optimization: empirical risk minimization (ERM) and stochastic gradient descent (SGD). Although generalization analyses of ERM and SGD for supervised learning are plentiful, the current theoretical understanding either relies on strong assumptions in convex learning, e.g., a strong convexity condition, or yields slow rates and remains less studied in nonconvex learning. Motivated by these problems, we aim to provide improved rates under milder assumptions in convex learning and to derive faster rates in nonconvex learning. Notably, our analysis spans two popular theoretical viewpoints: stability and uniform convergence. Specifically, in the stability regime, we present high-probability rates of order $\mathcal{O}(1/n)$ with respect to the sample size $n$ for ERM and SGD under milder assumptions in convex learning, and similar high-probability rates of order $\mathcal{O}(1/n)$ in nonconvex learning, rather than rates that hold only in expectation. Furthermore, this type of learning rate is improved to the faster order $\mathcal{O}(1/n^2)$ in the uniform convergence regime. To the best of our knowledge, the learning rates presented in this paper for ERM and SGD are all state-of-the-art.