Pairwise learning is receiving increasing attention since it covers many important machine learning tasks, e.g., metric learning, AUC maximization, and ranking. Investigating the generalization behavior of pairwise learning is thus of significance. However, existing generalization analysis mainly focuses on convex objective functions, leaving nonconvex learning far less explored. Moreover, the learning rates currently derived for the generalization performance of pairwise learning are mostly of slow order. Motivated by these problems, we study the generalization performance of nonconvex pairwise learning and provide improved learning rates. Specifically, we develop uniform convergence results for the gradients of pairwise learning under different assumptions, based on which we analyze empirical risk minimization, gradient descent, and stochastic gradient descent for pairwise learning. We first establish learning rates for these algorithms in a general nonconvex setting, where the analysis sheds light on the trade-off between optimization and generalization and on the role of early stopping. We then investigate the generalization performance of nonconvex learning under a gradient dominance curvature condition. In this setting, we derive faster learning rates of order $\mathcal{O}(1/n)$, where $n$ is the sample size. Provided that the optimal population risk is small, we further improve the learning rates to $\mathcal{O}(1/n^2)$, which, to the best of our knowledge, are the first $\mathcal{O}(1/n^2)$-type rates for pairwise learning, whether convex or nonconvex. Overall, we systematically analyze the generalization performance of nonconvex pairwise learning.
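For concreteness, the display below sketches the standard pairwise formulation that analyses of this kind typically rest on; the pairwise loss $f$, the step sizes $\eta_t$, and the gradient dominance parameter $\mu$ are notation we introduce for illustration, not symbols fixed by the abstract. Given a sample $S = \{z_1, \ldots, z_n\}$, the pairwise empirical risk averages the loss over all ordered pairs, a stochastic gradient step updates with a randomly drawn pair $(i_t, j_t)$, and the gradient dominance (Polyak--{\L}ojasiewicz) condition bounds the suboptimality of the population risk $R$ by the squared gradient norm:

\begin{align}
R_S(w) &= \frac{1}{n(n-1)} \sum_{i \neq j} f(w; z_i, z_j), \\
w_{t+1} &= w_t - \eta_t \, \nabla f(w_t; z_{i_t}, z_{j_t}), \\
R(w) - \inf_{w'} R(w') &\le \frac{1}{2\mu} \left\| \nabla R(w) \right\|^2 .
\end{align}

The coupling between pairs in $R_S$ is what distinguishes this setting from pointwise learning: the $n(n-1)$ summands are not independent, which is why dedicated uniform convergence arguments for gradients are needed.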