Stochastic optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning. Although various algorithms have been extensively studied for AUPRC optimization, generalization is only guaranteed in the multi-query case. In this work, we present the first trial in the single-query generalization of stochastic AUPRC optimization. For sharper generalization bounds, we focus on algorithm-dependent generalization. There are both algorithmic and theoretical obstacles to our destination. From an algorithmic perspective, we notice that the majority of existing stochastic estimators are biased only when the sampling strategy is biased, and are leave-one-out unstable due to non-decomposability. To address these issues, we propose a sampling-rate-invariant unbiased stochastic estimator with superior stability. On top of this, the AUPRC optimization is formulated as a composition optimization problem, and a stochastic algorithm is proposed to solve this problem. From a theoretical perspective, standard techniques of algorithm-dependent generalization analysis cannot be directly applied to such a listwise compositional optimization problem. To fill this gap, we extend model stability from instancewise losses to listwise losses and bridge the corresponding generalization and stability. Additionally, we construct state transition matrices to describe the recurrence of the stability, and simplify calculations via the matrix spectrum. Practically, experimental results on three image retrieval datasets speak to the effectiveness and soundness of our framework.
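To make the quantity being optimized concrete, the snippet below is a minimal sketch of the exact (full-list) average precision, the standard finite-sample estimate of AUPRC. It is purely illustrative background: it is not the paper's proposed stochastic estimator, and the function name `average_precision` and its interface are our own choices for the example.

```python
import numpy as np

def average_precision(scores: np.ndarray, labels: np.ndarray) -> float:
    """Exact average precision (AUPRC estimate) over a full ranked list.

    scores: relevance scores, higher means ranked earlier.
    labels: binary relevance labels (1 = positive, 0 = negative).
    """
    # Sort items by descending score to form the ranking.
    order = np.argsort(-scores)
    sorted_labels = labels[order]

    # Precision at each rank k: positives among the top-k divided by k.
    cum_positives = np.cumsum(sorted_labels)
    ranks = np.arange(1, len(sorted_labels) + 1)
    precision_at_k = cum_positives / ranks

    # AP averages precision@k over the ranks of the positive items.
    return float(np.sum(precision_at_k * sorted_labels) / sorted_labels.sum())

scores = np.array([0.9, 0.8, 0.7])
labels = np.array([1, 0, 1])
print(average_precision(scores, labels))  # (1/1 + 2/3) / 2 = 0.8333...
```

Note that each precision@k term couples every item in the list through the ranking, which is the non-decomposability that makes naive mini-batch estimates of this quantity biased and motivates the compositional reformulation described above.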