Tournament procedures, recently introduced in Lugosi & Mendelson (2016), offer an appealing alternative, from a theoretical perspective at least, to the principle of Empirical Risk Minimization in machine learning. Statistical learning by Median-of-Means (MoM) basically consists of partitioning the training data into blocks of equal size and comparing the statistical performance of every pair of candidate decision rules on each data block: the one with the higher performance on a majority of the blocks is declared the winner. In the context of nonparametric regression, functions having won all their duels have been shown to outperform empirical risk minimizers w.r.t. the mean squared error under minimal assumptions, while also exhibiting robustness properties. It is the purpose of this paper to extend this approach in order to address other learning problems, in particular those for which the performance criterion takes the form of an expectation over pairs of observations rather than over a single observation, as is the case in pairwise ranking, clustering or metric learning. Precisely, it is proved here that the bounds achieved by MoM are essentially preserved when the blocks are built by means of independent sampling-without-replacement schemes instead of a simple segmentation. These results are next extended to situations where the risk involves a pairwise loss function and its empirical counterpart takes the form of a $U$-statistic. Beyond the theoretical results guaranteeing the performance of the learning/estimation methods proposed, numerical experiments provide empirical evidence of their relevance in practice.
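The block-wise duel described above can be illustrated with a minimal sketch. This is an illustrative NumPy implementation under assumptions of our own (the function names, the random equal-size partition of the data, and the exact majority rule are not taken from the paper): a candidate wins a duel if its empirical risk is at least as small on more than half of the blocks, and the tournament winner, when one exists, is a candidate that wins all of its duels.

```python
import numpy as np

def mom_tournament_winner(candidates, X, y, loss, n_blocks, rng=None):
    """Return a candidate that wins its duel against every other candidate
    on a majority of the data blocks, or None if no such candidate exists.

    candidates : list of prediction functions f(X) -> predictions
    loss       : callable (f, X_block, y_block) -> empirical risk on the block
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    # Random partition of the sample into n_blocks blocks of (near-)equal size.
    perm = rng.permutation(n)
    blocks = np.array_split(perm, n_blocks)

    def wins_duel(f, g):
        # f beats g if its empirical risk is no larger on a majority of blocks.
        f_better = sum(loss(f, X[b], y[b]) <= loss(g, X[b], y[b]) for b in blocks)
        return f_better > n_blocks / 2

    for i, f in enumerate(candidates):
        if all(wins_duel(f, g) for j, g in enumerate(candidates) if j != i):
            return f
    return None
```

Because the winner only needs to prevail on a majority of blocks, a single gross outlier can corrupt at most one block's risk comparison, which is the source of the robustness property mentioned above.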