Partial Recovery for Top-$k$ Ranking: Optimality of MLE and Sub-Optimality of Spectral Method

Given partially observed pairwise comparison data generated by the Bradley-Terry-Luce (BTL) model, we study the problem of top-$k$ ranking, that is, optimally identifying the set of top-$k$ players. We derive the minimax rate with respect to a normalized Hamming loss. This provides the first result in the literature that characterizes the partial recovery error in terms of the proportion of mistakes for top-$k$ ranking. We also derive the optimal signal-to-noise ratio condition for the exact recovery of the top-$k$ set. The maximum likelihood estimator (MLE) is shown to achieve both optimal partial recovery and optimal exact recovery. On the other hand, we show that another popular algorithm, the spectral method, is in general sub-optimal. Our results complement the recent work by Chen et al. (2019), which shows that both the MLE and the spectral method achieve the optimal sample complexity for exact recovery. It turns out that the leading constants of the sample complexity are different for the two algorithms. Another contribution that may be of independent interest is the analysis of the MLE without any penalty or regularization for the BTL model. This closes an important gap between theory and practice in the literature of ranking.
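To make the setting in the abstract concrete, the following is a minimal sketch of top-$k$ ranking under the BTL model. It is not the authors' code. The sampling scheme (each pair observed independently with probability `p` and then compared `L` times), the classical MM (Zermelo) iteration used to compute the unregularized MLE, the normalization of the Hamming loss by $2k$, and all function names are assumptions made for illustration.

```python
import math
import random


def simulate_btl(theta, p, L, rng):
    """Return wins[i][j] = number of games in which player i beat player j.

    Assumed sampling scheme: each pair is observed with probability p and,
    if observed, compared L times. Player i beats j with probability
    exp(theta_i) / (exp(theta_i) + exp(theta_j)).
    """
    n = len(theta)
    wins = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() >= p:
                continue  # this pair is not observed
            p_ij = 1.0 / (1.0 + math.exp(theta[j] - theta[i]))
            w = sum(rng.random() < p_ij for _ in range(L))
            wins[i][j] = w
            wins[j][i] = L - w
    return wins


def btl_mle(wins, iters=200):
    """Unregularized BTL MLE via the MM iteration on weights exp(theta_i).

    Returns log-weights centered to sum to zero. Raises ValueError when a
    player has no wins or no losses, in which case the MLE does not exist.
    (This is only a necessary check; a connected comparison graph is assumed.)
    """
    n = len(wins)
    for i in range(n):
        losses = sum(wins[j][i] for j in range(n))
        if sum(wins[i]) == 0 or losses == 0:
            raise ValueError(f"player {i} has no wins or no losses")
    w = [1.0] * n
    for _ in range(iters):
        new_w = []
        for i in range(n):
            # games between i and j, weighted by 1 / (w_i + w_j)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (w[i] + w[j])
                for j in range(n)
                if j != i
            )
            new_w.append(sum(wins[i]) / denom)
        scale = n / sum(new_w)  # fix the scale, which the model leaves free
        w = [x * scale for x in new_w]
    logs = [math.log(x) for x in w]
    mean = sum(logs) / n
    return [x - mean for x in logs]


def top_k(scores, k):
    """Indices of the k largest scores, as a set."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def normalized_hamming_loss(estimated, truth, k):
    """Size of the symmetric difference of the two top-k sets, divided by 2k."""
    return len(estimated ^ truth) / (2 * k)


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 12, 4
    theta = [1.0 - 2.0 * i / (n - 1) for i in range(n)]  # player 0 is strongest
    wins = simulate_btl(theta, p=0.7, L=15, rng=rng)
    theta_hat = btl_mle(wins)
    loss = normalized_hamming_loss(top_k(theta_hat, k), top_k(theta, k), k)
    print("estimated top-k:", sorted(top_k(theta_hat, k)), "loss:", loss)
```

With this normalization a loss of 0 corresponds to exact recovery of the top-$k$ set, and values strictly between 0 and 1 correspond to partial recovery, where the loss is the proportion of misidentified players.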

Related content

Maximum likelihood estimation

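Maximum likelihood estimation (MLE) is a method for estimating unknown parameters. The idea was first proposed in 1821 by the German mathematician C. F. Gauss, but the method is usually credited to the English statistician Ronald A. Fisher. It is a statistical method built on the maximum likelihood principle. The intuition behind the principle is as follows: suppose a random experiment has several possible outcomes A, B, C, .... If outcome A occurs in a single trial, we may conclude that the experimental conditions favored A, that is, the probability $P(A)$ is relatively large. The following example illustrates the idea. The first box contains 99 white balls and 1 black ball; the second box contains 1 white ball and 99 black balls. One box is chosen at random, one ball is drawn at random from it, and the ball turns out to be black. A black ball is far more likely to be drawn from the second box (probability 99/100) than from the first (probability 1/100), so we naturally believe that the ball came from the second box. In general, the probability of an event A depends on an unknown parameter $\theta$, and different values of $\theta$ give different probabilities $P(A \mid \theta)$. When A occurs in a trial, we take $\theta$ to be the value, among all its possible values, that maximizes $P(A \mid \theta)$. Maximum likelihood estimation selects this value as the estimate of $\theta$, so that the observed sample has the greatest possible probability of occurring under the chosen population.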
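The two-box example can be written out as a tiny computation. This is only an illustrative sketch: the parameter is the identity of the box, the observation is "a black ball was drawn", and the names `p_black` and `mle_box` are hypothetical.

```python
from fractions import Fraction

# Likelihood of the observation "the drawn ball is black" under each box.
p_black = {
    "first": Fraction(1, 100),    # first box: 99 white balls, 1 black ball
    "second": Fraction(99, 100),  # second box: 1 white ball, 99 black balls
}


def mle_box(likelihoods):
    """Return the parameter value (box) that maximizes the likelihood."""
    return max(likelihoods, key=likelihoods.get)


best = mle_box(p_black)
ratio = p_black["second"] / p_black["first"]
print(best, ratio)  # prints: second 99
```

The likelihood of the observed black ball is 99 times larger under the second box, so the maximum likelihood estimate of the box is the second one.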