In this work, we leverage a generative data model that accounts for comparison noise to develop a fast, precise, and informative algorithm for ranking from pairwise comparisons, which also produces a measure of confidence for each comparison. The problem of ranking a large number of items from noisy and sparse pairwise comparison data arises in diverse applications, such as ranking players in online games, document retrieval, and ranking human perceptions. Although different algorithms are available, we need fast, large-scale algorithms whose accuracy degrades gracefully when the number of comparisons is too small. Fitting our proposed model entails solving a non-convex optimization problem, which we tightly approximate by a sum of quasi-convex functions and a regularization term. Using iteratively reweighted minimization and the Primal-Dual Hybrid Gradient method, we obtain PD-Rank, which achieves a Kendall tau 0.1 higher than all competing methods, even with 10\% wrong comparisons, on simulated data matching our data model, and leads in accuracy when data is generated according to the Bradley-Terry model. In both cases it is faster by one order of magnitude, running in seconds. On real data, PD-Rank requires less computational time than active learning methods to achieve the same Kendall tau.
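To make the evaluation setting concrete, the following is a minimal sketch of how pairwise comparisons can be simulated under the Bradley-Terry model and how a recovered ranking can be scored with Kendall tau. It is not PD-Rank: the ranking step is a naive wins-minus-losses baseline used only as a stand-in, and the function names, the number of items (30), the number of comparisons (2000), and the score distribution are all illustrative assumptions rather than the settings used in this work.

```python
import math
import random
from itertools import combinations


def kendall_tau(a, b):
    """Kendall tau-a between two score lists (no correction for ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def simulate_bradley_terry(scores, n_comparisons, rng):
    """Draw (winner, loser) pairs with P(i beats j) = 1 / (1 + exp(-(s_i - s_j)))."""
    n = len(scores)
    outcomes = []
    for _ in range(n_comparisons):
        i, j = rng.sample(range(n), 2)  # two distinct items
        p_i_wins = 1.0 / (1.0 + math.exp(-(scores[i] - scores[j])))
        outcomes.append((i, j) if rng.random() < p_i_wins else (j, i))
    return outcomes


def win_count_scores(n_items, outcomes):
    """Naive baseline (NOT PD-Rank): score each item by wins minus losses."""
    net_wins = [0] * n_items
    for winner, loser in outcomes:
        net_wins[winner] += 1
        net_wins[loser] -= 1
    return net_wins


# Hypothetical experiment sizes, chosen only for illustration.
rng = random.Random(0)
true_scores = [rng.gauss(0.0, 2.0) for _ in range(30)]
outcomes = simulate_bradley_terry(true_scores, 2000, rng)
estimate = win_count_scores(len(true_scores), outcomes)
tau = kendall_tau(true_scores, estimate)
print(f"Kendall tau of the baseline ranking: {tau:.3f}")
```

A Kendall tau of 1 means the estimated ordering agrees with the true ordering on every pair of items, and -1 means it disagrees on every pair, so a gain of 0.1 corresponds to 5\% more of all item pairs being ordered correctly.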