排序问题的统计与计算挑战 (Statistical and computational challenges in ranking)

We consider the problem of ranking $n$ experts according to their abilities, based on the correctness of their answers to $d$ questions. This is modeled by the so-called crowd-sourcing model, where the answer of expert $i$ on question $k$ is modeled by a random entry, parametrized by $M_{i,k}$ which is increasing linearly with the expected quality of the answer. To enable the unambiguous ranking of the experts by ability, several assumptions on $M$ are available in the literature. We consider here the general isotonic crowd-sourcing model, where $M$ is assumed to be isotonic up to an unknown permutation $π^*$ of the experts - namely, $M_{π^{*-1}(i),k} \geq M_{π^{*-1}(i+1),k}$ for any $i\in [n-1], k \in [d]$. Then, ranking experts amounts to constructing an estimator of $π^*$. In particular, we investigate here the existence of statistically optimal and computationally efficient procedures and we describe recent results that disprove the existence of computational-statistical gaps for this problem. To provide insights on the key ideas, we start by discussing simpler and yet related sub-problems, namely sub-matrix detection and estimation. This corresponds to specific instances of the ranking problem where the matrix $M$ is constrained to be of the form $λ\mathbf 1\{S\times T\}$ where $S\subset [n], T\subset [d]$. This model has been extensively studied. We provide an overview of the results and proof techniques for this problem with a particular emphasis on the computational lower bounds based on low-degree polynomial methods. Then, we build upon this instrumental sub-problem to discuss existing results and algorithmic ideas for the general ranking problem.

翻译：我们考虑根据专家对$d$个问题回答的正确性，对其能力进行排序的问题。该问题通过众包模型进行建模，其中专家$i$对问题$k$的回答由随机项表示，其参数$M_{i,k}$随答案的期望质量线性增加。为使专家能力排序具有明确性，文献中提出了对$M$的若干假设。本文研究一般保序众包模型，其中假设$M$在未知专家排列$π^*$下具有保序性——即对任意$i\in [n-1], k \in [d]$满足$M_{π^{*-1}(i),k} \geq M_{π^{*-1}(i+1),k}$。此时，专家排序问题等价于构建$π^*$的估计量。我们特别探讨了统计最优且计算高效方法的存在性，并介绍了近期证明该问题不存在计算-统计间隙的研究成果。为阐明核心思想，我们首先讨论更简单且相关的子问题——子矩阵检测与估计。这对应于排序问题的特定实例，其中矩阵$M$被约束为$λ\mathbf 1\{S\times T\}$形式（$S\subset [n], T\subset [d]$）。该模型已得到广泛研究。我们系统综述了该问题的研究成果与证明技术，特别聚焦于基于低阶多项式方法的计算下界分析。进而，我们以此关键子问题为基础，探讨一般排序问题的现有成果与算法思想。

相关内容

排序

关注 313

排序是计算机内经常进行的一种操作，其目的是将一组“无序”的记录序列调整为“有序”的记录序列。分内部排序和外部排序。若整个排序过程不需要访问外存便能完成，则称此类排序问题为内部排序。反之，若参加排序的记录数量很大，整个序列的排序过程不可能在内存中完成，则称此类排序问题为外部排序。内部排序的过程是一个逐步扩大记录的有序序列长度的过程。

【ICML2024】基于正则化的持续学习的统计理论

专知会员服务

21+阅读 · 2024年6月11日

【NeurIPS2021】存在潜在变量和选择偏差的递归因果结构学习

专知会员服务

22+阅读 · 2021年11月15日

【NeurIPS2021】序一致因果图的多任务学习

专知会员服务

20+阅读 · 2021年11月7日

【NeurIPS2020】无限可能的联合对比学习

专知会员服务

29+阅读 · 2020年10月2日