从选择数据中复苏 (Efficient and Accurate Top-$K$ Recovery from Choice Data)

The intersection of learning to rank and choice modeling is an active area of research with applications in e-commerce, information retrieval and the social sciences. In some applications such as recommendation systems, the statistician is primarily interested in recovering the set of the top ranked items from a large pool of items as efficiently as possible using passively collected discrete choice data, i.e., the user picks one item from a set of multiple items. Motivated by this practical consideration, we propose the choice-based Borda count algorithm as a fast and accurate ranking algorithm for top $K$-recovery i.e., correctly identifying all of the top $K$ items. We show that the choice-based Borda count algorithm has optimal sample complexity for top-$K$ recovery under a broad class of random utility models. We prove that in the limit, the choice-based Borda count algorithm produces the same top-$K$ estimate as the commonly used Maximum Likelihood Estimate method but the former's speed and simplicity brings considerable advantages in practice. Experiments on both synthetic and real datasets show that the counting algorithm is competitive with commonly used ranking algorithms in terms of accuracy while being several orders of magnitude faster.

翻译：在建议系统等一些应用中,统计员主要感兴趣的是,利用被动收集的离散选择数据,即用户从一组多个项目中挑选一个项目,尽可能高效地从一个大型项目库中回收一组名次最高的物品,即用户从一组不同选择数据中挑选一个项目。受这一实际考虑的驱动,我们提议将基于选择的博尔达计数算法作为顶级的快速和准确排序算法,即正确识别所有顶级的美元项目。我们显示,基于选择的博尔达计数算算法在广泛的随机实用模型中具有最优化的样本复杂性。我们证明,在限值中,基于选择的博尔达计数算算法产生与通常使用的最大类似估算方法相同的最高价值,但前一种速度和简单性在实践上带来相当大的优势。合成和真实数据集的实验表明,在精确度方面,基于常用的排序算法具有竞争性,同时具有若干级级的精确性。