Mixture models of Plackett-Luce (PL) -- one of the most fundamental ranking models -- are an active research area of both theoretical and practical significance. Most previously proposed parameter estimation algorithms instantiate the EM algorithm, often with random initialization. However, such an initialization scheme may not yield a good initial estimate and the algorithms require multiple restarts, incurring a large time complexity. As for the EM procedure, while the E-step can be performed efficiently, maximizing the log-likelihood in the M-step is difficult due to the combinatorial nature of the PL likelihood function (Gormley and Murphy 2008). Therefore, previous authors favor algorithms that maximize surrogate likelihood functions (Zhao et al. 2018, 2020). However, the final estimate may deviate from the true maximum likelihood estimate as a consequence. In this paper, we address these known limitations. We propose an initialization algorithm that can provide a provably accurate initial estimate and an EM algorithm that maximizes the true log-likelihood function efficiently. Experiments on both synthetic and real datasets show that our algorithm is competitive in terms of accuracy and speed to baseline algorithms, especially on datasets with a large number of items.
翻译:Plackett-Luce(PL)的混合模型(PL) -- -- 最基本的排序模型之一 -- -- 是一个具有理论和实际意义的积极研究领域。大多数先前提议的参数估算算法都倾向于快速实现EM算法,经常是随机初始化。然而,这种初始化方案可能不会产生良好的初始估计,而算法则需要多次重开,从而产生很大的时间复杂性。对于EM程序,电子步骤可以有效运行,但最大限度地实现M步骤的日志相似性由于PL概率函数的组合性质(Gormley和Murphy2008)而困难重重。因此,以往的作者偏爱尽可能扩大代理概率功能的参数估算法(Zhao等人,2020年)。然而,最终估算可能因此偏离了真实最大的可能性估算。在本文件中,我们探讨了这些已知的局限性。我们建议了一种初始化算法,它能够提供可被确认准确的初始估计值和EM算法,从而高效地最大限度地实现真实的日志相似性功能。在合成和真实的数据集上进行的实验表明,我们的算法在高数据的精确性和速度上都具有竞争力。