Two-sided matching markets have a wide range of real-world applications and have been studied extensively in the literature. A recent line of work focuses on the setting where the preferences of participants on one side of the market are unknown \emph{a priori} and must be learned through repeated interaction with participants on the other side. All of these works build on explore-then-commit (ETC) and upper confidence bound (UCB) algorithms, two common strategies for multi-armed bandits (MAB). Thompson sampling (TS) is another popular approach, which has attracted considerable attention because it is easier to implement and performs better empirically. In many problems, researchers continue to study TS for these benefits even after UCB- and ETC-type algorithms have been analyzed. However, the convergence analysis of TS is much more challenging and remains open in many problem settings. In this paper, we provide the first regret analysis for TS in the new setting of iterative matching markets. Extensive experiments demonstrate the practical advantages of the TS-type algorithm over the ETC- and UCB-type baselines.