We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has a fixed and known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study how to learn when the preferences of the players are time-varying. We propose the {\it Restart Competing Bandits (RCB)} algorithm, which combines a simple {\it restart strategy} to handle the non-stationarity with the {\it competing bandits} algorithm \citep{liu2020competing} designed for the stationary case. We show that, with the proposed algorithm, each player incurs a uniform sub-linear regret of {$\widetilde{\mathcal{O}}(L_T^{1/2}T^{1/2})$}, where $L_T$ is the number of changes in the underlying preferences of the agents. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.
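To make the restart idea concrete, the following is a minimal Python sketch of how such a restart schedule can be organized: the horizon is split into epochs and a stationary competing-bandits subroutine is rerun from scratch in each epoch. The epoch length $\sqrt{T/L_T}$ and the \texttt{stationary\_competing\_bandits} callable are illustrative assumptions for exposition, not details taken from the paper.

\begin{verbatim}
import math

def restart_epochs(T, L_T):
    """Illustrative restart schedule: split the horizon [1, T] into
    epochs of length roughly sqrt(T / L_T). Restarting a stationary
    no-regret routine once per epoch trades off per-epoch learning cost
    against preference drift; this choice of epoch length is an
    assumption made for illustration, not a detail from the abstract."""
    epoch_len = max(1, math.ceil(math.sqrt(T / max(1, L_T))))
    epochs, start = [], 1
    while start <= T:
        end = min(T, start + epoch_len - 1)
        epochs.append((start, end))
        start = end + 1
    return epochs

def run_rcb(T, L_T, stationary_competing_bandits):
    """Schematic restart-style loop: rerun the stationary subroutine
    (standing in for the competing bandits algorithm of Liu et al., 2020)
    in every epoch, discarding old preference estimates so that stale
    information does not persist past a change point."""
    for start, end in restart_epochs(T, L_T):
        # `stationary_competing_bandits` is a hypothetical callable
        # representing one fresh run of the stationary algorithm.
        stationary_competing_bandits(num_rounds=end - start + 1)

# Example: with T = 10_000 rounds and L_T = 4 changes, each epoch
# lasts 50 rounds, so the subroutine is restarted 200 times.
if __name__ == "__main__":
    print(restart_epochs(10_000, 4)[:3])
\end{verbatim}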