We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has a fixed and known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study how to learn when the preferences of the players are time-varying and unknown. Our contribution is a methodology that can handle any type of preference structure and variation scenario. We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of $\widetilde{\mathcal{O}}(L_T^{1/2}T^{1/2})$, where $L_T$ is the number of changes in the underlying preferences of the agents. Therefore, we show that the optimal rates for single-agent learning can be achieved in spite of the competition, up to a constant factor. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.