We design decentralized algorithms for regret minimization in two-sided matching markets with one-sided bandit feedback that significantly improve upon prior work (Liu et al. 2020a, 2020b, Sankararaman et al. 2020). First, for general markets and any $\varepsilon > 0$, we design an algorithm that achieves $O(\log^{1+\varepsilon}(T))$ regret with respect to the agent-optimal stable matching under an unknown time horizon $T$, improving upon the $O(\log^{2}(T))$ regret achieved in (Liu et al. 2020b). Second, we provide the optimal $\Theta(\log(T))$ agent-optimal regret for markets satisfying uniqueness consistency, that is, markets where the departure of participants does not alter the original stable matching. Previously, $\Theta(\log(T))$ regret was achievable (Sankararaman et al. 2020, Liu et al. 2020b) in the much more restricted serial dictatorship setting, in which all arms have the same preference over the agents. We propose a phase-based algorithm in which, in each phase, besides deleting the globally communicated dominated arms, the agents locally delete arms with which they collide often. This local deletion is pivotal in breaking deadlocks arising from rank heterogeneity of agents across arms. We further demonstrate the superiority of our algorithm over existing works through simulations.
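To make the regret benchmark concrete, the sketch below computes an agent-optimal stable matching offline, assuming every preference list is fully known and complete. It uses the standard agent-proposing deferred-acceptance procedure, which is not the bandit algorithm proposed here; the function name, variable names, and the three-agent example market are all hypothetical.

```python
from collections import deque


def agent_optimal_stable_matching(agent_prefs, arm_prefs):
    """Agent-proposing deferred acceptance (illustrative helper, hypothetical name).

    agent_prefs[a] lists arms from most to least preferred by agent a.
    arm_prefs[b] lists agents from most to least preferred by arm b.
    Assumes complete preference lists on both sides.
    """
    # rank[b][a] is the position of agent a in arm b's list (lower is better).
    rank = {
        arm: {agent: pos for pos, agent in enumerate(prefs)}
        for arm, prefs in arm_prefs.items()
    }
    next_choice = {agent: 0 for agent in agent_prefs}  # next arm index to propose to
    holder = {}  # arm -> agent it currently holds
    free = deque(agent_prefs)

    while free:
        agent = free.popleft()
        if next_choice[agent] >= len(agent_prefs[agent]):
            continue  # list exhausted: the agent stays unmatched
        arm = agent_prefs[agent][next_choice[agent]]
        next_choice[agent] += 1
        current = holder.get(arm)
        if current is None:
            holder[arm] = agent  # a free arm accepts the proposal
        elif rank[arm][agent] < rank[arm][current]:
            holder[arm] = agent  # the arm trades up and releases its old agent
            free.append(current)
        else:
            free.append(agent)  # rejected: the agent proposes again later

    return {agent: arm for arm, agent in holder.items()}


# Hypothetical three-agent, three-arm market.
agent_prefs = {
    "a1": ["b1", "b2", "b3"],
    "a2": ["b1", "b2", "b3"],
    "a3": ["b2", "b1", "b3"],
}
arm_prefs = {
    "b1": ["a2", "a1", "a3"],
    "b2": ["a1", "a3", "a2"],
    "b3": ["a1", "a2", "a3"],
}
matching = agent_optimal_stable_matching(agent_prefs, arm_prefs)
print(matching)
```

In the bandit setting the agents do not know their own preferences in advance and must learn them from reward feedback, so this offline matching serves only as the comparison point against which agent-optimal regret is measured.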