We study the problem of online learning in competitive settings in the context of two-sided matching markets. In particular, one side of the market, the agents, must learn about their preferences over the other side, the firms, through repeated interaction while competing with other agents for successful matches. We propose a class of decentralized, communication- and coordination-free algorithms that agents can use to reach their stable match in structured matching markets. In contrast to prior work, the proposed algorithms make decisions based solely on an agent's own history of play and require no foreknowledge of the firms' preferences. Our algorithms are constructed by separating the statistical problem of learning one's preferences through noisy observations from the problem of competing for firms. We show that, under realistic structural assumptions on the underlying preferences of the agents and firms, the proposed algorithms incur regret that grows at most logarithmically in the time horizon. Our results show that, in the case of matching markets, competition need not drastically affect the performance of decentralized, communication- and coordination-free online learning algorithms.