Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data. Classical notions of stability (Gale and Shapley, 1962; Shapley and Shubik, 1971) are unfortunately of limited value in the learning setting, given that preferences are inherently uncertain and destabilizing while they are being learned. To bridge this gap, we develop a framework and algorithms for learning stable market outcomes under uncertainty. Our primary setting is matching with transferable utilities, where the platform both matches agents and sets monetary transfers between them. We design an incentive-aware learning objective that captures the distance of a market outcome from equilibrium. Using this objective, we analyze the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem. Algorithmically, we show that "optimism in the face of uncertainty," the principle underlying many bandit algorithms, applies to a primal-dual formulation of matching with transfers and leads to near-optimal regret bounds. Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
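As a minimal illustration of the approach described above, the sketch below applies "optimism in the face of uncertainty" to a toy transferable-utility market: unknown pairwise match surpluses are estimated from noisy feedback, and each round the platform matches agents by maximizing a UCB-style optimistic surplus. All names, the confidence-bonus form, and the brute-force matching solver are illustrative assumptions, not the paper's exact algorithm; the monetary transfers, which correspond to dual prices of the assignment LP, are omitted for brevity.

```python
# Illustrative sketch (not the paper's algorithm): optimistic
# max-weight matching for a toy transferable-utility market.
import itertools
import math
import random

random.seed(0)
n = 3
# Unknown true match surpluses: pair (i, j) generates value true_u[i][j].
true_u = [[random.random() for _ in range(n)] for _ in range(n)]

def max_weight_matching(w):
    """Brute-force the maximum-weight perfect matching
    (the primal of the assignment LP; fine for tiny n)."""
    best_val, best_perm = -math.inf, None
    for perm in itertools.permutations(range(n)):
        val = sum(w[i][perm[i]] for i in range(n))
        if val > best_val:
            best_val, best_perm = val, perm
    return best_perm, best_val

counts = [[0] * n for _ in range(n)]   # times pair (i, j) was matched
sums = [[0.0] * n for _ in range(n)]   # accumulated noisy surplus feedback

for t in range(1, 2001):
    # Optimism: empirical mean plus a confidence bonus (UCB-style);
    # unvisited pairs get an infinite bonus, forcing exploration.
    ucb = [[(sums[i][j] / counts[i][j]
             + math.sqrt(2 * math.log(t) / counts[i][j]))
            if counts[i][j] else math.inf
            for j in range(n)] for i in range(n)]
    perm, _ = max_weight_matching(ucb)
    for i in range(n):
        j = perm[i]
        reward = true_u[i][j] + random.gauss(0, 0.1)  # noisy surplus sample
        counts[i][j] += 1
        sums[i][j] += reward

opt_perm, opt_val = max_weight_matching(true_u)
print("optimal matching:      ", opt_perm)
print("matching in last round:", perm)
```

With enough rounds the optimistic matching concentrates on high-surplus pairs, mirroring how the confidence bonuses shrink as preference estimates sharpen.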