In this paper, we propose a new algorithm for matching markets with complementary preferences, where agents' preferences are unknown a priori and must be learned from data. The presence of complementary preferences can destabilize the matching process, making the problem challenging to solve. To overcome this challenge, we formulate the problem in a bandit learning framework and propose the Multi-agent Multi-type Thompson Sampling (MMTS) algorithm. The algorithm combines the exploration strength of Thompson Sampling with a double matching technique to achieve a stable matching outcome. Our theoretical analysis shows that MMTS achieves stability at every matching step, satisfies the incentive-compatibility property, and attains sublinear Bayesian regret over time. Our approach thus provides a practical method for handling complementary preferences in real-world matching scenarios.
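To make the bandit-learning view concrete, the following is a minimal sketch of Thompson Sampling combined with a stable-matching step on a toy two-sided market. It is not the paper's MMTS algorithm: it assumes a single agent type, Bernoulli utilities with Beta posteriors, arms that rank agents by a fixed index tie-break, and it omits complementarities and the double matching step; all names (`n_agents`, `n_arms`, `gale_shapley`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_arms, horizon = 3, 3, 2000
true_means = rng.uniform(0.2, 0.9, size=(n_agents, n_arms))  # unknown match utilities

# Beta(alpha, beta) posterior over the Bernoulli reward of each agent-arm pair.
alpha = np.ones((n_agents, n_arms))
beta = np.ones((n_agents, n_arms))

def gale_shapley(prefs):
    """Agent-proposing deferred acceptance; arms prefer lower-index agents (toy assumption)."""
    free = list(range(n_agents))
    next_prop = [0] * n_agents        # next preference each agent will propose to
    holder = [-1] * n_arms            # tentative partner of each arm, -1 if unmatched
    while free:
        a = free.pop(0)
        arm = prefs[a][next_prop[a]]
        next_prop[a] += 1
        if holder[arm] == -1:
            holder[arm] = a
        elif a < holder[arm]:         # arm swaps to the agent it prefers
            free.append(holder[arm])
            holder[arm] = a
        else:
            free.append(a)            # rejected; a proposes again next round
    return {holder[arm]: arm for arm in range(n_arms)}

for t in range(horizon):
    # Thompson Sampling: each agent samples utilities from its posterior
    # and ranks the arms by the sampled values.
    sampled = rng.beta(alpha, beta)
    prefs = [list(np.argsort(-sampled[a])) for a in range(n_agents)]

    # Compute a matching that is stable under the sampled preferences.
    # MMTS additionally runs a second, type-level matching step omitted here.
    match = gale_shapley(prefs)

    # Observe Bernoulli rewards for matched pairs and update their posteriors.
    for a, arm in match.items():
        r = rng.random() < true_means[a, arm]
        alpha[a, arm] += r
        beta[a, arm] += 1 - r
```

Because each round matches on posterior samples rather than point estimates, uncertain pairs are still explored early on, while the matching converges toward the stable matching under the true utilities as the posteriors concentrate.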