In the real world, people and other entities usually find matches independently and autonomously, for example when looking for jobs, partners, or roommates. This search may begin with no initial knowledge of the environment. We propose a multi-agent reinforcement learning (MARL) paradigm for a spatially formulated, decentralized two-sided matching market with independent and autonomous agents. Because autonomous agents act independently, the environment is highly dynamic and uncertain. Moreover, agents do not know the preferences of other agents, and must explore the environment and interact with other agents to discover their own preferences through noisy rewards. We believe such a setting better approximates the real world, and we study how useful our MARL approach is for it. Along with the conventional stable matching case, in which agents have strictly ordered preferences, we check the applicability of our approach to stable matching with incomplete lists and ties. We evaluate our results for stability, level of instability (for unstable results), and fairness. Our MARL approach mostly yields stable and fair outcomes.
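To make the notions of stability and level of instability concrete, the following is a minimal sketch of how a matching could be checked for blocking pairs. It assumes strictly ordered preference lists that may be incomplete (an agent missing from a list is treated as unacceptable), and it assumes the level of instability is measured as the number of blocking pairs; the abstract does not state the exact metric, so this is an illustrative choice. The names `prefers`, `blocking_pairs`, `a_prefs`, and `b_prefs` are hypothetical, and ties are not handled here.

```python
from typing import Dict, List, Optional, Tuple

Prefs = Dict[str, List[str]]  # agent -> acceptable partners, most preferred first


def prefers(prefs: Prefs, agent: str, candidate: str, current: Optional[str]) -> bool:
    """Return True if `agent` strictly prefers `candidate` to `current`.

    `current` is None when the agent is unmatched. A candidate that does not
    appear in the agent's list is unacceptable (incomplete lists).
    """
    ranking = prefs[agent]
    if candidate not in ranking:
        return False
    if current is None or current not in ranking:
        # Being matched to an acceptable candidate beats being unmatched
        # or being matched to an unacceptable partner.
        return True
    return ranking.index(candidate) < ranking.index(current)


def blocking_pairs(a_prefs: Prefs, b_prefs: Prefs,
                   matching: Dict[str, str]) -> List[Tuple[str, str]]:
    """List all pairs (a, b) that would both rather be with each other.

    `matching` maps agents on side A to their partners on side B; agents
    absent from it are unmatched. An empty result means the matching is stable.
    """
    partner_of_b = {b: a for a, b in matching.items()}
    pairs = []
    for a, ranking in a_prefs.items():
        for b in ranking:
            if matching.get(a) == b:
                continue  # already matched to each other
            if (prefers(a_prefs, a, b, matching.get(a))
                    and prefers(b_prefs, b, a, partner_of_b.get(b))):
                pairs.append((a, b))
    return pairs


# Hypothetical toy market with two agents per side.
a_prefs = {"a1": ["b1", "b2"], "a2": ["b1", "b2"]}
b_prefs = {"b1": ["a1", "a2"], "b2": ["a1", "a2"]}

unstable = blocking_pairs(a_prefs, b_prefs, {"a1": "b2", "a2": "b1"})
stable = blocking_pairs(a_prefs, b_prefs, {"a1": "b1", "a2": "b2"})
print(len(unstable), len(stable))
```

In this toy market the first matching has one blocking pair, (`a1`, `b1`), while the second has none and is therefore stable. Supporting ties would require choosing a stability notion (for example weak stability) and replacing the list positions with rank values that can be equal.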