We study decision-making under scarcity of shared resources when agents' preferences are unknown a priori and must be learned from data. Taking the two-sided matching market as a running example, we focus on the decentralized setting, in which agents do not share their learned preferences with a central authority. Our approach represents preferences in a reproducing kernel Hilbert space and uses a preference-learning algorithm that accounts for the uncertainty arising from competition among agents in the market. Under regularity conditions, we show that our preference estimator converges at a minimax optimal rate. Building on this result, we derive optimal strategies that maximize agents' expected payoffs, and we calibrate the uncertain state by taking opportunity costs into account. We also derive an incentive-compatibility property and show that the outcome of the learned strategies is stable. Finally, we prove a fairness property: under the learned strategies, there is no justified envy.
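As a rough illustration of representing preferences in an RKHS, the sketch below assumes a Gaussian (RBF) kernel and plain kernel ridge regression. A single agent estimates its utility for each potential partner from noisy observed payoffs, then ranks partners by the fitted values. The names `KernelPreferenceEstimator` and `rank_arms`, the feature encoding, and the bandwidth and regularization values are all hypothetical. This is not the authors' algorithm: it ignores the competition-induced uncertainty that their estimator accounts for.

```python
import math
from typing import Dict, List, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], bandwidth: float = 1.0) -> float:
    """Gaussian kernel k(x, z) = exp(-||x - z||^2 / (2 * bandwidth^2)) (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


class KernelPreferenceEstimator:
    """Hypothetical kernel ridge estimate of one agent's utility u(x) for a partner with features x."""

    def __init__(self, bandwidth: float = 1.0, reg: float = 1e-2) -> None:
        self.bandwidth = bandwidth
        self.reg = reg
        self.X: List[Sequence[float]] = []
        self.alpha: List[float] = []

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "KernelPreferenceEstimator":
        n = len(X)
        K = [[rbf_kernel(X[i], X[j], self.bandwidth) for j in range(n)] for i in range(n)]
        for i in range(n):
            K[i][i] += n * self.reg  # ridge term: solve (K + n * lambda * I) alpha = y
        self.X = list(X)
        self.alpha = solve(K, list(y))
        return self

    def predict(self, x: Sequence[float]) -> float:
        # Representer theorem: u_hat(x) = sum_i alpha_i * k(x_i, x)
        return sum(a * rbf_kernel(xi, x, self.bandwidth) for a, xi in zip(self.alpha, self.X))


def rank_arms(est: KernelPreferenceEstimator, arms: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate partners from most to least preferred by estimated utility."""
    scores = {name: est.predict(features) for name, features in arms.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical usage: noisy payoffs observed twice for each of three firms with one feature.
X = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
y = [0.1, 0.3, 0.9, 1.1, 0.4, 0.6]
arms = {"firm_a": [0.0], "firm_b": [1.0], "firm_c": [2.0]}
est = KernelPreferenceEstimator(bandwidth=1.0, reg=1e-2).fit(X, y)
ranking = rank_arms(est, arms)
print(ranking)  # ['firm_b', 'firm_c', 'firm_a']
```

The ridge term keeps the linear system well posed even when the same partner is observed repeatedly. The pure-Python solver only serves to keep the sketch dependency-free. In a decentralized market, each agent would fit such an estimator privately and never share it with a central authority.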