Online platforms in the internet economy commonly incorporate recommender systems that recommend arms (e.g., products) to agents (e.g., users). On such platforms, a myopic agent has a natural incentive to exploit, choosing the best product given the current information, rather than to explore alternatives and gather information that would benefit other agents. We propose a novel recommender system that respects agents' incentives and achieves asymptotically optimal performance, as measured by regret in repeated games. We model this incentive-aware recommender system as a multi-agent bandit problem in a two-sided market, subject to an incentive constraint induced by agents' opportunity costs. If the opportunity costs are known to the principal, we show that there exists an incentive-compatible recommendation policy that pools recommendations across a genuinely good arm and an unknown arm via a randomized and adaptive approach. If, on the other hand, the opportunity costs are unknown to the principal, we propose a policy that randomly pools recommendations across all arms and uses each arm's cumulative loss as feedback for exploration. We show that both policies also satisfy an ex-post fairness criterion, which protects agents from over-exploitation.