Recommendation systems are dynamic economic systems that balance the needs of multiple stakeholders. A recent line of work studies incentives from the content providers' point of view. Content providers, e.g., vloggers and bloggers, contribute fresh content and rely on user engagement to generate revenue and finance their operations. In this work, we propose a contextual multi-armed bandit setting to model the dependency of content providers on exposure. In our model, the system receives a user context in every round and has to select one of the arms. Every arm is a content provider who must receive a minimum number of pulls in every fixed time period (e.g., a month) to remain viable in later rounds; otherwise, the arm departs and is no longer available. The system aims to maximize the welfare of its users (the content consumers). To that end, it should learn which arms are vital and ensure they remain viable by subsidizing arm pulls if needed. We develop algorithms with sub-linear regret, as well as a lower bound that demonstrates that our algorithms are optimal up to logarithmic factors.
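The setting described above can be illustrated with a small simulation. The sketch below is not the algorithm developed in this work: it assumes the expected rewards are already known (so nothing is learned), and the function name `simulate`, its parameters, and the "subsidize only when the remaining rounds are just enough" rule are hypothetical choices made for illustration. It only shows the departure mechanism: an arm that falls short of its minimum number of pulls in a period is removed from later rounds.

```python
def simulate(expected_reward, min_pulls, period, contexts, subsidize=frozenset()):
    """Toy model of arms (content providers) that depart without enough exposure.

    expected_reward[c][a]: assumed-known mean reward of arm a under context c.
    min_pulls[a]: pulls arm a needs in each period to remain viable.
    contexts: the user context observed in each round.
    subsidize: arms the system commits to keeping viable (hypothetical policy input).
    Returns (total expected welfare, set of arms still viable at the end).
    """
    viable = set(range(len(min_pulls)))
    pulls = {a: 0 for a in viable}
    welfare = 0.0
    for t, c in enumerate(contexts, start=1):
        if not viable:
            break
        # Rounds left in the current period, counting this one.
        remaining = period - (t - 1) % period
        # Pulls still owed to the subsidized arms in this period.
        owed = {a: min_pulls[a] - pulls[a]
                for a in viable
                if a in subsidize and pulls[a] < min_pulls[a]}
        if owed and sum(owed.values()) >= remaining:
            # Subsidy pull: no slack left, so serve an arm that still needs exposure.
            arm = max(owed, key=lambda a: expected_reward[c][a])
        else:
            # Greedy pull: serve the best viable arm for this user context.
            arm = max(viable, key=lambda a: expected_reward[c][a])
        welfare += expected_reward[c][arm]
        pulls[arm] += 1
        if t % period == 0:
            # End of period: arms below their exposure threshold depart for good.
            viable = {a for a in viable if pulls[a] >= min_pulls[a]}
            pulls = {a: 0 for a in viable}
    return welfare, viable


# Arm 1 is vital for context 1, but context 1 is rare in the first period.
rewards = [[0.6, 0.5],   # context 0
           [0.1, 1.0]]   # context 1
needs = [1, 2]
ctxs = [0, 0, 0, 1, 1, 1, 1, 1]  # two periods of four rounds each

greedy_welfare, greedy_viable = simulate(rewards, needs, 4, ctxs)
subsidy_welfare, subsidy_viable = simulate(rewards, needs, 4, ctxs, subsidize={0, 1})
```

In this toy instance the purely greedy system loses arm 1 after the first period and then serves context 1 poorly, while the subsidizing system gives up a little reward early and retains both arms. Choosing which arms are worth subsidizing when rewards are unknown is the learning problem the abstract refers to, and it is not addressed by this sketch.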