Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback. As a principled way of incorporating market constraints and user incentives in the design, we consider our objectives to be two-fold: maximal social welfare with minimal instability. To maximize social welfare, our proposed framework enhances the quality of recommendations by exploring allocations that optimistically maximize the rewards. To minimize instability, a measure of users' incentives to deviate from recommended allocations, the algorithm prices the items based on a scheme derived from the Walrasian equilibria. Though it is known that these equilibria yield stable prices for markets with known user preferences, our approach accounts for the inherent uncertainty in the preferences and further ensures that the users accept their recommendations under offered prices. To the best of our knowledge, our approach is the first to integrate techniques from combinatorial bandits, optimal resource allocation, and collaborative filtering to obtain an algorithm that achieves sub-linear social welfare regret as well as sub-linear instability. Empirical studies on synthetic and real-world data also demonstrate the efficacy of our strategy compared to approaches that do not fully incorporate all these aspects.
翻译:大规模在线建议系统必须便利在竞争用户之间分配数量有限的项目,同时从用户反馈中学习他们的偏好。作为将市场限制和用户激励因素纳入设计的一个原则性方法,我们认为我们的目标有两个方面:在最低不稳定的情况下实现最大的社会福利;为了最大限度地扩大社会福利,我们提议的框架通过探索乐观地最大限度地扩大收益的分配,提高建议的质量;为了尽量减少不稳定性,一种衡量用户偏离建议分配的激励办法,即基于Walrasian equilibria计划的项目的算法价格。尽管众所周知,这些平衡为已知用户偏好的市场带来稳定价格,但我们的方法说明这些偏好中固有的不确定性,并进一步确保用户接受其建议;根据我们的知识,我们的方法首先是综合组合强盗、最佳资源分配和协作过滤技术,以获得实现次线社会福利遗憾和次线性不稳定的算法。关于合成和现实世界数据的研究也表明我们战略的效力,与不完全纳入所有这些方面的方法相比,我们的战略是有效的。