Recommendation systems, when employed in markets, play a dual role: they help users select their most desired items from a large pool, and they help allocate a limited number of items to the users who desire them the most. Although capacity constraints on allocations are prevalent in many real-world recommendation settings, a principled way of incorporating them into the design of these systems has been lacking. Motivated by this, we propose an interactive framework in which the system provider can enhance the quality of recommendations by opportunistically exploring allocations that maximize user rewards and respect the capacity constraints, using appropriate pricing mechanisms. We model the problem as an instance of a low-rank combinatorial multi-armed bandit problem with selection constraints on the arms. We employ an integrated approach, using techniques from collaborative filtering, combinatorial bandits, and optimal resource allocation, to provide an algorithm that provably achieves sub-linear regret, namely $\tilde{\mathcal{O}}(\sqrt{NM(N+M)RT})$ in $T$ rounds for a problem with $N$ users, $M$ items, and a rank-$R$ mean reward matrix. Empirical studies on synthetic and real-world data also demonstrate the effectiveness and performance of our approach.
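To make the capacity-constrained allocation step concrete, the following is a minimal sketch of the offline problem only: given a known low-rank mean reward matrix, find the allocation that maximizes total reward without exceeding any item's capacity. It is not the algorithm proposed here. It assumes each user receives exactly one item, it ignores learning and pricing, and it uses brute-force search that is only practical for tiny instances. The names `best_allocation`, `user_factors`, `item_factors`, and `capacities` are hypothetical and chosen for illustration.

```python
from itertools import product


def best_allocation(rewards, capacities):
    """Brute-force the reward-maximizing allocation of one item per user.

    rewards[i][j] : assumed (known) mean reward of user i for item j
    capacities[j] : maximum number of users that may receive item j
    Returns (total_reward, allocation), where allocation[i] is the item
    given to user i, or (-inf, None) if no feasible allocation exists.
    """
    n_users = len(rewards)
    n_items = len(capacities)
    best_total, best_alloc = float("-inf"), None
    for alloc in product(range(n_items), repeat=n_users):
        # Skip allocations that exceed any item's capacity.
        if any(alloc.count(j) > capacities[j] for j in range(n_items)):
            continue
        total = sum(rewards[i][alloc[i]] for i in range(n_users))
        if total > best_total:
            best_total, best_alloc = total, alloc
    return best_total, best_alloc


# Illustrative rank-1 mean reward matrix (outer product of user and item
# factors) with N = 3 users and M = 2 items. These values are assumptions.
user_factors = [1.0, 2.0, 3.0]
item_factors = [1.0, 0.5]
rewards = [[u * v for v in item_factors] for u in user_factors]
capacities = [1, 2]  # item 0 can serve one user, item 1 can serve two

total, alloc = best_allocation(rewards, capacities)
print(total, alloc)  # → 4.5 (1, 1, 0)
```

In this toy instance every user prefers item 0, but its capacity of one forces a choice, and the optimum gives it to the user who values it most. In the bandit setting described above the mean rewards are unknown and must be estimated over rounds, so a step like this would be applied to estimates rather than to the true matrix.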