Taking advantage of available side information can boost the performance of recommender systems; however, with the rise of big data, the side information is often high-dimensional. Hence, it is imperative to develop decision-making algorithms that can cope with such high-dimensional contexts in real time. That is especially challenging when the decision-maker has a large variety of items to recommend. In this paper, we build upon the linear contextual multi-armed bandit framework to address this problem. We develop a decision-making policy for a linear bandit problem with high-dimensional context vectors and several arms. Our policy employs Thompson sampling and feeds it with reduced context vectors, where the dimensionality reduction is performed via random projection. Our proposed recommender system follows this policy to learn the item preferences of users online while keeping its runtime as low as possible. We prove a regret bound that scales with the reduced dimension rather than the original one. For numerical evaluation, we use our algorithm to build a recommender system and apply it to real-world datasets. The theoretical and numerical results demonstrate the effectiveness of our proposed algorithm compared to the state of the art in terms of computational complexity and regret performance.
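To make the approach concrete, the following is a minimal sketch of the idea described above: linear Thompson sampling run on randomly projected context vectors. The specific dimensions, the Gaussian projection matrix, the exploration scale `v`, and the posterior update shown here are illustrative assumptions for a standard Bayesian linear-regression formulation, not the paper's exact algorithm or notation.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1000  # original (high) context dimension -- assumed for illustration
m = 20    # reduced dimension, m << d
K = 50    # number of arms
T = 5000  # time horizon
v = 0.5   # Thompson sampling exploration scale (hypothetical choice)

# Random projection: Gaussian entries scaled by 1/sqrt(m) approximately
# preserve inner products between high-dimensional vectors.
P = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d))

# Posterior parameters of the linear reward model in the reduced space.
B = np.eye(m)          # precision matrix
f = np.zeros(m)        # accumulated reward-weighted contexts
mu_hat = np.zeros(m)   # posterior mean

# Unknown true parameter, used here only to simulate rewards.
theta_star = rng.normal(size=d) / np.sqrt(d)

for t in range(T):
    contexts = rng.normal(size=(K, d))  # one high-dim context per arm
    reduced = contexts @ P.T            # project each context into R^m

    # Sample a parameter from the Gaussian posterior, then act greedily on it.
    cov = v**2 * np.linalg.inv(B)
    theta_sample = rng.multivariate_normal(mu_hat, cov)
    arm = int(np.argmax(reduced @ theta_sample))

    # Observe a noisy linear reward generated in the original space.
    reward = contexts[arm] @ theta_star + 0.1 * rng.normal()

    # Bayesian linear-regression update using the chosen reduced context,
    # so all per-round computation scales with m rather than d.
    x = reduced[arm]
    B += np.outer(x, x)
    f += reward * x
    mu_hat = np.linalg.solve(B, f)
```

Because the posterior is maintained in the reduced space, each round costs on the order of operations in `m` (plus one projection), which is the computational saving the abstract refers to.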