Learning User Preferences in Non-Stationary Environments

Recommendation systems often use online collaborative filtering (CF) algorithms to identify items a given user likes over time, based on ratings that this user and a large number of other users have provided in the past. This problem has been studied extensively for the static case, in which users' preferences do not change over time, an assumption that is often violated in practical settings. In this paper, we introduce a novel model for online non-stationary recommendation systems which allows for temporal uncertainties in the users' preferences. For this model, we propose a user-based CF algorithm and provide a theoretical analysis of its achievable reward. Compared to the related non-stationary multi-armed bandit literature, the fundamental difficulty in our model is that variations in the preferences of one user may severely affect the recommendations for other users. We also test our algorithm on real-world datasets, showing its effectiveness in practical applications. A surprising observation in our experiments is that our algorithm outperforms static algorithms even when preferences do not change over time. This suggests that in practice, dynamic algorithms such as the one we propose might be beneficial even in stationary environments.

