Multi-armed bandits (MAB) provide a principled online-learning approach to balancing exploration and exploitation. Because they perform well and learn from limited feedback, without needing to be told how to act in every situation, multi-armed bandits have drawn widespread attention in applications such as recommender systems. Within recommender systems, collaborative filtering (CF) is arguably the earliest and most influential method. Crucially, new users and an ever-changing pool of recommendable items are challenges that recommender systems must address. The classical collaborative-filtering pipeline trains the model offline and then tests it online, but this approach cannot keep up with dynamic changes in user preferences, giving rise to the so-called cold-start problem. How, then, can we effectively recommend items to users in the absence of useful information? To address these problems, we propose BanditMF, a multi-armed bandit based collaborative filtering recommender system. BanditMF is designed to address two challenges in multi-armed bandit algorithms and collaborative filtering: (1) how to solve the cold-start problem for collaborative filtering when valid information is scarce, and (2) how to overcome the sub-optimality of bandit algorithms in domains with strong social relations, which arises from estimating each user's unknown parameters independently and ignoring correlations between users.
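To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of one standard bandit strategy, epsilon-greedy (an illustrative example, not the specific algorithm used by BanditMF): with small probability the agent explores a random arm (item), and otherwise it exploits the arm with the best empirical mean reward observed so far.

```python
import random

def epsilon_greedy(rewards_history, n_arms, epsilon=0.1):
    """Choose an arm index for the next round.

    rewards_history: list of n_arms lists, each holding the rewards
    observed so far for that arm (e.g. click / no-click feedback).
    With probability epsilon we explore a random arm; otherwise we
    exploit the arm with the highest empirical mean reward.
    """
    if random.random() < epsilon:
        # Explore: pick any arm uniformly at random.
        return random.randrange(n_arms)
    # Exploit: pick the arm with the best empirical mean
    # (unpulled arms default to a mean of 0.0).
    means = [sum(r) / len(r) if r else 0.0 for r in rewards_history]
    return max(range(n_arms), key=means.__getitem__)
```

With `epsilon=0` the rule is purely greedy; raising `epsilon` trades immediate reward for information about less-explored items, which is exactly the balance the abstract refers to.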