SamWalker++: Recommendation with Informative Sampling Strategy

Recommendation from implicit feedback is a highly challenging task due to the lack of reliable negative feedback data. Existing methods address this challenge by treating all unobserved data as negative (dislike) but down-weighting their confidence. However, this treatment causes two problems: (1) the confidence weights of unobserved data are usually assigned manually, which lacks flexibility and may introduce empirical bias into the evaluation of users' preferences; (2) to handle the massive volume of unobserved feedback data, most existing methods rely on stochastic inference and data sampling strategies. However, since a user is aware of only a very small fraction of the items in a large dataset, it is difficult for existing samplers to select informative training instances, i.e., instances in which the user really dislikes the item rather than simply does not know about it. To address these two problems, we propose two novel recommendation methods, SamWalker and SamWalker++, that support both adaptive confidence assignment and efficient model learning. SamWalker models data confidence with a social-network-aware function, which can adaptively assign different weights to different data according to users' social contexts. However, social network information may not be available in many recommender systems, which hinders the application of SamWalker. We therefore propose SamWalker++, which requires no side information and models data confidence with a constructed pseudo-social network. We also develop fast random-walk-based sampling strategies for SamWalker and SamWalker++ that adaptively draw informative training instances, which speeds up gradient estimation and reduces sampling variance. Extensive experiments on five real-world datasets demonstrate the superiority of the proposed SamWalker and SamWalker++.
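
To make the general idea concrete, here is a minimal sketch, assuming a simple adjacency-list social (or pseudo-social) graph and per-user item sets. It is not the sampler or confidence function of SamWalker/SamWalker++, whose exact transition rules are not given here; all names (`sample_unobserved`, `confidence_weights`, `weighted_loss`, `n_walks`, `max_steps`) and the toy data are hypothetical. Short random walks from a user reach social neighbours, and items those neighbours consumed but the user did not are proposed as informative unobserved instances; normalised proposal counts then act as crude confidence weights in a squared loss where observed items target 1 and sampled unobserved items target 0.

```python
import random
from collections import Counter


def sample_unobserved(user, social, interactions, n_walks=100, max_steps=3, rng=None):
    # Hypothetical sampler: each walk starts at `user` and hops to random
    # (pseudo-)social neighbours, stopping at the first other node that has
    # interactions. One of that node's items is proposed and kept only if
    # `user` has not interacted with it (assumed to be an informative negative).
    rng = rng or random.Random()
    seen = interactions.get(user, set())
    counts = Counter()
    for _ in range(n_walks):
        node = user
        for _ in range(max_steps):
            neighbours = social.get(node)
            if not neighbours:
                break  # dead end: abandon this walk
            node = rng.choice(neighbours)
            items = interactions.get(node)
            if items and node != user:
                item = rng.choice(sorted(items))  # sorted for reproducibility
                if item not in seen:
                    counts[item] += 1
                break  # at most one proposal per walk
    return counts


def confidence_weights(counts):
    # Assumption: confidence of an unobserved item is proportional to how
    # often the walks proposed it.
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()} if total else {}


def weighted_loss(scores, positives, weights):
    # Observed items: target 1, weight 1. Sampled unobserved items: target 0,
    # weighted by their (hypothetical) confidence.
    loss = sum((scores[i] - 1.0) ** 2 for i in positives)
    loss += sum(w * scores[i] ** 2 for i, w in weights.items())
    return loss


# Toy usage with made-up data.
social = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice"]}
interactions = {"alice": {"i1"}, "bob": {"i1", "i2"}, "carol": {"i3"}}
counts = sample_unobserved("alice", social, interactions, n_walks=200, rng=random.Random(0))
weights = confidence_weights(counts)
scores = {"i1": 0.9, "i2": 0.3, "i3": 0.1}
print(counts, weighted_loss(scores, interactions["alice"], weights))
```

Because only sampled unobserved items enter the loss, its cost scales with the number of walks rather than with the full item catalogue; this is the efficiency motivation the abstract gives for sampling, though the actual variance-reduction properties of the paper's samplers are not reproduced by this toy version.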
