Recommender systems provide essential web services by learning users' personal preferences from collected data. However, in many cases, systems also need to forget some of their training data. From the perspective of privacy, several privacy regulations have recently been proposed that require systems to eliminate any impact of data whose owners request that it be forgotten. From the perspective of utility, if a system's utility is degraded by bad data, the system needs to forget these data to regain utility. From the perspective of usability, users can delete noisy and incorrect entries so that a system can provide more useful recommendations. Although unlearning is very important, it has not been well considered in existing recommender systems. Some studies have examined machine unlearning for image and text data, but their methods cannot be directly applied to recommendation because they do not account for collaborative information. In this paper, we propose RecEraser, a general and efficient machine unlearning framework tailored to the recommendation task. The main idea of RecEraser is to partition the training set into multiple shards and train a constituent model for each shard. Specifically, to preserve the collaborative information in the data, we first design three novel data partition algorithms that divide the training data into balanced groups based on similarity. Then, since different shard models do not contribute uniformly to the final prediction, we propose an adaptive aggregation method to improve the utility of the global model. Experimental results on three public benchmarks show that RecEraser not only achieves efficient unlearning but also outperforms state-of-the-art unlearning methods in terms of model utility. The source code is available at https://github.com/chenchongthu/Recommendation-Unlearning
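The partition, train-per-shard, aggregate, and retrain-on-deletion loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not RecEraser's implementation. Each constituent model is just an item-popularity counter. The similarity-based balanced partition is replaced by sorting users on a hypothetical one-dimensional embedding and cutting the sorted list into near-equal contiguous blocks. The aggregation weights are fixed numbers passed through a softmax rather than learned adaptively. All names (`balanced_partition`, `train_shard`, `ShardedRecommender`, `forget`) are hypothetical.

```python
from collections import Counter
from math import exp


def balanced_partition(users, num_shards, key):
    """Hypothetical stand-in for a similarity-based balanced partition:
    sort users by a similarity key so that similar users are adjacent, then
    split the sorted list into contiguous blocks whose sizes differ by at most 1."""
    ordered = sorted(users, key=key)
    n = len(ordered)
    return {u: i * num_shards // n for i, u in enumerate(ordered)}


def train_shard(interactions):
    """Toy constituent model: item popularity within one shard."""
    return Counter(item for _, item in interactions)


class ShardedRecommender:
    def __init__(self, interactions, shard_of, num_shards, weights):
        self.shard_of = shard_of
        # Route every (user, item) interaction to its user's shard.
        self.data = [[] for _ in range(num_shards)]
        for user, item in interactions:
            self.data[shard_of[user]].append((user, item))
        # Train one independent model per shard.
        self.models = [train_shard(d) for d in self.data]
        # Softmax-normalised aggregation weights (fixed here; assumed, not learned).
        z = [exp(w) for w in weights]
        total = sum(z)
        self.weights = [x / total for x in z]
        self.retrain_count = 0

    def score(self, item):
        """Weighted sum of the per-shard scores for an item."""
        return sum(w * m[item] for w, m in zip(self.weights, self.models))

    def forget(self, user, item):
        """Delete one interaction and retrain only the shard that held it."""
        s = self.shard_of[user]
        self.data[s].remove((user, item))
        self.models[s] = train_shard(self.data[s])
        self.retrain_count += 1
        return s


if __name__ == "__main__":
    emb = {"u1": 0.1, "u2": 0.2, "u3": 0.9, "u4": 0.8}  # hypothetical embeddings
    shard_of = balanced_partition(list(emb), 2, key=emb.get)
    data = [("u1", "a"), ("u2", "a"), ("u3", "b"), ("u4", "b"), ("u4", "a")]
    rec = ShardedRecommender(data, shard_of, 2, [0.0, 0.0])
    print(rec.score("a"))
    rec.forget("u4", "a")
    print(rec.score("a"))
```

The point the sketch illustrates is that a deletion request touches only one shard: `forget` retrains the single affected constituent model and leaves the others as they were. This locality is what makes shard-based unlearning cheaper than retraining the whole model from scratch.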