Recommender systems play a crucial role in helping users find information of interest in web services such as Amazon, YouTube, and Google News. Many kinds of recommender systems, ranging from neighborhood-based, association-rule-based, and matrix-factorization-based to deep learning based, have been developed and deployed in industry. Among them, deep learning based recommender systems have become increasingly popular due to their superior performance. In this work, we conduct the first systematic study of data poisoning attacks on deep learning based recommender systems. The attacker's goal is to manipulate a recommender system such that attacker-chosen target items are recommended to many users. To achieve this goal, our attack injects fake users with carefully crafted ratings into the recommender system. Specifically, we formulate the attack as an optimization problem whose solution is the set of injected ratings that maximizes the number of normal users to whom the target items are recommended. However, the optimization problem is challenging to solve because it is a non-convex integer programming problem. To address this challenge, we develop multiple techniques to solve it approximately. Our experimental results on three real-world datasets, including both small and large datasets, show that our attack is effective and outperforms existing attacks. Moreover, we attempt to detect fake users via statistical analysis of the rating patterns of normal and fake users. Our results show that our attack remains effective and still outperforms existing attacks even when such a detector is deployed.
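As an illustration of the quantity the attack tries to maximize, the following minimal sketch computes the fraction of normal users whose top-K recommendation list contains a target item. It is not the method of this work: the function name `hit_ratio`, the dense score matrix, and the top-K recommendation rule are assumptions made for the example, and any recommender that produces per-user item scores could supply the input.

```python
import numpy as np

def hit_ratio(scores, rated_mask, target_item, k):
    """Fraction of eligible normal users whose top-k list contains target_item.

    scores:      (num_users, num_items) predicted scores (hypothetical input).
    rated_mask:  boolean array of the same shape, True where the user
                 has already rated the item.
    """
    # Assumption: items a user has already rated are never recommended again.
    masked = np.where(rated_mask, -np.inf, scores)
    # Indices of the k highest-scoring unrated items for each user.
    top_k = np.argsort(-masked, axis=1)[:, :k]
    hits = (top_k == target_item).any(axis=1)
    # Only users who have not rated the target can be recommended it.
    eligible = ~rated_mask[:, target_item]
    return float(hits[eligible].mean()) if eligible.any() else 0.0
```

An attacker in this setting would choose the fake users' ratings so that, after the recommender is retrained on the poisoned data, this fraction is as large as possible. Because ratings take discrete values and the trained model depends on them in a non-convex way, the search cannot be done by plain gradient descent on this function.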