Deep learning-based recommender systems have become an integral part of several online platforms. However, their black-box nature underscores the need for explainable artificial intelligence (XAI) approaches that provide human-understandable reasons why a specific item is recommended to a given user. One such approach is counterfactual explanation (CF). While CFs can be highly beneficial for users and system designers, malicious actors may also exploit these explanations to undermine the system's security. In this work, we propose H-CARS, a novel strategy to poison recommender systems via CFs. Specifically, we first train a logical-reasoning-based surrogate model on training data derived from counterfactual explanations. By reversing the learning process of the recommendation model, we then devise an efficient greedy algorithm that generates fabricated user profiles and their associated interaction records for the aforementioned surrogate model. Our experiments, which employ a well-known CF generation method and are conducted on two distinct datasets, show that H-CARS achieves strong attack performance with a high success rate.
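To make the greedy profile-generation step concrete, the sketch below shows one plausible form such an algorithm could take: starting from an empty fake profile, it repeatedly adds the candidate interaction that most increases a surrogate model's predicted score for the target item, until an interaction budget is exhausted. This is a minimal illustration under assumed interfaces (the `score` callable, `target_item`, and `budget` are all hypothetical names, not from the paper), and it does not reproduce H-CARS itself, whose algorithm reverses the surrogate's learning process.

```python
from typing import Callable, List, Optional, Set

def build_fake_profile(
    score: Callable[[Set[int], int], float],  # hypothetical surrogate: (profile, item) -> predicted score
    target_item: int,                         # item the attacker wants promoted
    candidate_items: List[int],               # items available as filler interactions
    budget: int,                              # maximum number of fabricated interactions
) -> Set[int]:
    """Greedily assemble a fabricated user profile that maximizes the
    surrogate's predicted score for the target item (illustrative only)."""
    profile: Set[int] = set()
    for _ in range(budget):
        base = score(profile, target_item)
        best_item: Optional[int] = None
        best_gain = 0.0
        for item in candidate_items:
            if item in profile:
                continue
            # Marginal gain of adding this interaction to the fake profile.
            gain = score(profile | {item}, target_item) - base
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:
            break  # no remaining item improves the target's score
        profile.add(best_item)
    return profile
```

A greedy loop of this kind trades optimality for tractability: evaluating every candidate at each step costs O(budget × |candidates|) surrogate queries, which is feasible when the surrogate is cheap to evaluate, as a logical-reasoning-based model trained on CF-derived data may be.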