Robust Spammer Detection by Nash Reinforcement Learning

Online reviews provide product evaluations that help customers make decisions. Unfortunately, these evaluations can be manipulated with fake reviews ("spam") written by professional spammers, who have learned increasingly insidious and powerful spamming strategies by adapting to the deployed detectors. Spamming strategies are hard to capture: they can vary quickly over time, differ across spammers and target products, and, more critically, remain unknown in most cases. Furthermore, most existing detectors focus on detection accuracy, which is not well aligned with the goal of maintaining the trustworthiness of product evaluations. To address these challenges, we formulate a minimax game in which spammers and spam detectors compete with each other on their practical goals, which are not based solely on detection accuracy. Nash equilibria of the game lead to stable detectors that are agnostic to any mixed detection strategies. However, the game has no closed-form solution and is not differentiable, so typical gradient-based algorithms do not apply. We turn the game into two dependent Markov Decision Processes (MDPs) to allow efficient stochastic optimization based on multi-armed bandits and policy gradients. We experiment on three large review datasets using various state-of-the-art spamming and detection strategies and show that the optimization algorithm can reliably find an equilibrium detector that robustly and effectively prevents spammers with any mixed spamming strategies from attaining their practical goal. Our code is available at https://github.com/YingtongDou/Nash-Detect.
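The minimax idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' Nash-Detect algorithm (that lives in the linked repository). It assumes a hypothetical 2x2 table `GAIN`, where `GAIN[a][d]` is the spammer's practical gain when attack strategy `a` meets detector `d`, and it swaps the paper's bandit and policy-gradient updates for plain multiplicative-weights updates with full payoff information. All names and numbers are illustrative assumptions.

```python
import math

# Hypothetical payoffs (assumption, not from the paper):
# GAIN[a][d] = spammer's practical gain for attack a against detector d.
GAIN = [[0.8, 0.2],
        [0.3, 0.6]]


def softmax(scores):
    """Turn log-weights into a mixed strategy (probabilities)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def solve_minimax(gain, rounds=5000):
    """Approximate a mixed equilibrium of a zero-sum matrix game.

    The spammer maximizes the gain, the detector minimizes it. Both use
    multiplicative weights; the time-averaged strategies are returned.
    """
    n_attacks, n_detectors = len(gain), len(gain[0])
    eta = math.sqrt(8.0 * math.log(max(n_attacks, n_detectors)) / rounds)
    spam_scores = [0.0] * n_attacks
    det_scores = [0.0] * n_detectors
    avg_spam = [0.0] * n_attacks
    avg_det = [0.0] * n_detectors

    for _ in range(rounds):
        p = softmax(spam_scores)  # spammer's current mixed strategy
        q = softmax(det_scores)   # detector's current mixed strategy

        # Expected gain of each pure attack against the detector mix.
        attack_gain = [sum(gain[a][d] * q[d] for d in range(n_detectors))
                       for a in range(n_attacks)]
        # Expected gain conceded by each pure detector to the spammer mix.
        detector_loss = [sum(gain[a][d] * p[a] for a in range(n_attacks))
                         for d in range(n_detectors)]

        for a in range(n_attacks):
            spam_scores[a] += eta * attack_gain[a]    # spammer ascends
            avg_spam[a] += p[a] / rounds
        for d in range(n_detectors):
            det_scores[d] -= eta * detector_loss[d]   # detector descends
            avg_det[d] += q[d] / rounds

    return avg_spam, avg_det


def exploitability(gain, p, q):
    """Gap between the best pure attack against q and the best pure detector against p."""
    best_attack = max(sum(row[d] * q[d] for d in range(len(q))) for row in gain)
    best_detector = min(sum(gain[a][d] * p[a] for a in range(len(p)))
                        for d in range(len(q)))
    return best_attack - best_detector


if __name__ == "__main__":
    spam_mix, det_mix = solve_minimax(GAIN)
    print("spammer mix:", spam_mix)
    print("detector mix:", det_mix)
    print("exploitability:", exploitability(GAIN, spam_mix, det_mix))
```

A small exploitability means neither side can gain much by deviating to a single pure strategy, which is the sense in which an equilibrium detector is robust to the spammer's mixed strategies. In the paper's setting the payoffs are not available as a table, which is presumably why sampling-based bandit and policy-gradient methods are used instead.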
