Social networks are polluted by rumors, which machine learning models can detect. However, these models are fragile, and understanding their vulnerabilities is critical to rumor detection. Certain vulnerabilities stem from dependencies on the graph structure and on suspiciousness ranking, and are difficult for end-to-end methods to learn from limited, noisy data. Given a black-box detector, we design features that capture these dependencies, allowing a reinforcement learning agent to learn an effective and interpretable attack policy from the detector's output. To speed up learning, we devise (i) a credit assignment method that decomposes delayed rewards into rewards for individual attack steps in proportion to their effects, and (ii) a time-dependent control variate that reduces the variance caused by large graphs and many attack steps. On two social rumor datasets, we demonstrate (i) the effectiveness of the attacks compared with rule-based attacks and end-to-end approaches; (ii) the usefulness of the proposed credit assignment strategy and control variate; and (iii) the interpretability of the policy when generating strong attacks.
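The abstract gives no formulas for the two learning aids, so the following is only a minimal sketch of the general ideas, not the authors' method. It assumes a hypothetical per-step "effect" score (for example, the change in detector output caused by one attack step), splits a single delayed reward in proportion to the magnitude of those effects, and uses the per-time-step batch mean as a simple time-dependent baseline, one common form of control variate in policy-gradient methods. All function names, and the assumption that episodes in a batch have equal length, are illustrative.

```python
from typing import List, Sequence


def assign_credit(final_reward: float, step_effects: Sequence[float]) -> List[float]:
    """Split one delayed reward across steps in proportion to each step's effect.

    `step_effects` is a hypothetical per-step measurement (an assumption of
    this sketch). Magnitudes are used so the shares sum to `final_reward`.
    """
    magnitudes = [abs(e) for e in step_effects]
    n = len(magnitudes)
    if n == 0:
        return []
    total = sum(magnitudes)
    if total == 0.0:
        # No step had a measurable effect: fall back to an even split.
        return [final_reward / n] * n
    return [final_reward * m / total for m in magnitudes]


def time_dependent_baseline(batch_rewards: Sequence[Sequence[float]]) -> List[float]:
    """Mean per-step reward at each time index, taken across a batch of episodes.

    Assumes every episode in the batch has the same number of steps.
    """
    n_episodes = len(batch_rewards)
    horizon = len(batch_rewards[0])
    return [sum(ep[t] for ep in batch_rewards) / n_episodes for t in range(horizon)]


def advantages(
    batch_rewards: Sequence[Sequence[float]], baseline: Sequence[float]
) -> List[List[float]]:
    """Subtract the time-indexed baseline from each episode's per-step rewards."""
    return [[r - b for r, b in zip(ep, baseline)] for ep in batch_rewards]


# Two toy episodes, each with a delayed reward of 1.0 and two attack steps.
episode_a = assign_credit(1.0, [1.0, 3.0])
episode_b = assign_credit(1.0, [2.0, 2.0])
baseline = time_dependent_baseline([episode_a, episode_b])
adv = advantages([episode_a, episode_b], baseline)
```

In a policy-gradient update, the values in `adv` would weight the log-probability gradient of each step in place of the raw delayed reward; subtracting a baseline that depends only on the time index leaves the expected gradient unchanged while typically lowering its variance.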