Adversarial Socialbot Learning via Multi-Agent Deep Hierarchical Reinforcement Learning

Socialbots are software-driven user accounts on social platforms that act autonomously (mimicking human behavior), with the aim of influencing the opinions of other users or spreading targeted misinformation for particular goals. Because socialbots undermine the ecosystem of social platforms, they are often considered harmful, and there have been several computational efforts to detect them automatically. However, to the best of our knowledge, the adversarial nature of these socialbots has not yet been studied. This raises the question: "Can adversaries who control socialbots exploit AI techniques to their advantage?" We demonstrate that adversaries can indeed exploit computational learning mechanisms such as reinforcement learning (RL) to maximize the influence of socialbots while avoiding detection. We first formulate adversarial socialbot learning as a cooperative game between two functional hierarchical RL agents. One agent curates a sequence of activities that can avoid detection, while the other aims to maximize network influence by selectively connecting with the right users. Our proposed policy networks are trained on a large number of synthetic graphs and generalize better than baselines to unseen real-life graphs, both in maximizing network influence (up to +18%) and in sustainable stealthiness (up to +40% undetectability) under a strong bot detector (with 90% detection accuracy). During inference, the complexity of our approach scales linearly, independent of a network's structure and the virality of news. This makes our approach a practical adversarial attack when deployed in a real-life setting.
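
The division of labor between the two cooperating agents can be illustrated with a toy sketch. It is not the method of the paper: the learned policy networks are replaced here by hand-written rules, and the activity set, the detector rule, the reward shape, and every name (`ToyEnv`, `activity_agent`, `selection_agent`, `run_episode`) are assumptions made purely for illustration. It only shows one agent choosing what to do, another choosing whom to connect with, and both sharing a single reward that trades influence against detection.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Toy stand-in for a social network plus a bot detector (all assumed)."""
    followers: dict                      # user id -> follower count (crude influence proxy)
    connected: set = field(default_factory=set)
    history: list = field(default_factory=list)

    def detected(self) -> bool:
        # Hypothetical detector: flag the bot when more than half of its
        # last ten activities are "follow" actions.
        recent = self.history[-10:]
        return len(recent) >= 4 and recent.count("follow") / len(recent) > 0.5

    def step(self, activity, target=None) -> float:
        self.history.append(activity)
        gain = 0.0
        if activity == "follow" and target is not None and target not in self.connected:
            self.connected.add(target)
            gain = float(self.followers[target])
        penalty = 100.0 if self.detected() else 0.0
        return gain - penalty            # one shared reward for both agents


def activity_agent(env, rng):
    """Agent 1 (rule-based placeholder): pick the next activity, avoiding follow bursts."""
    recent = env.history[-10:]
    follow_ratio = recent.count("follow") / len(recent) if recent else 0.0
    if follow_ratio < 0.3:
        return "follow"
    return rng.choice(["tweet", "retweet", "reply"])


def selection_agent(env):
    """Agent 2 (rule-based placeholder): pick the unconnected user with most followers."""
    candidates = [u for u in env.followers if u not in env.connected]
    return max(candidates, key=env.followers.get) if candidates else None


def run_episode(followers, steps=20, seed=0):
    rng = random.Random(seed)
    env = ToyEnv(followers=dict(followers))
    total = 0.0
    for _ in range(steps):
        activity = activity_agent(env, rng)
        # The second agent is only consulted when the first one decides to follow.
        target = selection_agent(env) if activity == "follow" else None
        total += env.step(activity, target)
    return total, env


if __name__ == "__main__":
    total, env = run_episode({"a": 5, "b": 50, "c": 20})
    print(total, sorted(env.connected))
```

In an RL setting, both rule-based functions would be replaced by trainable policies optimized against the shared reward; the sketch fixes them by hand so that the interaction pattern stays easy to follow.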
