Nowadays, people generate and share massive amounts of content on online platforms (e.g., social networks, blogs). In 2021, Facebook's 1.9 billion daily active users posted around 150 thousand photos every minute. Content moderators constantly monitor these online platforms to prevent the spread of inappropriate content (e.g., hate speech, nude images). Building on deep learning (DL) advances, Automatic Content Moderators (ACMs) help human moderators handle the high data volume. Despite their advantages, attackers can exploit weaknesses in DL components (e.g., preprocessing, the model) to degrade their performance, and can therefore leverage such techniques to spread inappropriate content by evading the ACM. In this work, we propose CAPtcha Attack (CAPA), an adversarial technique that allows users to spread inappropriate text online by evading ACM controls. By generating custom textual CAPTCHAs, CAPA exploits careless design implementations and vulnerabilities in ACMs' internal procedures. We test our attack on real-world ACMs, and the results confirm the ferocity of our simple yet effective attack, which reaches a 100% evasion success rate in most cases. At the same time, we demonstrate the difficulty of designing CAPA mitigations, opening new challenges in the CAPTCHA research area.
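To illustrate the evasion principle the abstract describes, the following is a minimal sketch, not the paper's actual implementation: a text payload is rendered as a noisy, CAPTCHA-like image, so a moderator that only inspects the textual fields of a post never sees the raw string. It assumes the Pillow library is available; the function names (`text_to_captcha`, `text_moderator`) and the specific distortions (character jitter, noise lines) are hypothetical choices for this sketch.

```python
# Sketch of the CAPA idea: move forbidden text out of the text channel
# by rendering it as a CAPTCHA-style image. Assumes Pillow is installed;
# the paper's real generator may use different fonts and distortions.
import random
from PIL import Image, ImageDraw

def text_to_captcha(payload: str, size=(240, 60)) -> Image.Image:
    """Render `payload` as a jittered, noisy image (CAPTCHA-style)."""
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    x = 10
    for ch in payload:
        # Jitter each character vertically to hinder OCR-based pipelines.
        y = 20 + random.randint(-8, 8)
        draw.text((x, y), ch, fill="black")
        x += 14
    # Overlay random noise lines, as classic CAPTCHAs do.
    for _ in range(5):
        draw.line(
            [(random.randint(0, size[0]), random.randint(0, size[1])),
             (random.randint(0, size[0]), random.randint(0, size[1]))],
            fill="gray",
        )
    return img

# A naive keyword-based text moderator: it flags blocklisted words in the
# post's text fields, but never looks inside attached images.
BLOCKLIST = {"forbidden"}

def text_moderator(text: str) -> bool:
    return any(word in text.lower() for word in BLOCKLIST)

payload = "forbidden"
assert text_moderator(payload)       # the plain-text payload is flagged
captcha = text_to_captcha(payload)   # the same payload as an image
assert not text_moderator("")        # an image post with empty caption passes
```

The sketch makes the attack surface concrete: the moderation decision is taken over the text channel, while the payload travels through the image channel, so defeating it requires the ACM to OCR every uploaded image, which is exactly the mitigation difficulty the abstract points to.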