Child Sexual Abuse Material (CSAM) is any visual record of sexually explicit activity involving minors. CSAM harms victims beyond the abuse itself because distribution never ends and the images are permanent. Machine learning can help law enforcement quickly identify CSAM and block its digital distribution. However, collecting CSAM imagery to train machine learning models is subject to strict ethical and legal constraints, creating a barrier to research and development. Under such restrictions, CSAM detection systems based on file metadata open several opportunities: metadata is not a record of a crime and carries no legal restrictions. Investing in metadata-based detection systems can therefore increase the rate at which CSAM is discovered and help thousands of victims. We propose a framework for training and evaluating deployment-ready machine learning models for CSAM identification. The framework provides guidelines for evaluating CSAM detection models against intelligent adversaries and for measuring model performance on open data. We apply the proposed framework to the problem of CSAM detection based on file paths. In our experiments, the best-performing model is a convolutional neural network (CNN) that achieves an accuracy of 0.97. By evaluating the CNN against adversarially modified data, we show it is robust to offenders actively trying to evade detection. Experiments with open datasets confirm that the model generalizes well and is deployment-ready.
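To make the file-path approach concrete, the sketch below illustrates the core operation of a character-level CNN: one-hot encoding a path, sliding a convolution filter over character positions, and max-pooling the responses into a single feature. This is a minimal, self-contained illustration, not the paper's actual model; the vocabulary, filter width, and the hand-set "jpg" filter are assumptions chosen only to show the mechanism (a trained model would learn many such filters).

```python
# Minimal sketch of character-level convolution over a file path,
# the building block of a char-CNN classifier. Hypothetical example;
# all parameters (vocab, max_len, filter width) are illustrative.
import string

VOCAB = {c: i for i, c in enumerate(string.printable)}

def one_hot(path, max_len=64):
    """Encode a file path as a max_len x |VOCAB| one-hot matrix."""
    mat = []
    for ch in path[:max_len].lower():
        row = [0.0] * len(VOCAB)
        if ch in VOCAB:
            row[VOCAB[ch]] = 1.0
        mat.append(row)
    while len(mat) < max_len:          # zero-pad short paths
        mat.append([0.0] * len(VOCAB))
    return mat

def conv1d_maxpool(mat, kernel, width=3):
    """Slide one width-3 filter over character positions, then max-pool."""
    scores = []
    for t in range(len(mat) - width + 1):
        s = sum(kernel[k][j] * mat[t + k][j]
                for k in range(width) for j in range(len(VOCAB)))
        scores.append(s)
    return max(scores)

# A hand-crafted filter that fires on the character trigram "jpg";
# a real CNN would learn hundreds of such filters from labeled data.
jpg_filter = [[0.0] * len(VOCAB) for _ in range(3)]
for k, c in enumerate("jpg"):
    jpg_filter[k][VOCAB[c]] = 1.0

score_img = conv1d_maxpool(one_hot("photos/img001.jpg"), jpg_filter)
score_doc = conv1d_maxpool(one_hot("docs/report.txt"), jpg_filter)
```

Here `score_img` reaches the filter's maximum response (all three characters match), while `score_doc` stays lower; stacking many learned filters and feeding the pooled features to dense layers yields the kind of classifier the abstract describes.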