Machine Unlearning: Learning, Polluting, and Unlearning for Spam Email

This work studies machine unlearning for security. Many spam email detection methods exist, each using a different algorithm to filter unwanted email, but these models are vulnerable to attack: attackers can exploit a model by polluting the data it is trained on in various ways. To respond quickly in such situations, a model should be able to unlearn the polluted data without being retrained. Retraining is impractical in most cases, because the model has already been trained on a massive amount of past data. All of that data would have to be processed again just to remove a small amount of polluted data, often well under 1% of the total. This problem can be addressed by developing unlearning frameworks for spam detection models. In this research, an unlearning module is integrated into spam detection models based on the Naive Bayes, Decision Tree, and Random Forest algorithms. To assess the benefits of unlearning over retraining, the three models are first polluted and exploited from an attacker's position, demonstrating their vulnerability; in each case, the drop in accuracy and true positive rate shows the effect of the pollution. The unlearning modules are then integrated and the polluted data is unlearned, and testing the models afterwards shows that their performance is restored. Unlearning and retraining times are also compared across different pollution sizes for all models. The findings indicate that unlearning is considerably superior to retraining: it is fast, easy to implement, easy to use, and effective.
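The abstract does not describe how the unlearning module is built. For count-based models such as Naive Bayes, one well-known approach is to store the model only as sums of per-sample counts, so that forgetting a sample means subtracting its counts instead of retraining. The sketch below assumes a multinomial Naive Bayes with naive whitespace tokenization and add-one smoothing; the class `CountingNaiveBayes`, its methods, and the toy emails are hypothetical illustrations, not the authors' module.

```python
import math
from collections import Counter


class CountingNaiveBayes:
    """Hypothetical multinomial Naive Bayes kept purely as additive counts.

    Every parameter is a sum of per-email contributions, so forgetting an
    email only requires subtracting what learning it added.
    """

    def __init__(self):
        self.doc_counts = Counter()   # label -> number of training emails
        self.word_counts = {}         # label -> Counter(word -> occurrences)
        self.total_words = Counter()  # label -> total word tokens

    def _update(self, text, label, sign):
        # sign=+1 learns the email, sign=-1 unlearns it
        words = text.lower().split()  # assumption: naive whitespace tokenizer
        self.doc_counts[label] += sign
        counts = self.word_counts.setdefault(label, Counter())
        for w in words:
            counts[w] += sign
        self.total_words[label] += sign * len(words)

    def learn(self, emails):
        for text, label in emails:
            self._update(text, label, +1)

    def unlearn(self, emails):
        # Cost depends only on the polluted emails, not the full training set
        for text, label in emails:
            self._update(text, label, -1)

    def state(self):
        """Counts with zero entries dropped, for comparing two models."""
        docs = {k: v for k, v in self.doc_counts.items() if v}
        words = {k: {w: c for w, c in wc.items() if c}
                 for k, wc in self.word_counts.items()}
        words = {k: v for k, v in words.items() if v}
        totals = {k: v for k, v in self.total_words.items() if v}
        return docs, words, totals

    def predict(self, text):
        vocab = {w for wc in self.word_counts.values() for w, c in wc.items() if c > 0}
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            if n <= 0:
                continue
            counts = self.word_counts[label]
            score = math.log(n / n_docs)  # log prior
            for w in text.lower().split():
                # Laplace (add-one) smoothed log likelihood
                score += math.log((counts[w] + 1) / (self.total_words[label] + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data (hypothetical): spam-like emails mislabeled as "ham" act as the pollution
clean = [
    ("win a free prize now", "spam"),
    ("cheap meds free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
polluted = [
    ("free prize offer", "ham"),
    ("win free meds now", "ham"),
]

model = CountingNaiveBayes()
model.learn(clean + polluted)
print(model.predict("free prize"))  # ham: the pollution flips the decision
model.unlearn(polluted)
print(model.predict("free prize"))  # spam

retrained = CountingNaiveBayes()
retrained.learn(clean)
print(model.state() == retrained.state())  # True: unlearning matched retraining
```

In this sketch, unlearning touches only the polluted emails while retraining touches every clean email, which is the cost asymmetry behind the timing comparison. Decision trees and random forests do not decompose into simple sums like this, so they would need a different unlearning strategy that this example does not cover.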

