Implementing Active Learning in Cybersecurity: Detecting Anomalies in Redacted Emails

Research on email anomaly detection has typically relied on specially prepared datasets that may not adequately reflect the data encountered in industry settings. In our research at a major financial services company, privacy concerns prevented inspection of email bodies and attachment details (although subject headings and attachment filenames were available). This made it more difficult to label possible anomalies in the resulting redacted emails. A further difficulty is the high volume of emails combined with a scarcity of resources, which makes machine learning (ML) a necessity but also creates a need for more efficient human training of ML models. Active learning (AL) has been proposed as a way to make human training of ML models more efficient. However, implementing AL methods is a human-centered AI challenge because human analysts may be uncertain about their labels, and the labeling task can be further complicated in domains such as cybersecurity (or healthcare, aviation, etc.) where labeling mistakes can have highly adverse consequences. In this paper, we present research results on applying AL to anomaly detection in redacted emails, comparing the utility of different methods for implementing AL in this context. We evaluate different AL strategies and their impact on the resulting model performance. We also examine how the confidence ratings that experts assign to their labels can inform AL. The results are discussed in terms of their implications for AL methodology and for the role of experts in model-assisted email anomaly screening.
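The abstract does not say which AL strategies were compared or how confidence ratings were used. As a rough illustration of the general idea only, the sketch below shows uncertainty sampling, one common pool-based AL strategy, together with a simple way expert confidence ratings might be turned into label weights. The function names, the 1 to 5 rating scale, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def uncertainty_scores(probabilities: Sequence[float]) -> List[float]:
    """Score each unlabeled email by how close its predicted anomaly
    probability is to 0.5 (1.0 = most uncertain, 0.0 = most certain)."""
    return [1.0 - 2.0 * abs(p - 0.5) for p in probabilities]


def select_queries(probabilities: Sequence[float], batch_size: int) -> List[int]:
    """Return the indices of the `batch_size` most uncertain emails,
    i.e. the ones an analyst would be asked to label next."""
    scores = uncertainty_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


def confidence_weights(ratings: Sequence[int], max_rating: int = 5) -> List[float]:
    """Map analyst confidence ratings (assumed scale: 1..max_rating) to
    weights in (0, 1], so low-confidence labels count less in training."""
    return [r / max_rating for r in ratings]


# Hypothetical model outputs (anomaly probabilities) for five unlabeled emails.
pool_probs = [0.02, 0.48, 0.91, 0.55, 0.30]

# Ask the analyst about the two emails the model is least sure about.
queried = select_queries(pool_probs, batch_size=2)

# Hypothetical confidence ratings the analyst attached to those two labels.
weights = confidence_weights([5, 1])
```

In a full loop, the queried emails would be labeled, added to the training set (optionally with the weights above), and the model retrained before the next round of selection. Other strategies, such as random or diversity-based sampling, would replace only `select_queries`.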

