Living-Off-The-Land Command Detection Using Active Learning

In recent years, enterprises have been targeted by advanced adversaries who find creative ways to infiltrate their systems and move laterally to reach critical data. One increasingly common evasive method is to hide malicious activity behind a benign program by using tools that are already installed on user computers. These programs are usually part of the operating system distribution or are other user-installed binaries, so this type of attack is called "Living-Off-The-Land". Detecting these attacks is challenging: adversaries may not create any malicious files on the victim computers, so anti-virus scans fail to detect them. We propose the design of an Active Learning framework called LOLAL for detecting Living-Off-The-Land attacks that iteratively selects a set of uncertain and anomalous samples for labeling by a human analyst. LOLAL is specifically designed to work well when only a limited number of labeled samples is available for training machine learning models to detect attacks. We investigate methods to represent command-line text using word-embedding techniques, and design ensemble boosting classifiers to distinguish malicious and benign samples based on the embedding representation. We leverage a large, anonymized dataset collected by an endpoint security product and demonstrate that our ensemble classifiers achieve an average F1 score of 0.96 at classifying different attack classes. We show that our active learning method consistently improves classifier performance as more training data is labeled, and converges in fewer than 30 iterations when starting with a small number of labeled instances.
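The selection step described in the abstract (choosing uncertain and anomalous samples for the analyst) can be illustrated with a minimal sketch. The abstract does not specify how LOLAL scores uncertainty or anomaly, so everything here is an assumption for illustration: uncertainty is taken as the distance of a classifier's predicted malicious probability from 0.5, the anomaly score is assumed to come from some separate detector, and the function name `select_for_labeling` and its parameters are hypothetical.

```python
from typing import List, Sequence


def select_for_labeling(
    malicious_prob: Sequence[float],
    anomaly_score: Sequence[float],
    n_uncertain: int,
    n_anomalous: int,
) -> List[int]:
    """Return indices of unlabeled samples to send to the analyst.

    Hypothetical helper: the scoring rules below are assumptions,
    not the method used by LOLAL.
    """
    indices = range(len(malicious_prob))

    # Uncertain samples: predicted probability closest to the 0.5 boundary.
    by_uncertainty = sorted(indices, key=lambda i: abs(malicious_prob[i] - 0.5))
    chosen = list(by_uncertainty[:n_uncertain])

    # Anomalous samples: highest anomaly score among those not yet chosen.
    already = set(chosen)
    by_anomaly = sorted(
        (i for i in indices if i not in already),
        key=lambda i: anomaly_score[i],
        reverse=True,
    )
    chosen.extend(by_anomaly[:n_anomalous])
    return chosen


# Toy scores for six unlabeled command lines (values are made up).
probs = [0.02, 0.48, 0.97, 0.55, 0.10, 0.90]
anomaly = [0.1, 0.2, 0.9, 0.3, 0.8, 0.4]
picked = select_for_labeling(probs, anomaly, n_uncertain=2, n_anomalous=2)
print(picked)
```

In an iterative loop, the returned indices would be labeled by the analyst, moved into the training set, and the classifier retrained before the next round of selection.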


