In recent years, enterprises have been targeted by advanced adversaries who find creative ways to infiltrate their systems and move laterally to reach critical data. One increasingly common evasion method is to hide malicious activity behind benign programs by using tools that are already installed on user computers. Because these programs are usually part of the operating system distribution or other user-installed binaries, this type of attack is called "Living-Off-The-Land". Detecting these attacks is challenging: adversaries may not create malicious files on the victim computers, so anti-virus scans fail to detect them. We propose LOLAL, an active learning framework for detecting Living-Off-The-Land attacks that iteratively selects a set of uncertain and anomalous samples for labeling by a human analyst. LOLAL is specifically designed to work well when only a limited number of labeled samples is available for training machine learning models to detect attacks. We investigate methods to represent command-line text using word-embedding techniques, and design ensemble boosting classifiers to distinguish malicious from benign samples based on the embedding representation. We leverage a large, anonymized dataset collected by an endpoint security product and demonstrate that our ensemble classifiers achieve an average F1 score of 0.96 at classifying different attack classes. We show that our active learning method consistently improves classifier performance as more training data is labeled, and converges in fewer than 30 iterations when starting with a small number of labeled instances.
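The sample-selection step of such an active learning loop can be illustrated with a minimal sketch. The abstract does not say how uncertainty or anomaly is measured, so the choices here are assumptions made only for illustration: uncertainty is taken as the distance of a predicted malicious probability from 0.5, and anomaly as a precomputed score where higher means more unusual. The function name `select_for_labeling` and its parameters are hypothetical and are not taken from LOLAL.

```python
from typing import List, Sequence


def select_for_labeling(
    malicious_prob: Sequence[float],
    anomaly_score: Sequence[float],
    n_uncertain: int,
    n_anomalous: int,
) -> List[int]:
    """Return indices of unlabeled samples to send to a human analyst.

    Assumptions (not from the paper): malicious_prob[i] is the classifier's
    predicted probability that sample i is malicious, and anomaly_score[i]
    is any score where larger values mean more anomalous.
    """
    indices = range(len(malicious_prob))

    # Uncertain samples: predicted probability closest to the 0.5 boundary.
    by_uncertainty = sorted(indices, key=lambda i: abs(malicious_prob[i] - 0.5))
    chosen = list(by_uncertainty[:n_uncertain])

    # Anomalous samples: highest anomaly score among those not yet chosen.
    chosen_set = set(chosen)
    by_anomaly = sorted(
        (i for i in indices if i not in chosen_set),
        key=lambda i: anomaly_score[i],
        reverse=True,
    )
    chosen.extend(by_anomaly[:n_anomalous])
    return chosen


# Toy scores for five unlabeled command lines (made-up values).
probs = [0.02, 0.48, 0.97, 0.55, 0.10]
scores = [0.1, 0.2, 0.3, 0.9, 0.8]
picked = select_for_labeling(probs, scores, n_uncertain=2, n_anomalous=1)
print(picked)  # [1, 3, 4]
```

In a full loop, the analyst's labels for the selected indices would be added to the training set and the classifier retrained before the next selection round. Combining both criteria lets the loop surface samples the classifier is unsure about as well as samples it may be confidently wrong about because they look unlike anything seen so far.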