Machine learning has become a significant part of malware detection because of the influx of new malware, an ever-changing threat landscape, and the ability of machine learning methods to discover meaningful distinctions between malicious and benign software. Antivirus vendors now widely use malware classifiers built on features from dynamic and static malware analysis. A malware author may therefore apply evasive binary modifications against machine learning models as part of the malware development life cycle in order to execute an attack successfully. This makes the study of possible classifier evasion strategies an essential part of cyber defense. To this end, we stage a grey-box setup to analyze a scenario in which the malware author does not know the target classifier algorithm and has no access to the classifier's decisions, but does know the features used in training. In this experiment, a malicious actor trains a surrogate model on the EMBER-2018 dataset and uses a Monte Carlo tree search to discover binary mutations that cause an instance to be misclassified. The mutated malware is then sent to the victim model, which stands in for an antivirus API, to test whether it evades detection.
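The grey-box workflow described in the abstract (search for mutations against a surrogate, then submit the result to a victim model) can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the three-dimensional feature vector, the mutation names and their effects, the linear `surrogate_score` and `victim_score` functions, the reward definition, and the UCB1 constant are all hypothetical. A real experiment would extract EMBER-2018 features from an actual binary and use trained classifiers.

```python
import math
import random

# Hypothetical mutations: each one shifts a toy 3-dimensional feature vector.
MUTATIONS = {
    "append_overlay": (-0.2, 0.0, 0.1),
    "add_section": (0.0, -0.5, 0.0),
    "add_imports": (0.0, 0.0, -0.6),
    "pad_strings": (-0.1, -0.1, 0.0),
}
MAX_DEPTH = 5  # assumed cap on the number of mutations per sample


def _sigmoid_score(features, weights, bias):
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def surrogate_score(features):
    """Toy surrogate owned by the attacker: estimated probability of 'malicious'."""
    return _sigmoid_score(features, (1.5, 2.0, 1.0), -1.0)


def victim_score(features):
    """Toy victim with different weights; the attacker never queries it during search."""
    return _sigmoid_score(features, (1.2, 1.8, 1.3), -0.8)


def apply_mutation(features, name):
    return tuple(x + d for x, d in zip(features, MUTATIONS[name]))


class Node:
    def __init__(self, features, parent=None, action=None, depth=0):
        self.features, self.parent, self.action, self.depth = features, parent, action, depth
        self.children = {}
        self.visits = 0
        self.total_reward = 0.0

    def untried(self):
        return [a for a in MUTATIONS if a not in self.children]

    def path(self):
        actions, node = [], self
        while node.parent is not None:
            actions.append(node.action)
            node = node.parent
        return actions[::-1]


def ucb1(child, parent_visits, c=1.4):
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(features, depth, rng):
    """Apply random mutations up to MAX_DEPTH; reward is 1 minus the lowest surrogate score seen."""
    best = surrogate_score(features)
    while depth < MAX_DEPTH:
        features = apply_mutation(features, rng.choice(list(MUTATIONS)))
        depth += 1
        best = min(best, surrogate_score(features))
    return 1.0 - best


def search(start_features, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(start_features)
    best_path, best_score = [], surrogate_score(start_features)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes using UCB1.
        while node.depth < MAX_DEPTH and not node.untried():
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: add one untried mutation as a new child.
        if node.depth < MAX_DEPTH:
            action = rng.choice(node.untried())
            child = Node(apply_mutation(node.features, action), node, action, node.depth + 1)
            node.children[action] = child
            node = child
        # Remember the tree node with the lowest surrogate score so far.
        score = surrogate_score(node.features)
        if score < best_score:
            best_path, best_score = node.path(), score
        # Simulation and backpropagation.
        reward = rollout(node.features, node.depth, rng)
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_path, best_score


if __name__ == "__main__":
    start = (1.0, 1.0, 1.0)  # toy feature vector of the original sample
    path, score = search(start)
    mutated = start
    for name in path:
        mutated = apply_mutation(mutated, name)
    print("mutations:", path)
    print("surrogate score:", round(score, 3), "victim score:", round(victim_score(mutated), 3))
```

The search only ever calls `surrogate_score`; `victim_score` is evaluated once at the end, which mirrors the assumption that the attacker has no access to the target classifier's decisions while crafting the sample. Whether a low surrogate score transfers to the victim depends on how similar the two models are, and this toy does not attempt to measure that.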