Deep learning (DL) has proven effective in detecting sophisticated, constantly evolving malware. Although deep learning has alleviated the feature engineering problem, finding the optimal DL model, both its architecture (via neural architecture search, NAS) and its set of hyper-parameters, remains a challenge that requires domain expertise. In addition, many of the proposed state-of-the-art models are very complex and may not be the best fit for every dataset. A promising approach, known as Automated Machine Learning (AutoML), can reduce the domain expertise required to implement a custom DL model. AutoML reduces the amount of human trial-and-error involved in designing DL models, and more recent implementations can find new model architectures with relatively low computational overhead. This work provides a comprehensive analysis of, and insights into, using AutoML for static and online malware detection. For static detection, our analysis is performed on two widely used malware datasets: SOREL-20M, to demonstrate efficacy on large datasets, and EMBER-2018, a smaller dataset specifically curated to hinder the performance of machine learning models. In addition, we show how tuning the NAS process parameters affects the search for a better malware detection model on these static analysis datasets. Further, we demonstrate that AutoML performs well in online malware detection scenarios using Convolutional Neural Networks (CNNs) for cloud IaaS. We compare an AutoML technique to six existing state-of-the-art CNNs using a newly generated online malware dataset, with and without other applications running in the background during malware execution. In general, our experimental results show that the performance of AutoML-based static and online malware detection models is on par with, or even better than, that of state-of-the-art or hand-designed models presented in the literature.