Automated Machine Learning for Deep Learning based Malware Detection

Deep learning (DL) has proven effective at detecting sophisticated, constantly evolving malware. Although deep learning has alleviated the feature engineering problem, finding the optimal DL model, in terms of both neural architecture search (NAS) and the model's set of hyper-parameters, remains a challenge that requires domain expertise. In addition, many of the proposed state-of-the-art models are very complex and may not be the best fit for different datasets. A promising approach, known as Automated Machine Learning (AutoML), can reduce the domain expertise required to implement a custom DL model. AutoML reduces the amount of human trial-and-error involved in designing DL models, and more recent implementations can find new model architectures with relatively low computational overhead. This work provides a comprehensive analysis of, and insights into, using AutoML for static and online malware detection. For static detection, our analysis is performed on two widely used malware datasets: SOREL-20M, to demonstrate efficacy on large datasets; and EMBER-2018, a smaller dataset specifically curated to hinder the performance of machine learning models. In addition, we show the effects of tuning the NAS process parameters on finding a better malware detection model on these static analysis datasets. Further, we demonstrate that AutoML is performant in online malware detection scenarios using Convolutional Neural Networks (CNNs) for cloud IaaS. We compare an AutoML technique to six existing state-of-the-art CNNs using a newly generated online malware dataset, with and without other applications running in the background during malware execution. In general, our experimental results show that the performance of AutoML-based static and online malware detection models is on par with, or even better than, state-of-the-art or hand-designed models presented in the literature.
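To make the idea of automating architecture and hyper-parameter selection concrete, here is a minimal sketch of the simplest possible search strategy, uniform random search over a small search space. It is not the AutoML technique used in this work, which the abstract does not name. The search space, the function names, and `placeholder_score` are all hypothetical; in practice the scoring function would train each candidate model and return its validation accuracy.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical search space for a feed-forward malware classifier.
# The ranges are illustrative assumptions, not values from the paper.
SEARCH_SPACE: Dict[str, List] = {
    "num_layers": [1, 2, 3, 4],
    "units": [64, 128, 256, 512],
    "dropout": [0.0, 0.25, 0.5],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

Config = Dict[str, float]


def sample_config(rng: random.Random) -> Config:
    """Draw one candidate setting uniformly from the search space."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


def random_search(
    evaluate: Callable[[Config], float], trials: int, seed: int = 0
) -> Tuple[Config, float]:
    """Evaluate `trials` random candidates and return the best (config, score)."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    rng = random.Random(seed)
    best_config: Config = {}
    best_score = float("-inf")
    for _ in range(trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def placeholder_score(config: Config) -> float:
    """Synthetic stand-in for 'train the candidate, return validation accuracy'.

    It trains nothing; it only gives the search something to maximize.
    """
    return -abs(config["num_layers"] - 3) - abs(config["dropout"] - 0.25)


if __name__ == "__main__":
    best, score = random_search(placeholder_score, trials=20)
    print(best, score)
```

Replacing `placeholder_score` with a function that builds, trains, and validates a model turns this into a basic but real search loop. Practical NAS systems typically use more sample-efficient strategies than uniform sampling.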
