Android malware detection is a significant problem that affects billions of users running millions of Android applications (apps) from existing markets. This paper proposes PetaDroid, a framework for accurate Android malware detection and family clustering built on static analysis. PetaDroid automatically adapts to changes in Android malware and benign apps over time, and it is resilient to common binary obfuscation techniques. The framework builds on natural language processing (NLP) and machine learning techniques to achieve accurate, adaptive, and resilient Android malware detection and family clustering. PetaDroid identifies malware using an ensemble of convolutional neural networks (CNNs) trained on the proposed Inst2Vec features. The framework clusters the detected malware samples into malware family groups using sample feature digests generated by a deep neural auto-encoder. For change adaptation, PetaDroid leverages the detection confidence probability during deployment to automatically collect extension datasets, which it periodically uses to build new malware detection models. In addition, PetaDroid uses code-fragment randomization during training to enhance resilience to common obfuscation techniques. We extensively evaluated PetaDroid on multiple reference datasets. PetaDroid achieved a high detection rate (98-99% F1-score) under different evaluation settings, with high homogeneity in the produced clusters (96%). We conducted a thorough quantitative comparison with the state-of-the-art solutions MaMaDroid, DroidAPIMiner, and MalDozer; PetaDroid outperforms them under all the evaluation settings.
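
The confidence-based collection of extension datasets described above can be illustrated with a minimal sketch. It is not PetaDroid's actual implementation: the function name `collect_extension_dataset`, the `(sample_id, malware_probability)` input format, and the 0.95 threshold are all assumptions made for illustration. The idea shown is only that samples the detector labels with high confidence are kept as pseudo-labeled training data for the next model, while uncertain samples are left out.

```python
from typing import List, Tuple


def collect_extension_dataset(
    predictions: List[Tuple[str, float]],
    threshold: float = 0.95,  # assumed value, not taken from the paper
) -> List[Tuple[str, int]]:
    """Select confidently classified samples as pseudo-labeled data.

    predictions: hypothetical (sample_id, malware_probability) pairs
                 produced by a deployed detector.
    Returns (sample_id, label) pairs, where 1 = malware and 0 = benign.
    """
    extension = []
    for sample_id, p_malware in predictions:
        if p_malware >= threshold:
            # Detector is confident the sample is malware.
            extension.append((sample_id, 1))
        elif p_malware <= 1.0 - threshold:
            # Detector is confident the sample is benign.
            extension.append((sample_id, 0))
        # Otherwise the prediction is uncertain and the sample is skipped.
    return extension


if __name__ == "__main__":
    preds = [("app_a", 0.99), ("app_b", 0.50), ("app_c", 0.01)]
    print(collect_extension_dataset(preds))
```

In a deployment along these lines, the selected samples would be accumulated over time and periodically merged with the existing training set before retraining; how often to retrain and how to set the threshold are design choices this sketch leaves open.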