The popularity of Android OS has made it an appealing target for malware developers. To evade detection, including by ML-based techniques, attackers invest in creating malware that closely resembles legitimate apps. In this paper, we propose GUIDED RETRAINING, a supervised representation learning-based method that boosts the performance of a malware detector. First, the dataset is split into "easy" and "difficult" samples, where difficulty is associated with the prediction probabilities yielded by a malware detector: for difficult samples, the probabilities indicate that the classifier is not confident in its predictions, which have high error rates. For the subset of "easy" samples, the base malware detector is used to make the final predictions, since the error rate on that subset is low by construction. For the subset of "difficult" samples, we rely on GUIDED RETRAINING, which leverages the correct predictions and the errors made by the base malware detector to guide the retraining process: it learns new embeddings of these samples using Supervised Contrastive Learning and trains an auxiliary classifier for the final predictions. We validate our method on four state-of-the-art Android malware detection approaches using over 265k malware and benign apps, and we demonstrate that GUIDED RETRAINING can reduce the prediction errors made by the malware detectors by up to 40.41%. Our method is generic and designed to enhance classification performance on a binary classification task. Consequently, it can be applied to other classification problems beyond Android malware detection.
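The easy/difficult split and the routing of final predictions can be illustrated with a minimal sketch. It assumes the base detector outputs, for each sample, a probability of being malware, and that a sample counts as "difficult" when this probability falls strictly between two thresholds. The threshold values 0.3 and 0.7, the function names, and the `auxiliary` callable (a stand-in for the classifier trained on the contrastively learned embeddings) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def split_by_confidence(
    probs: Sequence[float], low: float = 0.3, high: float = 0.7
) -> Tuple[List[int], List[int]]:
    """Return (easy, difficult) index lists.

    probs[i] is the base detector's predicted probability that sample i
    is malware. The thresholds are illustrative assumptions.
    """
    easy: List[int] = []
    difficult: List[int] = []
    for i, p in enumerate(probs):
        # Probabilities near 0.5 mean the detector is not confident.
        if low < p < high:
            difficult.append(i)
        else:
            easy.append(i)
    return easy, difficult


def route_predictions(
    probs: Sequence[float],
    auxiliary: Callable[[int], int],
    low: float = 0.3,
    high: float = 0.7,
) -> List[int]:
    """Label easy samples with the base detector and difficult ones
    with the auxiliary classifier (1 = malware, 0 = benign)."""
    easy, difficult = split_by_confidence(probs, low, high)
    labels = [0] * len(probs)
    for i in easy:
        # Easy samples keep the base detector's decision.
        labels[i] = 1 if probs[i] >= 0.5 else 0
    for i in difficult:
        # Hypothetical auxiliary classifier, called with the sample index.
        labels[i] = auxiliary(i)
    return labels
```

In this sketch, `route_predictions([0.05, 0.45, 0.95, 0.6], auxiliary=lambda i: 1)` keeps the base detector's decision for the first and third samples and defers the other two to the auxiliary classifier.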