Metamorphic Detection of Repackaged Malware

Machine learning-based malware detection systems are often vulnerable to evasion attacks, in which a malware developer manipulates malicious software so that it is misclassified as benign. Such software applies small perturbations to hide some properties of its real class or to adopt some properties of a different class. A special case of evasive malware hides by repackaging a bona fide benign mobile app so that it contains malware in addition to the app's original functionality, thus retaining most of the benign properties of the original app. We present a novel malware detection system based on metamorphic testing principles that can detect such benign-seeming malware apps. We apply metamorphic testing to the feature representation of the mobile app rather than to the app itself. That is, the source input is the original feature vector for the app, and the derived input is that vector with selected features removed. If the app was originally classified as benign and is indeed benign, the outputs for the source and derived inputs should be the same class, i.e., benign; if they differ, the app is exposed as likely malware. Malware apps originally classified as malware should retain that classification, since only features prevalent in benign apps are removed. This approach enables the machine learning model to classify repackaged malware with reasonably few false negatives and false positives. Our training pipeline is simpler than many existing ML-based malware detection methods, as the network is trained end-to-end to learn appropriate features and perform classification. We pre-trained our classifier model on 3 million apps collected from the widely used AndroZoo dataset. We perform an extensive study on other publicly available datasets to show our approach's effectiveness in detecting repackaged malware, with more than 94% accuracy, 0.98 precision, 0.95 recall, and 0.96 F1 score.
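The metamorphic relation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the linear scoring function `toy_classify`, its weights, the index list `BENIGN_PREVALENT`, and the choice to "remove" a feature by setting it to zero are all assumptions made for illustration, standing in for the trained network and the real feature selection.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def derive_input(features: Sequence[float], benign_prevalent: Sequence[int]) -> Vector:
    """Build the derived input: a copy of the source vector with the
    selected benign-prevalent features removed (here: set to 0.0, an assumption)."""
    derived = list(features)
    for index in benign_prevalent:
        derived[index] = 0.0
    return derived


def metamorphic_check(
    classify: Callable[[Sequence[float]], str],
    features: Sequence[float],
    benign_prevalent: Sequence[int],
) -> str:
    """Apply the metamorphic relation to one app's feature vector."""
    source_label = classify(features)
    if source_label == "malware":
        # Apps already classified as malware keep that classification.
        return "malware"
    derived_label = classify(derive_input(features, benign_prevalent))
    # A benign app should stay benign; a label flip exposes likely malware.
    if derived_label == source_label:
        return "benign"
    return "likely-malware"


# Hypothetical stand-in for the trained classifier: a fixed linear score.
# Features 0 and 1 are assumed to be prevalent in benign apps (negative weights),
# features 2 and 3 are assumed to be associated with malicious behaviour.
WEIGHTS = [-2.0, -1.5, 1.0, 1.2]
BENIGN_PREVALENT = [0, 1]


def toy_classify(features: Sequence[float]) -> str:
    score = sum(w * x for w, x in zip(WEIGHTS, features))
    return "malware" if score > 0 else "benign"


benign_app = [1.0, 1.0, 0.0, 0.0]      # benign features only
repackaged_app = [1.0, 1.0, 1.0, 1.0]  # benign features mask the malicious ones
plain_malware = [0.0, 0.0, 1.0, 1.0]   # no benign cover

for name, app in [("benign", benign_app),
                  ("repackaged", repackaged_app),
                  ("plain malware", plain_malware)]:
    print(name, "->", metamorphic_check(toy_classify, app, BENIGN_PREVALENT))
```

In this toy setup the repackaged app is classified as benign on the source input because its benign features outweigh the malicious ones; once those features are removed, the label flips and the app is flagged. The genuinely benign app yields the same label for both inputs.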
