Out-of-distribution (OOD) detection remains a critical challenge in malware classification due to the substantial intra-family variability introduced by polymorphic and metamorphic malware variants. Most existing deep-learning-based malware detectors rely on closed-world assumptions and fail to adequately model this intra-class variation, resulting in degraded performance when confronted with previously unseen malware families. This paper presents MAD-OOD, a novel two-stage, cluster-driven deep learning framework for robust OOD malware detection and classification. In the first stage, malware family embeddings are modeled using class-conditional spherical decision boundaries derived from Gaussian Discriminant Analysis (GDA), enabling statistically grounded separation of in-distribution and OOD samples without requiring OOD data during training. Z-score-based distance analysis across multiple class centroids is employed to reliably identify anomalous samples in the latent space. In the second stage, a deep neural network integrates cluster-based predictions, refined embeddings, and supervised classifier outputs to enhance final classification accuracy. Extensive evaluations on benchmark malware datasets comprising 25 known families and multiple novel OOD variants demonstrate that MAD-OOD significantly outperforms state-of-the-art OOD detection methods, achieving an AUC of up to 0.911 on unseen malware families. The proposed framework provides a scalable, interpretable, and statistically principled solution for real-world malware detection and anomaly identification in evolving cybersecurity environments.
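To make the first-stage idea concrete, the following is a minimal sketch of Z-score-based distance analysis against spherical class boundaries. It is not the authors' implementation. The function names (`fit_class_spheres`, `min_z_score`, `is_ood`), the use of plain Euclidean distance as the spherical metric, the synthetic 2-D embeddings, and the threshold of 3.0 are all assumptions made for illustration.

```python
import numpy as np

def fit_class_spheres(embeddings, labels):
    """For each class, store its centroid plus the mean and std of member distances.

    Assumption: an isotropic (spherical) class model, so Euclidean distance
    to the centroid is the only statistic needed per class.
    """
    stats = {}
    for c in np.unique(labels):
        members = embeddings[labels == c]
        centroid = members.mean(axis=0)
        dists = np.linalg.norm(members - centroid, axis=1)
        # Small epsilon guards against division by zero for degenerate classes.
        stats[int(c)] = (centroid, dists.mean(), dists.std() + 1e-12)
    return stats

def min_z_score(x, stats):
    """Smallest distance z-score of x across all class centroids."""
    return min((np.linalg.norm(x - mu) - m) / s for mu, m, s in stats.values())

def is_ood(x, stats, threshold=3.0):
    """Flag x as OOD when it lies far outside every class sphere.

    The threshold is a hypothetical choice, not a value taken from the paper.
    """
    return bool(min_z_score(x, stats) > threshold)

# Synthetic stand-in for learned embeddings: three well-separated "families".
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
labels = np.repeat(np.arange(3), 200)
embeddings = centers[labels] + rng.normal(size=(600, 2))

stats = fit_class_spheres(embeddings, labels)
print(is_ood(np.array([0.2, -0.1]), stats))   # sample close to a known family
print(is_ood(np.array([50.0, 50.0]), stats))  # sample far from every family
```

Only in-distribution samples are used to fit the per-class statistics, which mirrors the abstract's point that no OOD data is required during training. A sample is rejected only when its distance is anomalous relative to every class, so a single nearby centroid is enough to keep it in distribution.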