Deploying robust machine learning models requires accounting for concept drift, which arises from the dynamically changing, non-stationary nature of data. Addressing drift is particularly important in the security domain, where the ever-evolving threat landscape and the lack of sufficiently labeled training data at deployment time lead to performance degradation. Recently proposed concept drift detection methods tackle this problem by identifying changes in feature or data distributions and periodically retraining models to learn new concepts. While such strategies should be applied whenever possible, they are not robust to attacker-induced drift and suffer from a delay in detecting new attacks. We aim to address these shortcomings in this work. We propose a robust drift detector that not only identifies drifted samples but also discovers new classes as they arrive in an online fashion. We evaluate the proposed method on two security-relevant datasets: a network intrusion dataset released in 2018, and an APT Command and Control dataset combined with web categorization data. Our evaluation shows that our drift detection method is not only highly accurate but also robust to adversarial drift, and that it discovers new classes from drifted samples.
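To make the baseline concrete, the distribution-change style of drift detection mentioned above can be sketched as follows. This is only an illustration of the conventional approach, not the detector proposed in this work: it assumes a single numeric feature, compares a reference window against a recent window with a two-sample Kolmogorov-Smirnov statistic, and uses the standard large-sample critical value. The function names `ks_statistic` and `drift_detected` and the `alpha` parameter are hypothetical.

```python
from bisect import bisect_right
import math


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value.

    Assumption: both windows are large enough for the approximation
    c(alpha) * sqrt((n + m) / (n * m)) to be reasonable.
    """
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


# Hypothetical feature values: a training-time window and a shifted window.
reference_window = list(range(100))
shifted_window = [value + 50 for value in range(100)]
print(drift_detected(reference_window, shifted_window))
```

A check of this kind reacts only after enough drifted samples have accumulated in the recent window, which is one way to see the detection delay noted above.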