The presence and persistence of Android malware is an ongoing threat, and machine learning is now extensively used to build more effective detectors that can block the majority of these malicious programs. However, these algorithms were not designed to keep pace with the natural evolution of malware, and their performance degrades significantly over time because of this concept drift. Current state-of-the-art techniques either focus only on detecting the presence of such drift or address it by relying on frequent model updates. Hence, little is known about the causes of concept drift, and ad-hoc solutions that can withstand the passing of time are still under-investigated. In this work, we begin to address these issues by proposing (i) a drift-analysis framework that identifies which characteristics of the data are causing the drift, and (ii) SVM-CB, a time-aware classifier that leverages the drift-analysis information to slow down the performance drop. We demonstrate the efficacy of our contribution by comparing its degradation over time with that of a state-of-the-art classifier, and we show that SVM-CB better withstands the distribution changes that naturally characterize the malware domain. We conclude by discussing the limitations of our approach and how our contribution can serve as a first step towards more time-resistant classifiers that not only tackle, but also understand, the concept drift that affects data.