Detecting and reacting to unauthorized actions is an essential task in security monitoring. The task is challenging because of the large number and variety of hosts and processes to monitor, and because no exact definition of normal behavior exists for each category. Host profiling with stream clustering algorithms is an effective means of analyzing hosts' behaviors, categorizing them, and identifying atypical ones. However, unforeseen changes in behavioral data (i.e., concept drift) make the resulting profiles unreliable. DenStream is a well-known stream clustering algorithm that can be used effectively for host profiling. It is an incremental extension of DBSCAN, a non-parametric algorithm widely used in real-world clustering applications. Recent experimental studies indicate that DenStream is not robust against concept drift. In this paper, we present DenDrift, a drift-aware host profiling algorithm based on DenStream. DenDrift relies on non-negative matrix factorization for dimensionality reduction and the Page-Hinkley test for drift detection. Experiments on both synthetic and industrial datasets confirm the robustness of DenDrift against abrupt, gradual, and incremental drifts.
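For readers unfamiliar with the Page-Hinkley test mentioned above, the following is a minimal sketch of its textbook form for detecting an upward shift in the mean of a one-dimensional stream. It is not DenDrift's implementation: the class name `PageHinkley`, the helper `first_drift`, and the default values of `delta` and `threshold` are illustrative assumptions, and how DenDrift feeds its reduced features into the test is not specified here.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Textbook Page-Hinkley test for an increase in the stream mean.

    delta and threshold are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    delta: float = 0.005      # tolerated magnitude of change
    threshold: float = 50.0   # detection threshold (often written lambda)
    n: int = 0                # number of observations seen
    mean: float = 0.0         # running mean of the stream
    cum: float = 0.0          # cumulative deviation m_t
    cum_min: float = 0.0      # running minimum M_t of the cumulative deviation

    def update(self, x: float) -> bool:
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_drift(stream, **kwargs):
    """Return the index of the first drift signal, or None if none occurs."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

Calling `first_drift` on a stream whose mean jumps abruptly should return an index shortly after the jump, while a constant stream should return `None`. A larger `threshold` lowers the false-alarm rate at the cost of a longer detection delay.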