We use a streaming architecture based on ELK, Spark and Hadoop to collect, store, and analyse database connection logs in near real-time. The proposed system investigates outliers using unsupervised learning: we apply widely adopted clustering and classification algorithms to log data and highlight the subtle differences between models by visualising the outliers each one detects. The result is a novel approach to evaluating untagged, unfiltered connection logs, which can be extrapolated to a generalised system for analysing connection logs across a large infrastructure that comprises thousands of individual nodes and generates hundreds of log lines per second.
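As an illustration of what unsupervised outlier scoring on untagged connection logs can look like, the following minimal sketch scores each observation by the distance to its k-th nearest neighbour and flags scores well above the median. This is a deliberately simple stand-in technique, not the specific algorithms evaluated in this work. The feature layout (connections per minute, distinct users per minute), the function names, and the `factor` threshold are all assumptions made for the example.

```python
import math
import statistics
from typing import List, Sequence


def knn_outlier_scores(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points: Sequence[Sequence[float]], k: int = 2,
                  factor: float = 3.0) -> List[int]:
    """Return indices whose score exceeds `factor` times the median score.

    The median-based threshold is an illustrative assumption, not a tuned value.
    """
    scores = knn_outlier_scores(points, k)
    threshold = factor * statistics.median(scores)
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical per-host features: (connections per minute, distinct users per minute).
SAMPLE = [(10, 1), (11, 1), (9, 2), (10, 2), (12, 1), (95, 14)]

if __name__ == "__main__":
    print(flag_outliers(SAMPLE))
```

The brute-force neighbour search is quadratic in the number of points, so a sketch like this is only suitable for small batches. At the log volumes described above, the same scoring idea would need a distributed or approximate implementation.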