Log-based Anomaly Detection Without Log Parsing

Software systems often record important runtime information in system logs for troubleshooting. Many studies have used log data to build machine learning models that detect system anomalies. Through an empirical study, we find that existing log-based anomaly detection approaches are significantly affected by log parsing errors introduced by 1) OOV (out-of-vocabulary) words and 2) semantic misunderstandings. These parsing errors can cause the loss of information that is important for anomaly detection. To address the limitations of existing methods, we propose NeuralLog, a novel log-based anomaly detection approach that does not require log parsing. NeuralLog extracts the semantic meaning of raw log messages and represents them as semantic vectors. These representation vectors are then used to detect anomalies through a Transformer-based classification model, which can capture contextual information from log sequences. Our experimental results show that the proposed approach can effectively understand the semantic meaning of log messages and achieve accurate anomaly detection. Overall, NeuralLog achieves F1-scores greater than 0.95 on four public datasets, outperforming existing approaches.
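The abstract describes two stages: turning raw log messages into vectors without a parser, and classifying a sequence of those vectors with a Transformer that uses context across messages. The toy sketch below illustrates both stages under stated assumptions; it is not NeuralLog's implementation. In place of a pre-trained language model it uses a hypothetical hashed character-trigram embedding (which still yields a vector for out-of-vocabulary words), and in place of a trained Transformer it uses a single attention head with identity projections, no positional encoding, mean pooling, and a logistic output. All function names, `DIM`, the weights, and the sample messages are illustrative assumptions.

```python
import math
import re
import zlib

DIM = 16  # hypothetical vector size, kept small for illustration


def preprocess(message: str) -> list[str]:
    """Split a raw log message into lowercase words; no template extraction.

    Non-letters (digits, punctuation, underscores) become spaces and camelCase
    words are split, so numbers and symbols are dropped without a log parser.
    """
    text = re.sub(r"[^A-Za-z]", " ", message)
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [tok.lower() for tok in text.split()]


def embed_word(word: str) -> list[float]:
    """Hashed character-trigram vector, so unseen (OOV) words still get a vector."""
    vec = [0.0] * DIM
    padded = f"<{word}>"
    for i in range(len(padded) - 2):
        h = zlib.crc32(padded[i : i + 3].encode())
        vec[h % DIM] += 1.0 if (h >> 8) % 2 == 0 else -1.0
    return vec


def embed(message: str) -> list[float]:
    """Message vector: sum of word vectors, L2-normalized (zero vector if no words)."""
    vec = [0.0] * DIM
    for word in preprocess(message):
        for i, x in enumerate(embed_word(word)):
            vec[i] += x
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def self_attention(seq: list[list[float]]) -> list[list[float]]:
    """One scaled dot-product attention head with identity Q/K/V projections."""
    scale = math.sqrt(DIM)
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) / total for i in range(DIM)])
    return out


def classify(messages: list[str], weights: list[float], bias: float = 0.0) -> float:
    """Anomaly probability for a log sequence; weights/bias would normally be learned."""
    contextual = self_attention([embed(m) for m in messages])
    pooled = [sum(col) / len(contextual) for col in zip(*contextual)]  # mean pooling
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical HDFS-style messages, made up for illustration.
    window = [
        "Receiving block blk_123 src: /10.0.0.1:50010",
        "PacketResponder 1 for block blk_123 terminating",
    ]
    print(preprocess(window[1]))  # ['packet', 'responder', 'for', 'block', 'blk', 'terminating']
    print(classify(window, weights=[0.1] * DIM))
```

Because `preprocess` only strips non-letters and splits camelCase, each message keeps its natural-language words instead of being matched to a template, which is the property the abstract relies on. In a real system the embedding would come from a pre-trained model and the attention and classifier weights would be learned from labeled log sequences.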

Related content

Anomaly detection

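In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to other items in a dataset. Typically, anomalous items translate into problems such as bank fraud, structural defects, medical problems, or errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In abuse and network intrusion detection in particular, the interesting objects are often not rare objects but unexpected bursts of activity. This pattern does not fit the common statistical definition of an outlier as a rare object, so many anomaly detection methods (especially unsupervised ones) will fail on such data unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.

There are three broad categories of anomaly detection techniques.[1] Unsupervised anomaly detection techniques detect anomalies in unlabeled test data, under the assumption that the majority of instances in the dataset are normal, by looking for the instances that fit least with the rest of the data. Supervised anomaly detection techniques require a dataset labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherent imbalance of anomaly detection). Semi-supervised anomaly detection techniques build a model representing normal behavior from a given normal training dataset, and then test the likelihood that a test instance was generated by the learned model.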
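The semi-supervised setting described above can be illustrated with a minimal sketch, assuming one-dimensional numeric data and a Gaussian as the model of normal behavior. The names `fit_normal_model`, `log_likelihood`, and `detect`, the 3-standard-deviation cutoff `k`, and the sample values are all hypothetical choices for illustration, not a standard method from the text.

```python
import math
import statistics


def fit_normal_model(normal_data: list[float]) -> tuple[float, float]:
    """Model of normal behavior learned only from normal training data."""
    return statistics.fmean(normal_data), statistics.stdev(normal_data)


def log_likelihood(x: float, model: tuple[float, float]) -> float:
    """Gaussian log-density of x under the learned model."""
    mu, sigma = model
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def detect(test_data: list[float], model: tuple[float, float], k: float = 3.0) -> list[bool]:
    """Flag instances less likely than a point k standard deviations from the mean."""
    mu, sigma = model
    threshold = log_likelihood(mu + k * sigma, model)
    return [log_likelihood(x, model) < threshold for x in test_data]


# Hypothetical training data: response times (ms) observed during normal operation.
model = fit_normal_model([10, 11, 9, 10, 12, 10, 11, 9])
print(detect([10, 13, 25, 2], model))  # [False, False, True, True]
```

Only normal data is used for fitting; the test instances 25 and 2 are flagged because they are unlikely under the learned model, not because any anomalous example was seen during training.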
