Software systems often record important runtime information in logs for troubleshooting. Many studies have used log data to build machine learning models that detect system anomalies. Through an empirical study, we find that existing log-based anomaly detection approaches are significantly affected by log parsing errors introduced by 1) out-of-vocabulary (OOV) words and 2) semantic misunderstandings. These parsing errors can discard information that is important for anomaly detection. To address the limitations of existing methods, we propose NeuralLog, a novel log-based anomaly detection approach that does not require log parsing. NeuralLog extracts the semantic meaning of raw log messages and represents them as semantic vectors. These vectors are then fed to a Transformer-based classification model, which captures contextual information from log sequences, to detect anomalies. Our experimental results show that the proposed approach effectively captures the semantic meaning of log messages and detects anomalies accurately. Overall, NeuralLog achieves F1-scores greater than 0.95 on four public datasets, outperforming existing approaches.
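The abstract describes a parsing-free pipeline: clean each raw message, turn it into a semantic vector, and classify the sequence with a Transformer. The following is a minimal plain-Python sketch of those stages under stated assumptions, not NeuralLog's implementation. The subword vocabulary `VOCAB`, the seeded random embeddings (standing in for a pretrained language-model encoder), the identity attention projections, and the untrained output weights are all hypothetical. The greedy longest-match subword split shows how an unseen word such as "receiving" can still be covered by known pieces instead of collapsing into a single OOV token.

```python
import math
import random
import re

DIM = 8  # toy embedding size (assumption); real encoders use far larger vectors

# Hypothetical subword vocabulary, chosen only for this illustration.
VOCAB = {"packet", "respond", "##er", "for", "block", "blk",
         "terminat", "##ing", "receiv", "exception"}


def preprocess(raw: str) -> list[str]:
    """Clean a raw log message without extracting a log template."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", raw)  # split camelCase identifiers
    text = re.sub(r"[^A-Za-z]+", " ", text)           # drop digits, punctuation, delimiters
    return text.lower().split()


def wordpiece(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match-first subword split; unmatched words become [UNK]."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            cand = word[start:end] if start == 0 else "##" + word[start:end]
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def embed(piece: str) -> list[float]:
    """Deterministic random vector per subword (hypothetical stand-in for an encoder)."""
    rng = random.Random(piece)  # str seeds are hashed deterministically by random
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def message_vector(raw: str) -> list[float]:
    """Mean-pool subword embeddings into one semantic vector per log message."""
    pieces = [p for w in preprocess(raw) for p in wordpiece(w, VOCAB)]
    vecs = [embed(p) for p in pieces] or [[0.0] * DIM]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def positional(pos: int) -> list[float]:
    """Sinusoidal positional encoding, as in the original Transformer."""
    return [math.sin(pos / 10000 ** (i / DIM)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / DIM)) for i in range(DIM)]


def self_attention(xs: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[d] for w, v in zip(weights, xs)) for d in range(DIM)])
    return out


def anomaly_score(messages: list[str], weights: list[float], bias: float = 0.0) -> float:
    """Encode a log sequence, attend over it, mean-pool, and apply a logistic output."""
    xs = [[v + p for v, p in zip(message_vector(m), positional(i))]
          for i, m in enumerate(messages)]
    pooled = [sum(col) / len(xs) for col in zip(*self_attention(xs))]
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    if z >= 0:  # stable sigmoid for both signs of z
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


seq = ["Receiving block blk_-1608999687919862906 src: /10.250.19.102:54106",
       "PacketResponder 1 for block blk_38865049064139660 terminating"]
print(wordpiece("receiving", VOCAB))
print(anomaly_score(seq, weights=[0.1] * DIM))
```

The two HDFS-style messages are made up for illustration. Because nothing here is trained, the printed score carries no meaning. A real model would learn the attention projections and the output layer from labeled log sequences.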