In recent years, with the growth of online services and IoT devices, software log anomaly detection has become a significant concern for both academia and industry. However, at the time of writing, almost all contributions to the log anomaly detection task follow the same traditional architecture based on parsing, vectorizing, and classifying. This paper proposes OneLog, a new approach that uses a single large deep model instead of multiple small components. OneLog utilizes a character-based convolutional neural network (CNN) originating from traditional NLP tasks. This allows the model to take advantage of multiple datasets at once and to exploit numbers and punctuation, which previous architectures removed. We evaluate OneLog on four open datasets: Hadoop Distributed File System (HDFS), BlueGene/L (BGL), Hadoop, and OpenStack, using both single-project and multi-project datasets. Additionally, we evaluate robustness with synthetically evolved datasets and with an ahead-of-time anomaly detection test that indicates the model's capability to predict anomalies before they occur. To the best of our knowledge, our multi-project model outperforms state-of-the-art methods on the HDFS, Hadoop, and BGL datasets, achieving F1 scores of 99.99, 99.99, and 99.98, respectively. However, OneLog's performance on the OpenStack dataset is unsatisfactory, with an F1 score of only 21.18. Furthermore, OneLog's performance suffers very little from noise, showing F1 scores of 99.95, 99.92, and 99.98 on HDFS, Hadoop, and BGL, respectively.
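To illustrate what character-based input means in practice, the following minimal Python sketch encodes a raw log line as a fixed-length sequence of character ids, keeping digits and punctuation rather than stripping them. It is not OneLog's actual preprocessing: the alphabet, the reserved `PAD` and `UNK` indices, the `max_len` value, and the sample log line are all assumptions made for illustration.

```python
import string

# Assumed alphabet: ASCII letters, digits, punctuation, and space.
# The abstract does not specify the vocabulary; this choice is illustrative.
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "
PAD, UNK = 0, 1  # hypothetical reserved ids for padding and unknown characters
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_line(line: str, max_len: int = 64) -> list[int]:
    """Map a raw log line to a fixed-length list of character ids.

    Digits and punctuation receive their own ids, so no parsing or
    template extraction step is needed before the model sees the text.
    """
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in line[:max_len]]
    return ids + [PAD] * (max_len - len(ids))  # right-pad short lines


# Made-up, HDFS-style log line used only as sample input.
encoded = encode_line("Received block blk_-1608 of size 91178 from /10.250.10.6")
```

A sequence like `encoded` would typically be passed through an embedding layer and then one-dimensional convolutions. Because the encoding does not depend on any project-specific template, lines from different projects can share one vocabulary.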