With the increasing prevalence of scalable file systems in High Performance Computing (HPC), accurate anomaly detection on runtime logs has become correspondingly important. However, many state-of-the-art methods for log-based anomaly detection, such as DeepLog, struggle when applied to logs from parallel file systems (PFSes), largely because of the irregularity and ambiguity of their time-based log sequences. To address these problems, this study proposes ClusterLog, a log pre-processing method that clusters the temporal sequence of log keys based on their semantic similarity. By grouping semantically and sentimentally similar logs, this approach represents log sequences with the fewest possible unique log keys, with the aim of improving a downstream sequence-based model's ability to learn the log patterns. Preliminary results indicate that ClusterLog not only reduces the granularity of log sequences without losing important sequence information but also generalizes to logs from different file systems.
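The sketch below illustrates the pre-processing idea described above under stated assumptions: log keys are embedded, clustered by textual similarity, and each key in a temporal sequence is replaced by its cluster ID before being passed to a sequence model. The log keys, the TF-IDF embedding, the KMeans clusterer, and the chosen number of clusters are all illustrative stand-ins, not the actual ClusterLog pipeline, which uses semantic and sentiment similarity.

```python
# Minimal sketch of clustering log keys and rewriting a key sequence as cluster IDs.
# TF-IDF + KMeans are assumptions standing in for ClusterLog's semantic/sentiment grouping.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical parsed log keys (templates) from a PFS log.
log_keys = [
    "osd write completed for object %d",
    "osd write finished for object %d",
    "client reconnect to mds %s",
    "client re-established session with mds %s",
    "checksum mismatch detected on chunk %d",
]
# A temporal sequence of events, expressed as indices into log_keys.
sequence = [0, 2, 1, 3, 4]

# Embed the key text and group similar keys into clusters.
embeddings = TfidfVectorizer().fit_transform(log_keys)
n_clusters = 3  # illustrative granularity; not the paper's choice
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)

# Replace each log key with its cluster ID, yielding a coarser sequence
# with fewer unique symbols for a downstream sequence-based model.
clustered_sequence = [int(labels[k]) for k in sequence]
print(clustered_sequence)
```

In this toy example, the two "write completed/finished" keys and the two "reconnect" keys would typically collapse into shared cluster IDs, shrinking the vocabulary the downstream model must learn.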