Fully supervised log anomaly detection methods require large amounts of labeled data to achieve promising performance, so alleviating the heavy burden of annotating massive unlabeled log data has received much attention. Recently, many semi-supervised log anomaly detection methods have been proposed to reduce annotation costs with the help of templates parsed from labeled normal data. However, these methods usually consider each keyword independently, which disregards the correlations among keywords in log events and the contextual relationships among log sequences. In this paper, we propose a novel weakly supervised log anomaly detection framework, named LogLG, which explores the semantic connections among keywords from sequences. Specifically, we design an iterative process in which, in each iteration, the keywords of unlabeled logs are first extracted to construct a log-event graph. Then, we build a subgraph annotator that recasts the task of generating pseudo labels for unlabeled log sequences as annotating the corresponding log subgraphs. To improve annotation quality, we pre-train the subgraph annotator with a self-supervised task. After that, a log anomaly detection model is trained with the pseudo labels generated by the subgraph annotator. Conditioned on the classification results, we re-extract the keywords from the classified log sequences and update the log-event graph for the next iteration. Experiments on five benchmarks validate the effectiveness of LogLG for detecting anomalies on unlabeled log data and demonstrate that LogLG, as the state-of-the-art weakly supervised method, achieves significant improvements over existing semi-supervised methods.
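The iterative process described in the abstract can be illustrated with a toy sketch. This is not the LogLG implementation: the abstract does not specify the models, so every function name here is hypothetical, the subgraph annotator is replaced by a simple node-plus-edge-weight score, the detection model by a token-count vote, the self-supervised pre-training step is omitted, and the seed keywords per class are an assumption made only so the loop has a starting point.

```python
from collections import Counter
from itertools import combinations

LABELS = ("normal", "anomalous")


def build_event_graph(sequences, keywords):
    """Co-occurrence graph over keywords: edge weight = number of
    sequences in which both keywords appear."""
    graph = Counter()
    for seq in sequences:
        present = sorted(set(seq) & keywords)
        for pair in combinations(present, 2):
            graph[pair] += 1
    return graph


def annotate(seq, class_keywords, graph):
    """Stand-in for the subgraph annotator (assumption): score each class
    by the keyword subgraph the sequence induces (nodes + edge weights)."""
    scores = {}
    for label in LABELS:
        present = sorted(set(seq) & class_keywords[label])
        edges = sum(graph[pair] for pair in combinations(present, 2))
        scores[label] = len(present) + edges
    return "anomalous" if scores["anomalous"] > scores["normal"] else "normal"


def count_tokens(sequences, labels):
    """Per-class count of sequences containing each token."""
    counts = {label: Counter() for label in LABELS}
    for seq, label in zip(sequences, labels):
        counts[label].update(set(seq))
    return counts


def classify(seq, counts):
    """Stand-in detection model (assumption): vote by token counts
    learned from the pseudo labels."""
    score = sum(counts["anomalous"][t] - counts["normal"][t] for t in set(seq))
    return "anomalous" if score > 0 else "normal"


def re_extract_keywords(sequences, labels, top_k=3):
    """Keep the tokens that are most characteristic of each predicted class."""
    counts = count_tokens(sequences, labels)
    keywords = {}
    for label, other in (("normal", "anomalous"), ("anomalous", "normal")):
        margin = {t: c - counts[other][t] for t, c in counts[label].items()}
        ranked = sorted(margin, key=lambda t: (-margin[t], t))
        keywords[label] = {t for t in ranked[:top_k] if margin[t] > 0}
    return keywords


def run(sequences, seed_keywords, iterations=3):
    class_keywords = seed_keywords
    labels = []
    for _ in range(iterations):
        # 1. Build the log-event graph from the current keywords.
        all_keywords = set().union(*class_keywords.values())
        graph = build_event_graph(sequences, all_keywords)
        # 2. Pseudo-label each sequence via its keyword subgraph.
        pseudo = [annotate(s, class_keywords, graph) for s in sequences]
        # 3. Train the detection model on the pseudo labels and classify.
        model = count_tokens(sequences, pseudo)
        labels = [classify(s, model) for s in sequences]
        # 4. Re-extract keywords from the classified sequences.
        class_keywords = re_extract_keywords(sequences, labels)
    return labels


# Toy log sequences (made up for illustration).
logs = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "error", "timeout"],
    ["open", "read", "close"],
    ["retry", "error", "timeout"],
]
seeds = {"normal": {"close"}, "anomalous": {"error"}}
labels = run(logs, seeds)
print(labels)
```

On this toy input the keyword sets grow from the single seeds to `{"close", "open", "read"}` and `{"error", "retry", "timeout"}` after the first iteration, so later iterations annotate with a richer graph than the first one did.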