LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction

Fully supervised log anomaly detection methods require large amounts of labeled data to achieve promising performance, so alleviating the heavy burden of annotating massive unlabeled log data has received much attention. Recently, many semi-supervised log anomaly detection methods have been proposed to reduce annotation costs with the help of templates parsed from labeled normal data. However, these methods usually consider each keyword independently, disregarding both the correlation among keywords in log events and the contextual relationships among log sequences. In this paper, we propose a novel weakly supervised log anomaly detection framework, named LogLG, to explore the semantic connections among keywords from sequences. Specifically, we design an iterative process in which the keywords of unlabeled logs are first extracted to construct a log-event graph in each iteration. We then build a subgraph annotator that recasts the task of generating pseudo labels for unlabeled log sequences as annotating the corresponding log subgraphs. To improve annotation quality, we adopt a self-supervised task to pre-train the subgraph annotator. After that, a log anomaly detection model is trained with the pseudo labels generated by the subgraph annotator. Conditioned on the classification results, we re-extract the keywords from the classified log sequences and update the log-event graph for the next iteration. Experiments on five benchmarks validate the effectiveness of LogLG for detecting anomalies on unlabeled log data, and demonstrate that LogLG, as the state-of-the-art weakly supervised method, achieves significant improvements compared to existing semi-supervised methods.
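The abstract does not specify how the log-event graph is built from the extracted keywords. As a rough illustration only, the sketch below assumes a simple co-occurrence scheme: two keywords are linked whenever they appear in the same log sequence, and the edge weight counts how many sequences share them. The function name `build_keyword_graph`, the whitespace tokenization, and the sample log lines are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_keyword_graph(sequences, keywords):
    """Build an undirected keyword co-occurrence graph.

    ASSUMPTION: co-occurrence within a sequence is used as the edge
    criterion; the paper's actual graph construction may differ.

    sequences: list of log sequences, each a list of log-message strings.
    keywords:  set of lowercase keyword strings.
    Returns a Counter mapping a sorted (kw_a, kw_b) pair to the number
    of sequences in which both keywords occur.
    """
    edges = Counter()
    for sequence in sequences:
        # Collect the keywords that occur anywhere in this sequence.
        present = set()
        for message in sequence:
            for token in message.lower().split():
                if token in keywords:
                    present.add(token)
        # Link every pair of keywords seen together in the sequence.
        for kw_a, kw_b in combinations(sorted(present), 2):
            edges[(kw_a, kw_b)] += 1
    return edges


# Hypothetical sample data, for illustration only.
sample_sequences = [
    ["Receiving block blk_1", "Exception in receiveBlock"],
    ["Receiving block blk_2"],
]
sample_keywords = {"receiving", "block", "exception"}
graph = build_keyword_graph(sample_sequences, sample_keywords)
```

In a pipeline like the one the abstract outlines, a graph of this kind would be rebuilt in each iteration after keywords are re-extracted from the newly classified sequences.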


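Related content

Anomaly detection

In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to other items in a dataset. Anomalous items typically translate into problems such as bank fraud, structural defects, medical problems, or errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In abuse and network intrusion detection in particular, the interesting objects are often not rare objects but unexpected bursts of activity. This pattern does not follow the usual statistical definition of an outlier as a rare object, so many anomaly detection methods (unsupervised ones in particular) will fail on such data unless it has been aggregated appropriately. A cluster analysis algorithm, by contrast, may be able to detect the micro-clusters formed by these patterns. There are three broad categories of anomaly detection methods. [1] Unsupervised anomaly detection methods detect anomalies in unlabeled test data, under the assumption that the majority of instances in the dataset are normal, by looking for the instances that fit least well with the rest of the data. Supervised anomaly detection methods require a dataset whose instances have been labeled "normal" and "abnormal", and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of anomaly detection). Semi-supervised anomaly detection methods construct a model representing normal behavior from a given normal training dataset, and then test the likelihood that a test instance was generated by the learned model.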
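The semi-supervised setting described above (fit a model on normal data only, then score test instances against it) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method from the source: the data is one-dimensional, normal behavior is modeled as a single Gaussian, and the names `fit_normal_model`, `is_anomaly`, and the threshold of 3 standard deviations are all hypothetical choices.

```python
import statistics


def fit_normal_model(normal_values):
    """Fit a 1-D Gaussian (mean, standard deviation) to normal-only data.

    ASSUMPTION: a single Gaussian is an adequate model of normal behavior.
    """
    mean = statistics.fmean(normal_values)
    std = statistics.stdev(normal_values)
    if std == 0:
        raise ValueError("training data has zero variance")
    return mean, std


def is_anomaly(value, model, z_threshold=3.0):
    """Flag a value whose distance from the mean exceeds z_threshold stds.

    A large z-score means the value is unlikely under the fitted model.
    """
    mean, std = model
    return abs(value - mean) / std > z_threshold


# Hypothetical training data consisting of normal observations only.
normal_training_data = [10.0, 11.0, 9.0, 10.5, 9.5]
model = fit_normal_model(normal_training_data)
```

With this toy model, `is_anomaly(10.2, model)` is false and `is_anomaly(20.0, model)` is true. No anomalous examples are needed at training time, which is what separates this setting from the supervised one.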
