Logs are run-time information automatically generated by software; they record system events and activities together with their timestamps. Before further insight into the run-time status of the software can be obtained, a fundamental step of log analysis, called log parsing, is employed to extract structured templates and parameters from semi-structured raw log messages. However, current log parsers treat each message as a character string and ignore the semantic information carried by its parameters and templates. We therefore propose SemParser, a semantic parser that addresses the critical bottleneck of mining semantics from log messages. It consists of two components, an end-to-end semantic miner and a joint parser. Specifically, the semantic miner identifies explicit semantics inside a single log message, and the joint parser infers implicit semantics and computes structured outputs based on a contextual knowledge base. To analyze the effectiveness of our semantic parser, we first demonstrate that it can derive rich semantics from log messages collected from seven widely used systems, with an average F1 score of 0.987. We then conduct two representative downstream tasks, showing that current downstream techniques, when given appropriately extracted semantics, improve their performance by 11.7% in anomaly detection and 8.65% in failure diagnosis. We believe these findings provide insights into semantically understanding log messages for the log analysis community.
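To make the limitation of string-level parsing concrete, the following is a minimal sketch of conventional, syntax-based log parsing. It is not SemParser. The masking rules, the function name `parse_log`, and the sample log line are all assumptions made up for illustration.

```python
import re

# Hypothetical masking rules: a token matching any pattern is treated
# as a variable parameter; everything else is kept as template text.
PARAM_PATTERNS = [
    re.compile(r"^\d+\.\d+\.\d+\.\d+$"),  # IPv4 address
    re.compile(r"^\d+$"),                 # plain integer
]


def parse_log(message: str):
    """Split a raw log message into (template, parameters)."""
    template_tokens = []
    parameters = []
    for token in message.split():
        if any(p.match(token) for p in PARAM_PATTERNS):
            template_tokens.append("<*>")  # placeholder for a parameter
            parameters.append(token)
        else:
            template_tokens.append(token)
    return " ".join(template_tokens), parameters


# Invented example message, not taken from any real system log.
template, params = parse_log(
    "Received block 3587 of size 67108864 from 10.251.42.84"
)
print(template)
print(params)
```

The extracted parameters are bare strings: nothing in this output records that `3587` is a block identifier or that `67108864` is a size. That missing link between a parameter and its meaning is the kind of semantic information the abstract says current parsers ignore.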