Logs provide valuable insights into system runtime and assist in software development and maintenance. Log parsing, which converts semi-structured log data into structured log data, is often the first step in automated log analysis. Given the wide range of log parsers utilizing diverse techniques, it is essential to evaluate them to understand their characteristics and performance. In this paper, we conduct a comprehensive empirical study comparing syntax- and semantic-based log parsers, as well as single-phase and two-phase parsing architectures. Our experiments reveal that semantic-based methods perform better at identifying the correct templates and syntax-based log parsers are 10 to 1,000 times more efficient and provide better grouping accuracy although they fall short in accurate template identification. Moreover, two-phase architecture consistently improves accuracy compared to single-phase architecture. Based on the findings of this study, we propose SynLog+, a template identification module that acts as the second phase in a two-phase log parsing architecture. SynLog+ improves the parsing accuracy of syntax-based and semantic-based log parsers by 236\% and 20\% on average, respectively, with virtually no additional runtime cost.
 翻译:暂无翻译