Logs generated by large-scale software systems provide crucial information that helps engineers understand system status and diagnose problems. Log parsing, which converts raw log messages into structured data, is the first step towards automated log analytics. Existing log parsers use statistical features to extract the common parts of log messages as log templates. However, these parsers often fail to identify the correct templates and parameters because: 1) they often overlook the semantic meaning of log messages, and 2) they require domain-specific knowledge for different log datasets. To address these limitations, we propose LogPPT, which captures the patterns of templates using prompt-based few-shot learning. LogPPT utilises a novel prompt tuning method to recognise keywords and parameters from only a few labelled log samples. In addition, an adaptive random sampling algorithm is designed to select a small yet diverse training set. We have conducted extensive experiments on 16 public log datasets. The experimental results show that LogPPT is effective and efficient for log parsing.
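To make the log parsing task concrete, the following minimal Python sketch shows the input and output of a parser: a raw message goes in, and a template plus a list of parameters comes out. It is not LogPPT. It is a deliberately naive, hand-written regex baseline, and the function name `parse_log`, the `<*>` placeholder, the masking rules, and the sample messages are all assumptions made for illustration.

```python
import re

# Hypothetical masking rules chosen for this illustration only.
# Order matters: more specific patterns come before the generic integer rule.
PARAM_PATTERNS = [
    r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b",  # IPv4 address with optional port
    r"\b0x[0-9a-fA-F]+\b",                    # hexadecimal value
    r"\b\d+\b",                               # plain integer
]
_PARAM_RE = re.compile("|".join(PARAM_PATTERNS))


def parse_log(message):
    """Split a raw log message into (template, parameters).

    Every substring matching a masking rule is treated as a parameter and
    replaced by the placeholder "<*>"; the remaining text is the template.
    """
    params = []

    def mask(match):
        # Record the dynamic value, then hide it in the template.
        params.append(match.group(0))
        return "<*>"

    template = _PARAM_RE.sub(mask, message)
    return template, params


if __name__ == "__main__":
    # Made-up sample messages, not taken from any of the 16 datasets.
    for line in [
        "Received block 12345 of size 67108864 from 10.251.42.84",
        "Received block 999 of size 1024 from 10.0.0.1",
    ]:
        print(parse_log(line))
```

Both sample messages map to the same template, `Received block <*> of size <*> from <*>`, with different parameter lists. Rules of this kind must be rewritten for each log format and take no account of what the words mean, which are the two weaknesses of existing parsers that the abstract names.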