Modern systems produce large volumes of logs that record run-time status and events. System operators use these raw logs to monitor a system and extract the information they need to diagnose anomalies. One of the most important problems in this area is helping operators answer log-based questions efficiently and in a user-friendly way. In this work, we propose LogQA, which answers natural-language questions about logs using large-scale unstructured log corpora. Rather than returning a list of relevant snippets, our system presents the answer to a question directly, making it both easier to use and more efficient. LogQA is the first approach to question answering in the log domain. It has two key components: Log Retriever and Log Reader. Log Retriever retrieves the logs relevant to a given question, while Log Reader is responsible for inferring the final answer. Because no public dataset for log question answering exists, we manually labelled QA datasets for three open-source log corpora and will make them publicly available. We evaluated LogQA on these datasets against six baseline methods, and our experimental results show that LogQA outperforms these baselines.
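The abstract does not say how Log Retriever and Log Reader are implemented, so the sketch below only illustrates the retrieve-then-read division of labour under toy assumptions. The retriever is an IDF-weighted keyword-overlap scorer (a stand-in, not LogQA's retrieval model), and the reader is a rule-based extractor that returns the token following the question's answer-type word, e.g. "which volume" leads it to return the word after "volume" in a retrieved log (a stand-in for a learned reader). The function names `retrieve`, `read`, and `answer_question` and the log lines are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical log corpus, for illustration only.
LOGS = [
    "2023-05-01 10:00:01 INFO node-3 started service scheduler",
    "2023-05-01 10:02:17 ERROR node-7 disk failure on volume sdb",
    "2023-05-01 10:05:42 WARN node-3 heartbeat delayed by 300 ms",
]

WH_WORDS = {"which", "what"}


def tokenize(text):
    # Lowercase word tokens; "node-3" becomes ["node", "3"].
    return re.findall(r"\w+", text.lower())


def build_idf(corpus_tokens):
    # Smoothed inverse document frequency over the log lines.
    n = len(corpus_tokens)
    df = Counter(tok for toks in corpus_tokens for tok in set(toks))
    return {tok: math.log((n + 1) / (c + 1)) + 1 for tok, c in df.items()}


def retrieve(question, logs, k=2):
    # Toy retriever (assumption): score each log by the summed IDF of the
    # question tokens it shares, keep the top-k logs with a non-zero score.
    log_tokens = [tokenize(line) for line in logs]
    idf = build_idf(log_tokens)
    q = set(tokenize(question))
    scored = [
        (sum(idf[t] for t in q & set(toks)), i)
        for i, toks in enumerate(log_tokens)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [logs[i] for score, i in scored[:k] if score > 0]


def read(question, passages):
    # Toy extractive reader (assumption): the word after a wh-word names the
    # answer type; return the token that follows it in the first passage
    # containing it.
    q = tokenize(question)
    target = next((q[i + 1] for i in range(len(q) - 1) if q[i] in WH_WORDS), None)
    if target is None:
        return None
    for passage in passages:
        toks = tokenize(passage)
        for i in range(len(toks) - 1):
            if toks[i] == target:
                return toks[i + 1]
    return None


def answer_question(question, logs):
    # Retrieve relevant logs first, then read an answer out of them.
    return read(question, retrieve(question, logs))


if __name__ == "__main__":
    print(answer_question("Which volume had a disk failure?", LOGS))
```

Asking "Which volume had a disk failure?" retrieves only the ERROR line and returns the answer string `sdb` rather than the matching log line, which is the behaviour the abstract contrasts with snippet-returning tools.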