Modern information and communication systems have become increasingly challenging to manage. System logs are ubiquitous and rich in information, and are therefore widely exploited as an alternative source for system management. Because log files usually contain large amounts of raw data, analyzing them manually is laborious and error-prone. Consequently, much research has been devoted to automatic log analysis. However, these approaches typically expect structured input and struggle with the heterogeneous nature of raw system logs. Log parsing closes this gap by converting unstructured system logs into structured records. Many parsers have been proposed over the past decades to accommodate various log analysis applications. However, because the solution space is large and systematic evaluation is lacking, it is not easy for practitioners to find ready-made solutions that fit their needs. This paper aims to provide a comprehensive survey of log parsing. We begin with an exhaustive taxonomy of existing log parsers. We then empirically analyze the critical performance and operational features of 17 open-source solutions, both quantitatively and qualitatively, and discuss the merits of alternative approaches where applicable. We also elaborate on future challenges and discuss relevant research directions. We envision this survey as a helpful resource for system administrators and domain experts who need to choose the most suitable open-source solution or implement a new one based on application-specific requirements.
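To make concrete what converting unstructured logs into structured records can look like, the following is a minimal Python sketch. It is not the method of any parser covered by the survey: the header layout, the `VARIABLE` masking rules, the `<*>` placeholder, the function name `parse_line`, and the sample log line are all assumptions chosen for illustration.

```python
import re

# Assumed (hypothetical) header layout:
#   "<date> <time> <level> <component>: <message>"
HEADER = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<component>[\w.]+): (?P<message>.*)$"
)

# Assumed rule for what counts as a variable token: IPv4 addresses
# (optionally with a port), hexadecimal literals, and plain integers.
VARIABLE = re.compile(
    r"\b(?:\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?|0x[0-9a-fA-F]+|\d+)\b"
)


def parse_line(line):
    """Split one raw log line into header fields, a template, and parameters.

    Returns None when the line does not match the assumed header layout.
    """
    match = HEADER.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    message = record.pop("message")
    # The template keeps the constant text and masks the variable tokens.
    record["template"] = VARIABLE.sub("<*>", message)
    # The parameters are the concrete values that were masked, in order.
    record["parameters"] = VARIABLE.findall(message)
    return record


# Hypothetical sample line, invented for this example.
sample = ("2023-01-15 10:32:01 INFO dfs.DataNode: "
          "Received block 4821 of size 67108864 from 10.250.19.102")
print(parse_line(sample))
```

For the sample line, the resulting record has the template `Received block <*> of size <*> from <*>` and the parameters `4821`, `67108864`, and `10.250.19.102`. Hand-written rules like these only cover formats known in advance, which is one reason automated parsers are of interest for heterogeneous logs.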