Execution logs are a crucial medium because they record the runtime information of software systems. Extensive logs provide valuable details for identifying the root cause of a failure during postmortem analysis, but they can also incur performance overhead and storage cost. In this research, we present the results of an experimental study on seven Spark benchmarks that illustrates the impact of different logging verbosity levels on the execution time and storage cost of distributed software systems. We also evaluate log effectiveness and information gain values, and study how performance and the generated logs change for each benchmark under various types of distributed system failures. Our research yields findings for developers and practitioners on how to set up and utilize their distributed systems to benefit from execution logs.