Performance debugging in production is a fundamental activity in modern service-based systems. Diagnosing performance issues is often time-consuming, since it requires thorough inspection of large volumes of traces and performance indices. In this paper we present DeLag, a novel automated search-based approach for diagnosing performance issues in service-based systems. DeLag identifies subsets of requests whose Remote Procedure Call (RPC) execution times, taken in combination, show symptoms of potentially relevant performance issues. We call such symptoms Latency Degradation Patterns. DeLag searches for multiple latency degradation patterns simultaneously while optimizing precision, recall, and latency dissimilarity. Experiments on 700 datasets of requests generated from two microservice-based systems show that our approach is more effective, and more stably so, than three state-of-the-art approaches and general-purpose machine learning clustering algorithms. Moreover, on the largest datasets used in our evaluation, DeLag is more efficient than the second and third most effective baseline techniques.
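To make the notion of a latency degradation pattern more concrete, the following minimal sketch scores one candidate pattern by precision and recall. It is an illustration under stated assumptions, not DeLag's implementation: the representation of a pattern as per-RPC execution-time intervals, the `slow_threshold` used to label a request as degraded, the RPC names, and all sample values are hypothetical. Latency dissimilarity is omitted because the abstract does not define it.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: RPC name -> inclusive [low, high] execution-time range (ms).
Pattern = Dict[str, Tuple[float, float]]


def matches(rpc_times: Dict[str, float], pattern: Pattern) -> bool:
    """Return True if every RPC named in the pattern falls inside its range."""
    return all(low <= rpc_times[rpc] <= high for rpc, (low, high) in pattern.items())


def precision_recall(
    requests: List[dict], pattern: Pattern, slow_threshold: float
) -> Tuple[float, float]:
    """Score a pattern against requests labelled slow by an assumed latency threshold."""
    selected = [r for r in requests if matches(r["rpc_times"], pattern)]
    slow = [r for r in requests if r["latency"] > slow_threshold]
    true_pos = [r for r in selected if r["latency"] > slow_threshold]
    precision = len(true_pos) / len(selected) if selected else 0.0
    recall = len(true_pos) / len(slow) if slow else 0.0
    return precision, recall


# Made-up sample: three slow requests, two of them explained by a slow "db" RPC.
requests = [
    {"latency": 900.0, "rpc_times": {"auth": 20.0, "db": 850.0}},
    {"latency": 880.0, "rpc_times": {"auth": 25.0, "db": 820.0}},
    {"latency": 120.0, "rpc_times": {"auth": 20.0, "db": 90.0}},
    {"latency": 870.0, "rpc_times": {"auth": 800.0, "db": 60.0}},
]
pattern: Pattern = {"db": (800.0, 1000.0)}

precision, recall = precision_recall(requests, pattern, slow_threshold=500.0)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sample, the single pattern selects only slow requests but misses the one whose delay comes from `auth`. A second pattern would be needed to cover it, which is the kind of case a search over multiple patterns is meant to handle.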