For microservice applications with detected performance anomalies, localizing root causes from monitoring data is important for rapid recovery and loss mitigation. Existing research focuses mainly on coarse-grained localization of the faulty service. Fine-grained root cause localization, which identifies not only the faulty service but also the root cause metric within that service, is more helpful for operators fixing application anomalies, but it is also more challenging. Causal inference (CI) based methods have recently become popular, but the CI methods currently in use have limitations, such as assuming linear causal relations. This paper therefore presents a framework named CausalRCA that implements fine-grained, automated, and real-time root cause localization. CausalRCA uses a gradient-based causal structure learning method to generate weighted causal graphs and a root cause inference method to localize root cause metrics. We conduct both coarse-grained and fine-grained root cause localization to validate the localization performance of CausalRCA. Experimental results show that CausalRCA achieves the best localization accuracy among the compared methods; for example, the average $AC@3$ of fine-grained root cause metric localization in the faulty service is 0.719, an average improvement of 17\% over the baseline methods.