Microservice resilience, the ability of microservices to recover from failures and continue providing reliable and responsive services, is crucial for cloud vendors. However, the current practice relies on manually configured rules specific to a certain microservice system, resulting in labor-intensity and flexibility issues, given the large scale and high dynamics of microservices. A more labor-efficient and versatile solution is desired. Our insight is that resilient deployment can effectively prevent the dissemination of degradation from system performance metrics to user-aware metrics, and the latter directly affects service quality. In other words, failures in a non-resilient deployment can impact both types of metrics, leading to user dissatisfaction. With this in mind, we propose MicroRes, the first versatile resilience profiling framework for microservices via degradation dissemination indexing. MicroRes first injects failures into microservices and collects available monitoring metrics. Then, it ranks the metrics according to their contributions to the overall service degradation. It produces a resilience index by how much the degradation is disseminated from system performance metrics to user-aware metrics. Higher degradation dissemination indicates lower resilience. We evaluate MicroRes on two open-source and one industrial microservice system. The experiments show MicroRes' efficient and effective resilience profiling of microservices. We also showcase MicroRes' practical usage in production.
翻译:暂无翻译