Resilience testing, which measures a system's ability to minimize service degradation caused by unexpected failures, is crucial for microservice systems. Current practice relies on manually defining resilience testing rules for each microservice system. Because the business logic of microservices is so diverse, no one-size-fits-all set of rules exists. As the number and dynamism of microservices and failures grow, manual configuration becomes hard to scale and slow to adapt. To address these two issues, we empirically compare the impacts of common failures in resilient and unresilient deployments of a benchmark microservice system. Our study demonstrates that a resilient deployment can block the propagation of degradation from system performance metrics (e.g., memory usage) to business metrics (e.g., response latency). In this paper, we propose AVERT, the first AdaptiVE Resilience Testing framework for microservice systems. AVERT first injects failures into microservices and collects the available monitoring metrics. AVERT then ranks all monitoring metrics according to their contributions to the overall service degradation caused by the injected failures. Lastly, AVERT produces a resilience index based on how much the degradation in system performance metrics propagates to the degradation in business metrics. The higher the degradation propagation, the lower the resilience of the microservice system. We evaluate AVERT on two open-source benchmark microservice systems. The experimental results show that AVERT can accurately and efficiently test the resilience of microservice systems.
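The idea of scoring resilience by degradation propagation can be illustrated with a toy sketch. The abstract does not give AVERT's actual ranking method or index formula, so everything below is an assumption made for illustration: the function names (`degradation`, `rank_metrics`, `resilience_index`), the use of a normalized mean shift as the degradation score, and the ratio-based index are hypothetical and are not the authors' implementation.

```python
from statistics import mean, pstdev


def degradation(baseline, faulty):
    """Assumed degradation score: shift in the mean between the normal and
    failure periods, scaled by the spread of the baseline samples."""
    spread = pstdev(baseline) or 1.0  # fall back to 1.0 for a flat baseline
    return abs(mean(faulty) - mean(baseline)) / spread


def rank_metrics(baseline, faulty):
    """Return (metric name, degradation) pairs, most degraded first."""
    scores = {name: degradation(baseline[name], faulty[name]) for name in baseline}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def resilience_index(ranked, business_metrics):
    """Assumed index: 1 minus the ratio of mean business-metric degradation
    to mean system-metric degradation, with the ratio capped at 1."""
    business = [score for name, score in ranked if name in business_metrics]
    system = [score for name, score in ranked if name not in business_metrics]
    if not business or not system:
        raise ValueError("need at least one business and one system metric")
    if mean(system) == 0:
        return 1.0  # no system-level degradation, so nothing could propagate
    propagation = min(mean(business) / mean(system), 1.0)
    return 1.0 - propagation
```

Under these assumptions, a deployment whose memory usage spikes during a failure while response latency stays flat scores close to 1, and one whose latency degrades as much as its memory usage scores close to 0, which matches the abstract's statement that higher propagation means lower resilience.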