With the growing adoption of self-adaptive systems in various domains, there is an increasing need for strategies to assess their correct behavior. In particular, self-healing systems, which aim to provide resilience and fault tolerance, often deal with unanticipated failures in critical and highly dynamic environments. Their reactive and complex behavior makes it challenging to assess whether these systems execute according to the desired goals. Recently, several studies have expressed concern about the lack of systematic evaluation methods for self-healing behavior. In this paper, we propose CHESS, an approach for the systematic evaluation of self-adaptive and self-healing systems that builds on chaos engineering. Chaos engineering is a methodology for subjecting a system to unexpected conditions and scenarios. It has shown great promise in helping developers build resilient microservice architectures and cyber-physical systems. CHESS turns this idea around by using chaos engineering to evaluate how well a self-healing system can withstand such perturbations. We investigate the viability of this approach through an exploratory study on a self-healing smart office environment. The study helps us explore the promises and limitations of the approach and identify directions where additional work is needed. We conclude with a summary of lessons learned.