Modern software systems have become too complex to be fully tested and validated. Detecting partial software failures in complex systems at runtime helps to handle unintended software behavior, avoid catastrophic failures, and improve runtime availability. These detection techniques aim to find the manifestation of faults before they lead to unavoidable failures, thereby supporting subsequent runtime fault-tolerance techniques. We review the state-of-the-art literature and find that content failures account for the majority of software failures, yet methods for detecting them are rarely studied. In this work, we propose a novel failure detection indicator for software content failures, based on dynamic execution information collected at runtime. The runtime information is recorded during software execution, transformed into a measure named runtime entropy, and finally fed into machine learning models. The machine learning models are built to classify the intended and unintended behaviors of the target software systems. A series of controlled experiments on several open source projects is conducted to demonstrate the feasibility of the method. We also evaluate the accuracy of the machine learning models built in this work.
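The pipeline described above (record runtime information, reduce it to an entropy measure, classify behavior) can be illustrated with a minimal sketch. The abstract does not define runtime entropy, so the code below assumes it is the Shannon entropy of event frequencies within one execution trace. The function names, the event names, and the nearest-centroid classifier standing in for the machine learning models are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def runtime_entropy(trace):
    """Shannon entropy (in bits) of the event frequencies in one trace.

    Assumption: a trace is a list of event names (e.g. called functions).
    The paper's actual definition of runtime entropy may differ.
    """
    if not trace:
        return 0.0
    counts = Counter(trace)
    total = len(trace)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fit_centroids(entropies, labels):
    """Mean entropy per label: a toy stand-in for a trained ML model."""
    sums, sizes = {}, {}
    for h, y in zip(entropies, labels):
        sums[y] = sums.get(y, 0.0) + h
        sizes[y] = sizes.get(y, 0) + 1
    return {y: sums[y] / sizes[y] for y in sums}


def classify(entropy, centroids):
    """Return the label whose mean entropy is closest to the given value."""
    return min(centroids, key=lambda y: abs(entropy - centroids[y]))


# Hypothetical labeled traces, invented purely for this example.
training = [
    (["open", "read", "parse", "write", "close"], "intended"),
    (["open", "read", "parse", "parse", "write", "close"], "intended"),
    (["open", "read", "read", "read", "read"], "unintended"),
    (["open", "read", "read", "read", "read", "read"], "unintended"),
]
centroids = fit_centroids(
    [runtime_entropy(trace) for trace, _ in training],
    [label for _, label in training],
)

new_trace = ["open"] + ["read"] * 6
print(classify(runtime_entropy(new_trace), centroids))
```

In this toy data the unintended traces repeat one event and therefore have low entropy, but that is a property of the invented examples, not a claim about real content failures. A practical setup would likely use richer features and a proper classifier.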