Failures with different root causes can significantly disturb multi-fault localization, so dividing failures into distinct groups according to their responsible faults is highly important. In this failure indexing task, the crux lies in the failure proximity, which involves two questions: how to effectively represent failures (e.g., by extracting a signature for each failure), and how to properly measure the distance between those representations. Existing studies have proposed a variety of failure proximities. The most prevalent ones extract failure signatures from execution coverage or suspiciousness ranking lists, and accordingly employ the Euclidean or the Kendall tau distance. However, such strategies may not properly reflect the essential characteristics of failures, resulting in unsatisfactory effectiveness. In this paper, we propose a new failure proximity, namely the program variable-based failure proximity, and present a novel failure indexing approach based on it. Specifically, the proposed approach uses the run-time values of program variables to represent failures, and designs a set of rules to measure the similarity between them. Experimental results demonstrate the competitiveness of the proposed approach: compared with the state-of-the-art technique in this field, it achieves 44.12% and 27.59% improvements in fault number estimation, as well as 47.30% and 26.93% improvements in clustering effectiveness, in simulated and real-world environments, respectively.
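To make the two existing proximities mentioned above concrete, the following is a minimal sketch of the distances they rely on: the Euclidean distance between coverage vectors and the Kendall tau distance between suspiciousness ranking lists. It illustrates only these baseline measures, not the program variable-based rules proposed here. The function names, the 0/1 coverage encoding, and the statement labels are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def euclidean_distance(cov_a, cov_b):
    """Euclidean distance between two equal-length coverage vectors.

    Assumption: each vector holds one number per program statement,
    e.g. 1 if the failing test executed the statement and 0 otherwise.
    """
    if len(cov_a) != len(cov_b):
        raise ValueError("coverage vectors must have the same length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(cov_a, cov_b)))


def kendall_tau_distance(rank_a, rank_b):
    """Count the statement pairs that the two rankings order differently.

    Assumption: each ranking is a list of the same statement labels,
    ordered from most to least suspicious, with no ties.
    """
    if set(rank_a) != set(rank_b) or len(rank_a) != len(rank_b):
        raise ValueError("rankings must contain the same statements")
    pos_a = {stmt: i for i, stmt in enumerate(rank_a)}
    pos_b = {stmt: i for i, stmt in enumerate(rank_b)}
    # A pair is discordant when its relative order differs between rankings.
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


# Hypothetical signatures of two failures.
coverage_1 = [1, 0, 1, 1]
coverage_2 = [1, 1, 0, 1]
ranking_1 = ["s1", "s2", "s3"]
ranking_2 = ["s1", "s3", "s2"]

print(euclidean_distance(coverage_1, coverage_2))
print(kendall_tau_distance(ranking_1, ranking_2))
```

Under either measure, a smaller distance suggests that two failures are more likely to share a root cause, and the resulting pairwise distances are what a clustering step would then consume to group the failures.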