Fault localization has been identified as a major consumer of resources in the software development life cycle. Academic fault localization techniques are mostly unknown and unused in professional environments. Although manual debugging approaches can vary significantly depending on bug type (e.g., memory bugs or semantic bugs), these differences are not reflected in most existing fault localization tools. Little research has gone into the automated identification of bug types to optimize the fault localization process. Further, existing fault localization techniques leverage historical data only to augment suspiciousness rankings. This thesis aims to provide a fault localization framework that combines data from various sources to help developers in the fault localization process. To achieve this, a bug classification schema is introduced, benchmarks are created, and a novel fault localization method based on historical data is proposed.