A Bug-Inducing Commit (BIC) is a commit that introduces a software bug into the codebase. Knowing the BIC responsible for a given bug provides valuable information for both debugging and bug triaging. However, existing BIC identification techniques are either too expensive (because they require the failing tests to be executed against previous versions for bisection) or inapplicable at debugging time (because they require post hoc artefacts such as bug reports or bug fixes). We propose Fonte, an efficient and accurate BIC identification technique that requires only test coverage. Fonte combines Fault Localisation (FL) with BIC identification and ranks commits based on the suspiciousness of the code elements that they modified. Fonte reduces the BIC search space using failure coverage, together with a filter that detects commits containing only style changes. Our empirical evaluation using 130 real-world BICs shows that Fonte significantly outperforms state-of-the-art BIC identification techniques based on Information Retrieval as well as neural code embedding models, achieving at least 39% higher Mean Reciprocal Rank (MRR). We also show that the ranking scores produced by Fonte can be used to perform weighted bisection, further reducing the cost of BIC identification. Finally, we apply Fonte to a large-scale industry project with over 10M lines of code, and show that it ranks the actual BIC within the top five commits for 87% of the studied real batch-testing failures, and reduces the BIC inspection cost by 32% on average.
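The pipeline summarised above (FL scores, search-space reduction, commit ranking, weighted bisection) can be illustrated with a minimal sketch. It is not Fonte's actual implementation: the Ochiai formula, the plain sum used to aggregate element scores into a commit score, and all names (`Commit`, `rank_commits`, `weighted_bisect`, the precomputed `style_only` flag) are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    modified: frozenset       # code elements (e.g. methods) touched by the commit
    style_only: bool = False  # assumed output of some style-change detector


def suspiciousness(coverage, failing):
    """Ochiai score per element (an assumed FL formula).

    coverage: test name -> set of covered elements; failing: set of test names.
    """
    elements = set().union(*coverage.values())
    scores = {}
    for e in elements:
        ef = sum(1 for t, cov in coverage.items() if t in failing and e in cov)
        ep = sum(1 for t, cov in coverage.items() if t not in failing and e in cov)
        denom = math.sqrt(len(failing) * (ef + ep))
        scores[e] = ef / denom if denom else 0.0
    return scores


def rank_commits(commits, coverage, failing):
    """Return (sha, score) pairs, most suspicious first."""
    susp = suspiciousness(coverage, failing)
    failure_cov = set().union(*(coverage.get(t, set()) for t in failing))
    # Search-space reduction: drop style-only commits and commits that
    # never touched anything covered by a failing test.
    candidates = [c for c in commits
                  if not c.style_only and c.modified & failure_cov]
    # Assumed aggregation: sum the scores of the modified elements.
    scored = [(c.sha, sum(susp.get(e, 0.0) for e in c.modified))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def weighted_bisect(chronological, scores, fails_at):
    """Bisect over candidate SHAs (oldest first), splitting score mass in half.

    Assumes the BIC is among the candidates and that fails_at(sha) is False
    before the BIC and True from the BIC onwards.
    """
    lo, hi = 0, len(chronological) - 1
    while lo < hi:
        weights = [scores.get(s, 0.0) for s in chronological[lo:hi + 1]]
        total = sum(weights)
        pivot = (lo + hi) // 2  # fall back to plain bisection if no signal
        if total > 0:
            cum, best = 0.0, float("inf")
            for i in range(lo, hi):  # testing hi itself would tell us nothing
                cum += weights[i - lo]
                gap = abs(cum - total / 2)
                if gap < best:
                    best, pivot = gap, i
        if fails_at(chronological[pivot]):
            hi = pivot
        else:
            lo = pivot + 1
    return chronological[lo]
```

In this sketch, `fails_at` is the only step that executes tests, so placing the pivot where the cumulative score is closest to half of the total tends to need fewer test runs than a midpoint split when the scores are informative.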