We explore the application of Information Retrieval (IR)-based bug localization methods in a large industrial setting, Facebook. Facebook's code base evolves rapidly, with thousands of code changes committed to a monolithic repository every day. When a bug is detected, identifying the commit that caused it is often time-sensitive and imperative, so that the commit can be either reverted or fixed. This is complicated by the fact that bugs often manifest with complex and unwieldy features, such as stack traces and other metadata. Code commits also have various features associated with them, ranging from developer comments to test results. These characteristics pose unique challenges to bug localization methods and make it a highly non-trivial task. In this paper we lay out several practical concerns for industry-level IR-based bug localization and propose Bug2Commit, a tool designed to address these concerns. We also assess the effectiveness of existing IR-based localization techniques from the software engineering community and find that, in the presence of complex queries or documents, which are common at Facebook, existing approaches do not perform as well as Bug2Commit. We evaluate Bug2Commit on three applications at Facebook: client-side crashes from the mobile app, server-side performance regressions, and mobile simulation tests for performance. We find that Bug2Commit improves on the accuracy of existing approaches by up to 17%, reducing the time needed to triage regressions and to attribute bugs found in simulations.
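To make the problem setting concrete, the sketch below shows a generic IR baseline for commit ranking. It is not Bug2Commit. A bug report (the query) and each candidate commit (the documents) carry several text features. The sketch tokenizes each feature with camelCase splitting, weights terms with TF-IDF, and scores a commit by averaging cosine similarity over every (bug feature, commit feature) pair. The feature names (`stack_trace`, `title`, `message`, `paths`), the helper functions, the sample data, and the simple averaging scheme are all hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split on non-alphanumerics, then on camelCase/acronym boundaries; lowercase."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        # e.g. "FeedRenderer" -> ["Feed", "Renderer"], "HTTPServer" -> ["HTTP", "Server"]
        parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", word)
        tokens.extend(p.lower() for p in parts)
    return tokens


def compute_idf(corpus: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency over every tokenized feature text."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def vectorize(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sublinear TF (1 + log count) multiplied by IDF, stored as a sparse dict."""
    return {t: (1 + math.log(c)) * idf[t] for t, c in Counter(tokens).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_commits(bug: dict[str, str],
                 commits: dict[str, dict[str, str]]) -> list[tuple[str, float]]:
    """Rank commits by mean cosine similarity over all feature pairs (hypothetical scheme)."""
    bug_tokens = {f: tokenize(t) for f, t in bug.items()}
    commit_tokens = {cid: {f: tokenize(t) for f, t in feats.items()}
                     for cid, feats in commits.items()}
    corpus = list(bug_tokens.values()) + [
        toks for feats in commit_tokens.values() for toks in feats.values()]
    idf = compute_idf(corpus)
    bug_vecs = [vectorize(toks, idf) for toks in bug_tokens.values()]
    scores = []
    for cid, feats in commit_tokens.items():
        commit_vecs = [vectorize(toks, idf) for toks in feats.values()]
        pairs = [cosine(b, c) for b in bug_vecs for c in commit_vecs]
        scores.append((cid, sum(pairs) / len(pairs) if pairs else 0.0))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical example: one crash report and two candidate commits.
bug = {
    "stack_trace": "java.lang.NullPointerException at "
                   "com.facebook.feed.FeedRenderer.bindStory(FeedRenderer.java:42)",
    "title": "Crash in feed when binding story",
}
commits = {
    "A": {"message": "Refactor FeedRenderer bindStory null handling",
          "paths": "feed/FeedRenderer.java"},
    "B": {"message": "Update payments checkout button color",
          "paths": "payments/CheckoutButton.java"},
}
ranking = rank_commits(bug, commits)
for cid, score in ranking:
    print(f"{cid}: {score:.3f}")
```

Averaging over all feature pairs keeps the sketch simple. It also hints at the difficulty the abstract describes: a long, noisy feature such as a full stack trace can dominate or dilute the score. Deciding how to weight and combine heterogeneous features is one of the places where a tuned system would need to go beyond this baseline.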