The joint task of bug localization and program repair is an integral part of the software development process. In this work, we present DeepDebug, an approach to automated debugging using large, pretrained transformers. We begin by training a bug-creation model on reversed commit data to generate synthetic bugs. We apply these synthetic bugs toward two ends. First, we directly train a backtranslation model on all functions from 200K repositories. Next, we focus on 10K repositories for which we can execute tests, and create buggy versions of all functions in those repositories that are covered by passing tests. This provides us with rich debugging information, such as stack traces and print statements, which we use to finetune a model pretrained on raw source code. Finally, we strengthen all our models by expanding the context window beyond the buggy function itself, adding a skeleton consisting of that function's parent class, imports, signatures, docstrings, and method bodies, in order of priority. On the QuixBugs benchmark, we increase the total number of fixes found by over 50%, while also decreasing the false positive rate from 35% to 5% and decreasing the timeout from six hours to one minute. On our own benchmark of executable tests, our model fixes 68% of all bugs on its first attempt without using traces, and after adding traces it fixes 75% on first attempt. We will open-source our framework and validation set for evaluating on executable tests.
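The skeleton-based context expansion described above could be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function name `build_skeleton`, the category names, and the character-based budget are all assumptions chosen for clarity (a real system would budget in tokens and extract the snippets from parsed source files).

```python
def build_skeleton(buggy_function, parts, max_chars=2000):
    """Assemble extended context for the repair model: the buggy function
    plus a skeleton of surrounding file content, added category by category
    in priority order until the context budget is exhausted.

    `parts` maps category names to lists of code snippets; the priority
    order below mirrors the one described in the abstract.
    """
    priority = ["parent_class", "imports", "signatures", "docstrings", "method_bodies"]
    context = [buggy_function]
    used = len(buggy_function)
    for category in priority:
        for snippet in parts.get(category, []):
            if used + len(snippet) > max_chars:
                # Budget exhausted: drop all remaining lower-priority content.
                return "\n".join(context)
            context.append(snippet)
            used += len(snippet)
    return "\n".join(context)
```

Under a tight budget, high-priority items such as imports survive while lower-priority method bodies are truncated first.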