Machine learning-based program analyses have recently shown the promise of integrating formal and probabilistic reasoning towards aiding software development. However, in the absence of large annotated corpora, training these analyses is challenging. Towards addressing this, we present BugLab, an approach for self-supervised learning of bug detection and repair. BugLab co-trains two models: (1) a detector model that learns to detect and repair bugs in code, (2) a selector model that learns to create buggy code for the detector to use as training data. A Python implementation of BugLab improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs and finds 19 previously unknown bugs in open-source software.
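To make the selector's role concrete, the following is a minimal sketch of how buggy training examples might be produced from correct code. It is not BugLab's implementation: the rewrite table `SWAPS`, the helper names `candidate_sites` and `insert_bug`, and the uniform random choice of rewrite site are all assumptions made for illustration. In BugLab the selector is a learned model rather than a random choice. The sketch requires Python 3.9 or later for `ast.unparse`.

```python
import ast
import random

# Hypothetical rewrite table (an assumption, not from the paper):
# each comparison operator maps to a plausible "buggy" replacement.
SWAPS = {
    ast.Lt: ast.LtE, ast.LtE: ast.Lt,
    ast.Gt: ast.GtE, ast.GtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
}


def candidate_sites(tree):
    """Return every Compare node whose first operator has a rewrite."""
    return [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Compare) and type(node.ops[0]) in SWAPS
    ]


def insert_bug(source, rng):
    """Toy stand-in for a selector: swap one comparison operator at random.

    Returns (buggy_source, line_of_bug_in_original_source), or
    (source, None) when the code has no rewritable comparison.
    """
    tree = ast.parse(source)
    sites = candidate_sites(tree)
    if not sites:
        return source, None
    node = rng.choice(sites)
    node.ops[0] = SWAPS[type(node.ops[0])]()
    return ast.unparse(tree), node.lineno


clean = (
    "def clamp(x, lo, hi):\n"
    "    if x < lo:\n"
    "        return lo\n"
    "    if x > hi:\n"
    "        return hi\n"
    "    return x\n"
)
buggy, line = insert_bug(clean, random.Random(0))
print(f"bug inserted on line {line}:")
print(buggy)
```

Each `(buggy, line)` pair, together with the original code, could serve as a labeled example for a detector that has to localize the changed operator and restore it. This is the kind of training signal the abstract describes obtaining without manual annotation.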