Advances in machine learning (ML)-driven natural language processing (NLP) point to a promising direction for automatic bug fixing in software programs, because fixing a buggy program can be framed as a translation task. Although software programs contain much richer information than one-dimensional natural language documents, pioneering work on using ML-driven NLP techniques for automatic program repair has considered only a limited subset of this information. We hypothesize that more comprehensive information about software programs, if appropriately utilized, can improve the effectiveness of ML-driven NLP approaches in repairing software programs. As a first step toward validating this hypothesis, we propose a unified representation that captures the syntax, data flow, and control flow of software programs, and devise a method that uses this representation to guide a Transformer model, borrowed from NLP, to better understand and fix buggy programs. Our preliminary experiment confirms that the more comprehensive the program information used, the better ML-driven NLP techniques perform at fixing bugs in these programs.
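The abstract does not say how the unified representation is constructed. As an illustration only, the Python sketch below shows one plausible way to place the three aspects into a single graph for a Python program, using the standard-library `ast` module. It builds AST nodes joined by parent-to-child "ast" edges (syntax), simplified "cfg" edges between statements (control flow), and "dataflow" edges from each variable write to its later reads. The names `ProgramGraph` and `build_graph`, the edge labels, and the simplifications (no scoping, no merging of definitions across branches or loop iterations) are our own assumptions, not the authors' representation. The sketch also does not cover how such a graph is fed to the Transformer.

```python
import ast
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProgramGraph:
    """Hypothetical unified representation: AST nodes plus typed edges."""
    nodes: list[str] = field(default_factory=list)                   # one label per node
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (src, dst, kind)


class _DataFlow(ast.NodeVisitor):
    """Adds 'dataflow' edges from each variable write to later reads.

    Straight-line approximation: visits code in evaluation order, ignores
    scopes, and does not merge definitions across branches or loop iterations.
    """

    def __init__(self, graph: ProgramGraph, index: dict[int, int]):
        self.graph, self.index = graph, index
        self.last_def: dict[str, int] = {}  # variable name -> node of latest write

    def _use(self, name: str, node: ast.AST) -> None:
        if name in self.last_def:
            self.graph.edges.append((self.last_def[name], self.index[id(node)], "dataflow"))

    def visit_Name(self, node: ast.Name) -> None:
        if isinstance(node.ctx, ast.Load):
            self._use(node.id, node)
        elif isinstance(node.ctx, ast.Store):
            self.last_def[node.id] = self.index[id(node)]

    def visit_arg(self, node: ast.arg) -> None:
        self.last_def[node.arg] = self.index[id(node)]  # parameters count as writes

    def visit_Assign(self, node: ast.Assign) -> None:
        self.visit(node.value)  # the right-hand side is evaluated before the store
        for target in node.targets:
            self.visit(target)

    def visit_AugAssign(self, node: ast.AugAssign) -> None:
        self.visit(node.value)
        if isinstance(node.target, ast.Name):
            self._use(node.target.id, node.target)  # `x += e` also reads x
        self.visit(node.target)

    def visit_For(self, node: ast.For) -> None:
        self.visit(node.iter)  # iterable first, then the loop variable
        self.visit(node.target)
        for stmt in node.body + node.orelse:
            self.visit(stmt)


def build_graph(source: str) -> ProgramGraph:
    graph = ProgramGraph()
    index: dict[int, int] = {}  # id(AST node) -> graph node index
    tree = ast.parse(source)

    # 1) Syntax: one node per AST node, 'ast' edges from parent to child.
    #    Load/Store contexts are skipped; the data-flow pass reads them directly.
    def visit(node: ast.AST, parent: Optional[int]) -> None:
        idx = len(graph.nodes)
        label = f"Name:{node.id}" if isinstance(node, ast.Name) else type(node).__name__
        graph.nodes.append(label)
        index[id(node)] = idx
        if parent is not None:
            graph.edges.append((parent, idx, "ast"))
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, ast.expr_context):
                visit(child, idx)

    visit(tree, None)

    # 2) Control flow (simplified): entry edges into nested statement blocks,
    #    fall-through between consecutive statements, and loop back-edges.
    for node in ast.walk(tree):
        for attr in ("body", "orelse", "finalbody"):
            block = getattr(node, attr, None)
            if not isinstance(block, list) or not block:
                continue
            if isinstance(node, ast.stmt):
                graph.edges.append((index[id(node)], index[id(block[0])], "cfg"))
            for prev, nxt in zip(block, block[1:]):
                graph.edges.append((index[id(prev)], index[id(nxt)], "cfg"))
            if isinstance(node, (ast.For, ast.While)) and attr == "body":
                graph.edges.append((index[id(block[-1])], index[id(node)], "cfg"))

    # 3) Data flow: write -> read edges between variable occurrences.
    _DataFlow(graph, index).visit(tree)
    return graph
```

For instance, `build_graph("x = 1\ny = x + 2\n")` yields one "dataflow" edge, from the `Name:x` written in the first statement to the `Name:x` read in the second, plus one "cfg" fall-through edge between the two statements. Keeping all three aspects as typed edges over one shared node set is what makes such a representation "unified": a model can consume every relation at once, for example by adding an edge-type-dependent bias to the self-attention score between two nodes, although the abstract does not state whether the paper uses this mechanism.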