Machine programming (MP) is concerned with automating software development. According to studies, software engineers spend upwards of 50% of their development time debugging software. To help accelerate debugging, we present MP-CodeCheck (MPCC), an MP system that attempts to identify anomalous code patterns within logical program expressions. In designing MPCC, we developed two novel programming language representations, whose construction is critical to its ability to exhaustively and efficiently process the billions of lines of code used in its self-supervised training. To quantify MPCC's performance, we compare it against ControlFlag, a state-of-the-art self-supervised code anomaly detection system, and find that MPCC is more efficient in both space and time. We demonstrate MPCC's anomaly detection capabilities by exercising it on a variety of open-source GitHub repositories and one proprietary code base. We also provide a brief qualitative study of some of the classes of code anomalies that MPCC can detect, offering a short look at its capabilities.