GitHub's Copilot generates code quickly. We investigate whether it generates good code. Our approach is to identify a set of problems, ask Copilot to generate solutions, and attempt to formally verify these solutions with Dafny. Our formal verification is with respect to hand-crafted specifications. We have carried out this process on 6 problems and succeeded in formally verifying 4 of the created solutions. We found evidence which corroborates the current consensus in the literature: Copilot is a powerful tool; however, it should not be "flying the plane" by itself.
翻译:GitHub的副驾驶迅速生成代码。 我们调查它是否生成了好代码。 我们的方法是找出一系列问题, 请求副驾驶提出解决方案, 并尝试与Dafny正式核实这些解决方案。 我们的正式核查是针对手工制作的规格。 我们已经对6个问题进行了这个过程, 并成功地正式核实了4个创建的解决方案。 我们找到的证据证实了文献中目前的共识: 副驾驶是一个强大的工具; 但是, 它不应该是“ 自己飞飞机 ” 。