While most neural generative models generate outputs in a single pass, the human creative process is usually one of iterative building and refinement. Recent work has proposed models of editing processes, but these mostly focus on editing sequential data and/or only model a single editing pass. In this paper, we present a generic model for incremental editing of structured data (i.e., "structural edits"). In particular, we focus on tree-structured data, taking abstract syntax trees of computer programs as our canonical example. Our editor learns to iteratively generate tree edits (e.g., deleting or adding a subtree) and applies them to the partially edited data, so that the entire editing process can be formulated as a sequence of consecutive, incremental tree transformations. To show the unique benefits of modeling tree edits directly, we further propose a novel edit encoder for learning to represent edits, as well as an imitation learning method that makes the editor more robust. We evaluate our proposed editor on two source code edit datasets, where results show that, with the proposed edit encoder, our editor significantly improves accuracy over previous approaches that generate the edited program directly in one pass. Finally, we demonstrate that training our editor to imitate experts and correct its mistakes dynamically can further improve its performance.
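To make the notion of "consecutive, incremental tree transformations" concrete, the following is a minimal sketch, not the paper's implementation: the Node, DeleteSubtree, AddSubtree, and apply_edits names are hypothetical, and only illustrate how a sequence of subtree-level edits could be applied one at a time to a partially edited abstract syntax tree.

```python
# Minimal, hypothetical sketch of applying structural edits to a toy AST.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A toy AST node with a label and ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class DeleteSubtree:
    """Edit that removes the child subtree at `child_index` of `parent`."""
    parent: Node
    child_index: int

    def apply(self, tree: Node) -> Node:
        del self.parent.children[self.child_index]
        return tree


@dataclass
class AddSubtree:
    """Edit that inserts `subtree` at `child_index` under `parent`."""
    parent: Node
    child_index: int
    subtree: Node

    def apply(self, tree: Node) -> Node:
        self.parent.children.insert(self.child_index, self.subtree)
        return tree


def apply_edits(tree: Node, edits) -> Node:
    # Each edit is applied to the partially edited tree, so the whole
    # process is a chain of incremental tree transformations.
    for edit in edits:
        tree = edit.apply(tree)
    return tree


# Example: rewrite `x = 1` into `x = 1 + y` by replacing the RHS subtree.
assign = Node("Assign", [Node("x"), Node("1")])
edits = [
    DeleteSubtree(assign, 1),
    AddSubtree(assign, 1, Node("BinOp+", [Node("1"), Node("y")])),
]
apply_edits(assign, edits)
assert assign.children[1].label == "BinOp+"
```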