Text revision refers to a family of natural language generation tasks in which the source and target sequences share a moderate resemblance in surface form but differ in attributes such as formality and simplicity. Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems, which rely on large-scale parallel training corpora. In this paper, we present an iterative in-place editing approach to text revision that requires no parallel data. In this approach, we simply fine-tune a pre-trained Transformer with masked language modeling and attribute classification. During inference, the edit at each iteration is realized by two-step span replacement. In the first step, the distributed representation of the text is optimized on the fly towards an attribute function. In the second step, a text span is masked and a new one is proposed conditioned on the optimized representation. Empirical experiments on two typical and important text revision tasks, text formalization and text simplification, show the effectiveness of our approach: it achieves performance competitive with, and sometimes better than, state-of-the-art supervised methods on text simplification, and outperforms strong unsupervised methods on text formalization\footnote{Code and model are available at \url{https://github.com/jingjingli01/OREO}}.
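To make the two-step inference procedure concrete, below is a minimal PyTorch sketch of one editing pass. It rests on several assumptions not spelled out in the abstract: \texttt{encoder}, \texttt{mlm\_head}, and \texttt{attr\_head} are hypothetical handles to the fine-tuned masked-LM Transformer's encoder, vocabulary head, and attribute classifier, and the span to edit is taken to be the single position whose representation moved the most during optimization, a simplification rather than the paper's actual span-selection criterion.

\begin{verbatim}
import torch

def revise(input_ids, encoder, mlm_head, attr_head,
           n_iters=5, opt_steps=3, step_size=1e-2):
    """Sketch of iterative in-place editing via two-step span replacement.

    Assumed interfaces (not the paper's exact API):
      encoder(ids)   -> hidden states, shape (batch, seq, dim)
      mlm_head(h)    -> vocabulary logits, shape (batch, seq, vocab)
      attr_head(h)   -> target-attribute score per example, shape (batch,)
    """
    ids = input_ids.clone()
    for _ in range(n_iters):
        # Step 1: optimize the distributed representation on the fly
        # towards the attribute function (gradient ascent on the
        # classifier's score for the target attribute).
        hidden = encoder(ids).detach().requires_grad_(True)
        for _ in range(opt_steps):
            score = attr_head(hidden).sum()
            (grad,) = torch.autograd.grad(score, hidden)
            hidden = (hidden + step_size * grad).detach().requires_grad_(True)

        # Step 2: choose a span and propose a replacement conditioned on
        # the optimized representation. Here the span is one position,
        # picked by how far its representation moved (an assumption);
        # explicitly masking the span and re-encoding is elided.
        with torch.no_grad():
            delta = (hidden - encoder(ids)).norm(dim=-1)  # (batch, seq)
            pos = delta.argmax(dim=-1)                    # (batch,)
            new_tok = mlm_head(hidden).argmax(dim=-1)     # (batch, seq)
            b = torch.arange(ids.size(0))
            ids[b, pos] = new_tok[b, pos]                 # in-place edit
    return ids
\end{verbatim}

The loop structure mirrors the abstract's description: representation optimization supplies the attribute signal, and the masked-LM head keeps each proposed replacement fluent; iterating the two steps gradually revises the sentence without any parallel supervision.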