Text style transfer is an important task in controllable language generation. Supervised approaches have driven substantial performance gains on style-oriented rewriting tasks such as formality conversion. However, challenges remain because large-scale parallel data are scarce in many domains. While unsupervised approaches do not rely on annotated sentence pairs for each style, they are often plagued by instability issues such as mode collapse or quality degradation. To combine the advantages of both paradigms and address these challenges, we propose a semi-supervised framework for text style transfer. First, the learning process is bootstrapped with supervision from pseudo-parallel pairs constructed automatically by lexical and semantic methods. The model then learns from unlabeled data via reinforcement rewards. Specifically, we improve the sequence-to-sequence policy gradient through stepwise reward optimization, which provides fine-grained learning signals and stabilizes the reinforcement learning process. Experimental results show that the proposed approach achieves state-of-the-art performance on multiple datasets and produces effective generations with as little as 10\% of the training data.
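As a rough sketch of the stepwise reward idea in standard REINFORCE notation (the symbols and reward decomposition below are illustrative assumptions, not taken from the abstract): a conventional sequence-level policy gradient assigns a single scalar reward $R(\hat{y})$ to every decoding step of a sampled output $\hat{y}$, whereas a stepwise variant replaces it with per-token rewards $r_t$, so each step receives its own fine-grained learning signal rather than one delayed sequence-level signal:
\[
\nabla_\theta J(\theta) = \mathbb{E}_{\hat{y}\sim\pi_\theta}\Big[R(\hat{y})\sum_{t=1}^{T}\nabla_\theta\log\pi_\theta(\hat{y}_t\mid\hat{y}_{<t},x)\Big]
\;\longrightarrow\;
\mathbb{E}_{\hat{y}\sim\pi_\theta}\Big[\sum_{t=1}^{T} r_t\,\nabla_\theta\log\pi_\theta(\hat{y}_t\mid\hat{y}_{<t},x)\Big].
\]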