Non-parallel text style transfer is an important task in natural language generation. However, previous studies concentrate on the token or sentence level, such as sentence sentiment and formality transfer, and neglect long text style transfer at the discourse level. Long texts usually involve more complicated author linguistic preferences than sentences, such as discourse structures. In this paper, we formulate the task of non-parallel story author-style transfer, which requires transferring an input story into a specified author style while maintaining source semantics. To tackle this problem, we propose a generation model, named StoryTrans, which leverages discourse representations to capture source content information and transfers them to target styles with learnable style embeddings. We use an additional training objective to disentangle stylistic features from the learned discourse representations to prevent the model from degenerating into an auto-encoder. Moreover, to enhance content preservation, we design a mask-and-fill framework that explicitly fuses style-specific keywords of the source text into generation. Furthermore, we construct new datasets for this task in Chinese and English. Extensive experiments show that our model outperforms strong baselines in the overall performance of style transfer and content preservation.
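The masking stage of a mask-and-fill pipeline can be illustrated with a minimal sketch; the function name, mask token, and keyword list below are hypothetical illustrations, not the paper's actual implementation:

```python
import re

def mask_style_keywords(text, style_keywords, mask_token="<mask>"):
    """Replace style-specific keywords with a mask token (first stage of a
    hypothetical mask-and-fill pipeline); a generation model would then
    fill the masks in the target author style while the remaining source
    tokens anchor content preservation. The keyword list is assumed given."""
    for kw in style_keywords:
        text = re.sub(re.escape(kw), mask_token, text)
    return text

masked = mask_style_keywords("The knight spoke gravely of the omen.",
                             ["gravely", "omen"])
# Unmasked words carry the source content into the fill stage.
```

In this sketch the fill stage is left out; the point is only that masking isolates style-bearing tokens so the rest of the source text can be copied through, which is how content preservation is enforced explicitly.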