Text generation rarely considers control of lexical complexity, which limits its practical application. We introduce a novel task, lexical complexity controlled sentence generation, which aims to generate a sentence from given keywords at a desired complexity level. The task has considerable potential in domains such as graded reading, language teaching, and language acquisition. Its main challenge is to generate fluent sentences using only words of the given complexity levels. We propose a simple but effective approach to this task based on complexity embedding. Compared with other potential solutions, our approach fuses representations of word complexity levels into the model to gain better control of lexical complexity. We also demonstrate the feasibility of the approach both for training models from scratch and for fine-tuning pre-trained models. To facilitate research, we develop two datasets, one in English and one in Chinese, on which we conduct extensive experiments. Results show that our approach controls lexical complexity better and generates higher-quality sentences than baseline methods.
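The idea of fusing word complexity levels into the model's input can be illustrated with a minimal sketch. The abstract does not specify how the representations are fused, so element-wise addition of a token embedding and a complexity-level embedding is an assumption here, as are the toy lexicon, the level numbering, the extra out-of-lexicon level, and every name in the code (`WORD_LEVEL`, `embed`, `within_levels`). The random vectors stand in for parameters that a real model would learn.

```python
import random

# Hypothetical toy lexicon mapping each word to a complexity level (0 = easiest).
WORD_LEVEL = {"the": 0, "cat": 0, "sat": 0, "feline": 2, "reclined": 2}
UNK_LEVEL = 3  # assumed extra level for words outside the lexicon
DIM = 4        # embedding size, kept tiny for illustration


def make_table(keys, dim, seed):
    """Build a lookup table of random vectors (stand-ins for learned parameters)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}


TOKEN_EMB = make_table(list(WORD_LEVEL) + ["<unk>"], DIM, seed=0)
LEVEL_EMB = make_table(range(UNK_LEVEL + 1), DIM, seed=1)


def embed(tokens):
    """Return one vector per token: token embedding plus complexity-level embedding."""
    vectors = []
    for tok in tokens:
        level = WORD_LEVEL.get(tok, UNK_LEVEL)
        tok_vec = TOKEN_EMB.get(tok, TOKEN_EMB["<unk>"])
        lvl_vec = LEVEL_EMB[level]
        # Assumed fusion rule: element-wise sum of the two representations.
        vectors.append([t + l for t, l in zip(tok_vec, lvl_vec)])
    return vectors


def within_levels(tokens, allowed_levels):
    """Check that every token belongs to one of the allowed complexity levels."""
    return all(WORD_LEVEL.get(tok, UNK_LEVEL) in allowed_levels for tok in tokens)
```

In this sketch, `embed(["the", "cat", "sat"])` yields the fused inputs a generator would consume, and `within_levels(sentence_tokens, {0})` is one simple way to test whether an output respects the requested levels.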