Sentence simplification aims to make the structure of text easier to read and understand while preserving its original meaning. This can help people with disabilities, new language learners, and those with low literacy. Simplification often involves removing difficult words and rephrasing the sentence. Previous research has tackled this task either by using external linguistic databases for simplification or by using control tokens to steer the output toward desired sentence attributes. In this paper, by contrast, we use only pre-trained transformer models. We experiment with combinations of GPT-2 and BERT models, achieving a best SARI score of 46.80 on the Mechanical Turk dataset, significantly better than previous state-of-the-art results. The code can be found at https://github.com/amanbasu/sentence-simplification.