Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit.