We report on novel investigations into training models that make sentences concise. We define the task and show that it differs from related tasks such as summarization and simplification. For evaluation, we release two test sets of 2,000 sentences each, annotated by two and five human annotators, respectively. We demonstrate that conciseness is a difficult task for which zero-shot setups with large neural language models often do not perform well. Given the limitations of these approaches, we propose a synthetic data generation method based on round-trip translations. Using this data to either train Transformers from scratch or fine-tune T5 models yields our strongest baselines, which can be further improved by fine-tuning on an artificial conciseness dataset derived from multi-annotator machine translation test sets.
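To make the round-trip translation idea concrete, the sketch below generates candidate rewrites by translating English sentences to a pivot language and back, then keeps round trips that come out shorter than the source. This is a minimal sketch, not the paper's pipeline: the abstract does not name the MT systems used, so the public MarianMT checkpoints, the English-German pivot, and the length-based filtering heuristic are all assumptions for illustration.

```python
# Minimal sketch of synthetic data generation via round-trip translation.
# Assumptions (not from the paper): public MarianMT checkpoints, an
# English-German pivot, and a crude length-based filter.
from transformers import MarianMTModel, MarianTokenizer


def translate(sentences, model_name):
    """Translate a batch of sentences with a pretrained MarianMT model."""
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    outputs = model.generate(**batch)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in outputs]


def round_trip(sentences):
    """English -> German -> English; the pivot language is an assumption."""
    german = translate(sentences, "Helsinki-NLP/opus-mt-en-de")
    return translate(german, "Helsinki-NLP/opus-mt-de-en")


sources = ["It is a fact that the meeting was attended by all of the members."]
candidates = round_trip(sources)

# Keep (verbose, concise) pairs where the round trip shortened the sentence,
# as a rough proxy for a more concise paraphrase (heuristic is an assumption).
pairs = [(s, c) for s, c in zip(sources, candidates) if len(c) < len(s)]
print(pairs)
```

Pairs collected this way could serve as (input, target) examples for training a Transformer from scratch or fine-tuning T5, as described above; in practice the filtering step would need stronger quality checks than raw character length.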