Pretraining techniques leveraging enormous datasets have driven recent advances in text summarization. While folk explanations suggest that knowledge transfer accounts for pretraining's benefits, little is known about why it works or what makes a pretraining task or dataset suitable. In this paper, we challenge the knowledge transfer story, showing that by pretraining on documents consisting of character n-grams selected at random, we can nearly match the performance of models pretrained on real corpora. This work holds the promise of eliminating the need for upstream corpora, which may alleviate some concerns over offensive language, bias, and copyright. To see whether the small residual benefit of using real data could be accounted for by the structure of the pretraining task, we design several tasks motivated by a qualitative study of summarization corpora. However, these tasks confer no appreciable benefit, leaving open the possibility of a small role for knowledge transfer.
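To make the idea of a synthetic, knowledge-free pretraining corpus concrete, the following is a minimal sketch of how documents of randomly selected character n-grams could be generated. This is not the paper's actual data-generation code; the alphabet, n-gram lengths, and document size are illustrative assumptions.

```python
import random
import string

def make_nonsense_document(num_tokens=200, ngram_min=2, ngram_max=5, seed=None):
    """Build one synthetic 'document' out of character n-grams drawn at random.

    Each token is a random-length run of lowercase letters, so the resulting
    text carries no real-world knowledge. All parameters here are
    hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    tokens = []
    for _ in range(num_tokens):
        n = rng.randint(ngram_min, ngram_max)
        tokens.append("".join(rng.choice(string.ascii_lowercase) for _ in range(n)))
    return " ".join(tokens)

if __name__ == "__main__":
    # Prints something like: "otf pzqm xk ..."
    print(make_nonsense_document(num_tokens=20, seed=0))
```

Pretraining on a corpus of such documents would then proceed exactly as with real text, which is what allows the comparison against models pretrained on real corpora.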