NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing information across the input. SCROLLS contains summarization, question answering, and natural language inference tasks, covering multiple domains, including literature, science, business, and entertainment. Initial baselines, including Longformer Encoder-Decoder, indicate that there is ample room for improvement on SCROLLS. We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.