Long-sequence transformers are designed to improve how language models represent longer texts and how they perform on downstream document-level tasks. However, little is understood about the quality of token-level predictions in such long-form models. We investigate the performance of these architectures in the context of document classification with unsupervised rationale extraction. We find that standard soft attention methods perform significantly worse when combined with the Longformer language model. We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token level. We find that this method significantly outperforms Longformer-driven baselines on sentiment classification datasets, while also exhibiting significantly lower runtimes.
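The following is a minimal sketch of the general idea described in the abstract, not the authors' implementation: RoBERTa encodes each sentence of a document independently, a shared soft-attention layer scores tokens (these scores can be read as rationale weights), and the attention-pooled sentence vectors are composed into a document representation for classification. The class and layer names, pooling choices, and model sizes are illustrative assumptions.

```python
# Hedged sketch: compositional soft attention with sentence-wise RoBERTa.
# All component names and design choices here are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer


class CompositionalSoftAttentionClassifier(nn.Module):
    def __init__(self, num_labels=2, model_name="roberta-base"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.attn_scorer = nn.Linear(hidden, 1)     # token-level soft attention
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # input_ids: (num_sentences, max_sentence_len) for a single document,
        # i.e. RoBERTa is applied sentence-wise rather than to the whole document.
        token_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                          # (S, L, H)

        # Soft attention over tokens within each sentence; the weights double
        # as unsupervised token-level rationale scores.
        scores = self.attn_scorer(token_states).squeeze(-1)           # (S, L)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        token_weights = torch.softmax(scores, dim=-1)

        # Attention-pooled sentence vectors, composed into a document vector
        # (mean composition is an assumption for this sketch).
        sent_vecs = torch.einsum("sl,slh->sh", token_weights, token_states)
        doc_vec = sent_vecs.mean(dim=0)
        return self.classifier(doc_vec), token_weights


# Usage example on a toy two-sentence document.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
sentences = ["The plot was gripping.", "The ending fell completely flat."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

model = CompositionalSoftAttentionClassifier()
logits, rationale_weights = model(batch["input_ids"], batch["attention_mask"])
```

Because each sentence is encoded independently, the per-sentence sequences stay well within RoBERTa's input limit, which is one plausible reason such a compositional approach can run faster than processing the full document with a long-sequence model.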