Complex feature extractors are widely employed to build text representations. However, such feature extractors can cause severe overfitting when training data are limited, as is often the case for discourse parsing tasks. We therefore propose to remove the additional feature extractors and rely solely on the self-attention mechanism to exploit pretrained neural language models, thereby mitigating the overfitting problem. Experiments on three common discourse parsing tasks (News Discourse Profiling, Rhetorical Structure Theory based Discourse Parsing, and Penn Discourse Treebank based Discourse Parsing) show that, powered by recent pretrained language models, our simplified feature extractors generalize better while achieving comparable or even better system performance. The simplified feature extractors also have fewer learnable parameters and require less processing time. Our code will be released, and this simple yet effective model can serve as a stronger baseline for future research.
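To make the idea concrete, the following is a minimal sketch (not the paper's exact architecture) of the kind of simplified head described above: a single self-attention layer applied directly to token states produced by a pretrained language model, followed by a linear classifier, instead of an additional complex feature extractor. All names, dimensions, and pooling choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SelfAttentionHead(nn.Module):
    """Hypothetical lightweight head: one self-attention layer over
    pretrained-LM token states, mean pooling, and a linear classifier."""

    def __init__(self, hidden_size: int, num_labels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden) from a pretrained language model
        attended, _ = self.attn(token_states, token_states, token_states)
        pooled = attended.mean(dim=1)      # simple mean pooling over tokens
        return self.classifier(pooled)     # (batch, num_labels)


if __name__ == "__main__":
    # Dummy stand-in for pretrained-LM outputs (hidden size 768, e.g. a BERT-base encoder)
    head = SelfAttentionHead(hidden_size=768, num_labels=8)
    dummy_states = torch.randn(2, 40, 768)
    print(head(dummy_states).shape)        # torch.Size([2, 8])
```

The contrast with common practice is that no BiLSTM, CNN, or other stacked extractor is inserted between the pretrained encoder and the classifier, which is what keeps the number of learnable parameters small.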