Implicit discourse relations bind smaller linguistic units into coherent texts. Automatic sense prediction for implicit relations is hard, because it requires understanding the semantics of the linked arguments. Furthermore, annotated datasets contain relatively few labeled examples, due to the scale of the phenomenon: on average each discourse relation encompasses several dozen words. In this paper, we explore the utility of pre-trained sentence embeddings as base representations in a neural network for implicit discourse relation sense classification. We present a series of experiments using both supervised end-to-end trained models and pre-trained sentence encoding techniques: SkipThought, Sent2Vec, and InferSent. The pre-trained embeddings are competitive with the end-to-end model, and the approaches are complementary, with combined models yielding significant performance improvements on two of the three evaluations.
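To make the setup concrete, the following is a minimal sketch of the kind of classifier the abstract describes: a feed-forward network over pre-computed sentence embeddings of the two discourse arguments. The pairwise feature combination [u; v; |u − v|; u ⊙ v] is the one popularized by InferSent; the embedding dimension, hidden size, number of sense labels, and the class name are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class ImplicitRelationClassifier(nn.Module):
    """Sketch of a sense classifier over pre-trained sentence embeddings.

    Assumes the two arguments of a discourse relation have already been
    encoded (e.g. by SkipThought, Sent2Vec, or InferSent) into fixed-size
    vectors u and v. Hyperparameters here are hypothetical.
    """

    def __init__(self, emb_dim: int, hidden_dim: int, num_senses: int):
        super().__init__()
        # Input is the InferSent-style pair representation:
        # [u; v; |u - v|; u * v], hence 4 * emb_dim features.
        self.mlp = nn.Sequential(
            nn.Linear(4 * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_senses),
        )

    def forward(self, arg1_emb: torch.Tensor, arg2_emb: torch.Tensor) -> torch.Tensor:
        features = torch.cat(
            [arg1_emb,
             arg2_emb,
             torch.abs(arg1_emb - arg2_emb),
             arg1_emb * arg2_emb],
            dim=-1,
        )
        return self.mlp(features)  # unnormalized sense logits


# Hypothetical usage with pre-computed argument embeddings
# (4096-dim, matching InferSent's default; 11 senses is an assumption).
model = ImplicitRelationClassifier(emb_dim=4096, hidden_dim=512, num_senses=11)
u = torch.randn(8, 4096)  # Arg1 embeddings for a batch of 8 relations
v = torch.randn(8, 4096)  # Arg2 embeddings
logits = model(u, v)      # shape: (8, 11)
```

A combined model of the kind the abstract mentions could concatenate these pre-trained features with the representation produced by an end-to-end trained encoder before the final classification layer; the sketch above shows only the pre-trained branch.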