Coreference Resolution is an important NLP task, and most state-of-the-art methods rely on word embeddings for word representation. However, one issue that has been largely overlooked in the literature is how different embeddings, within and across families, compare on this task. We therefore frame our study in the context of Event and Entity Coreference Resolution (EvCR & EnCR) and address two questions: (1) Is there a trade-off between performance (predictive & run-time) and embedding size? (2) How do the embeddings compare in performance within and across families? Our experiments reveal several interesting findings. First, we observe diminishing returns in performance with respect to embedding size; for example, a model using solely a character embedding achieves 86% of the performance of the largest model (ELMo, GloVe, and character embeddings) while being 1.2% of its size. Second, the larger model using multiple embeddings learns faster overall despite being slower per epoch, but it remains slower at test time. Finally, ELMo performs best on both EvCR and EnCR, while GloVe and FastText perform best on EvCR and EnCR, respectively.