Established cross-document coreference resolution (CDCR) datasets contain event-centric coreference chains of events and entities with identity relations. These datasets establish strict definitions of the coreference relations across related texts but typically ignore anaphora with vaguer, context-dependent loose coreference relations. In this paper, we qualitatively and quantitatively compare the annotation schemes of ECB+, a CDCR dataset with identity coreference relations, and NewsWCL50, a CDCR dataset with a mix of loose context-dependent and strict coreference relations. We propose a phrasing diversity metric (PD) that, unlike previously proposed metrics, accounts for the diversity of full phrases and allows the lexical diversity of CDCR datasets to be evaluated with higher precision. The analysis shows that the coreference chains of NewsWCL50 are more lexically diverse than those of ECB+, but annotating NewsWCL50 leads to lower inter-coder reliability. We discuss the different tasks that the two CDCR datasets create for CDCR models, i.e., lexical disambiguation and lexical diversity. Finally, to ensure the generalizability of CDCR models, we propose a direction for CDCR evaluation that combines CDCR datasets with multiple annotation schemes that focus on various properties of the coreference chains.