In this paper, we study the identity of textual events from different documents. While the complex nature of event identity has been studied previously (Hovy et al., 2013), the case of events across documents remains unclear. Prior work on cross-document event coreference has two main drawbacks. First, it restricts the annotations to a limited set of event types. Second, it insufficiently tackles the concept of event identity. Such an annotation setup reduces the pool of event mentions and prevents one from considering the possibility of quasi-identity relations. We propose a dense annotation approach for cross-document event coreference, comprising a rich source of event mentions and a dense annotation effort between related document pairs. To this end, we design a new annotation workflow with careful quality control and an easy-to-use annotation interface. In addition to the links, we further collect overlapping event contexts, including time, location, and participants, to shed some light on the relation between identity decisions and context. We present CDEC-WN, an open-access dataset for cross-document event coreference collected from English Wikinews, and we open-source our annotation toolkit to encourage further research on cross-document tasks.