Current work on automatic coreference resolution has focused on the OntoNotes benchmark dataset, due to both its size and its consistency. However, many aspects of the OntoNotes annotation scheme are not well understood by NLP practitioners, including the treatment of generic NPs, noun modifiers, indefinite anaphora, predication and more. These often lead to counterintuitive claims, results and system behaviors. This opinion piece aims to highlight some of the problems with the OntoNotes rendition of coreference, and to propose a way forward relying on three principles: 1. a focus on semantics, not morphosyntax; 2. cross-linguistic generalizability; and 3. a separation of identity and scope, which can resolve old problems involving temporal and modal domain consistency.