Given the sentence "Abby told Brittney that she upset Courtney", one would struggle to determine who "she" refers to and would likely ask for clarification. However, if the word "upset" were replaced with "hugged", "she" would unambiguously refer to Abby. We study whether modern coreference resolution models are sensitive to such pronominal ambiguity. To this end, we construct AmbiCoref, a diagnostic corpus of minimal sentence pairs with ambiguous and unambiguous referents. Our examples generalize psycholinguistic studies of human perception of ambiguity around particular arrangements of verbs and their arguments. Analysis shows that (1) humans are less sure of referents in ambiguous AmbiCoref examples than in unambiguous ones, and (2) most coreference models show little difference in output between ambiguous and unambiguous pairs. We release AmbiCoref as a diagnostic corpus for testing whether models treat ambiguity similarly to humans.