While coreference resolution is defined independently of dataset domain, most coreference resolution models do not transfer well to unseen domains. We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models. We then mix three of these datasets for training; even though their domains, annotation guidelines, and metadata differ, we propose a method for jointly training a single model on this heterogeneous data mixture, using data augmentation to account for annotation differences and sampling to balance the data quantities. We find that in a zero-shot setting, models trained on a single dataset transfer poorly, while joint training yields improved overall performance and better generalization. This work contributes a new benchmark for robust coreference resolution and multiple new state-of-the-art results.
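The abstract mentions sampling to balance data quantities across datasets of very different sizes. A common way to realize this is temperature-scaled mixing, where per-dataset sampling weights are size ratios raised to a temperature below 1, upweighting small datasets. The sketch below is a minimal, hypothetical illustration of that idea; the dataset names, sizes, and the specific temperature value are assumptions, not details from the paper.

```python
import random

def mixing_weights(sizes, temperature=0.5):
    """Temperature-scaled sampling weights for a heterogeneous data mixture.

    With temperature < 1, small datasets receive a larger share of
    training examples than their raw size would give them.
    """
    scaled = [s ** temperature for s in sizes]
    total = sum(scaled)
    return [w / total for w in scaled]

def sample_dataset(datasets, weights, rng=None):
    # Choose which dataset the next training batch is drawn from.
    rng = rng or random.Random(0)
    return rng.choices(datasets, weights=weights, k=1)[0]

# Hypothetical document counts for three training datasets.
sizes = [2800, 350, 120]
weights = mixing_weights(sizes)
```

With `temperature=0.5`, the smallest dataset's relative weight is higher than its raw proportion of the data, so each training step is more likely to visit underrepresented domains than under simple size-proportional sampling.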