Training coreference resolution models requires comprehensively labeled data, and a model trained on one dataset may not transfer successfully to new domains. This paper investigates an active learning approach to coreference resolution that feeds discrete annotations into an incremental clustering model. Recent developments in incremental coreference resolution enable a novel active learning framework in this setting. Through this framework, we analyze important factors in data acquisition, such as sources of model uncertainty and the balance between reading and labeling costs, exploring different settings through simulated labeling with gold data. By lowering the data barrier for coreference, coreference resolvers can rapidly adapt to a series of previously unconsidered domains.
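As a minimal illustration of one of the factors mentioned above, the sketch below shows entropy-based uncertainty sampling: given a (hypothetical) per-mention distribution over candidate antecedent clusters, the mentions whose distributions have the highest entropy are selected for annotation. The mention names, the distributions, and the selection function are illustrative assumptions, not the paper's actual model or API.

```python
import math

def entropy(probs):
    """Shannon entropy of a distribution over candidate clusters."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(mention_cluster_probs, k=1):
    """Pick the k mentions whose cluster-assignment distributions
    have the highest entropy (i.e., the model is least certain)."""
    ranked = sorted(
        mention_cluster_probs.items(),
        key=lambda item: entropy(item[1]),
        reverse=True,
    )
    return [mention for mention, _ in ranked[:k]]

# Hypothetical distributions over candidate antecedent clusters.
probs = {
    "mention_a": [0.9, 0.1],        # model is confident
    "mention_b": [0.5, 0.3, 0.2],   # model is uncertain
}
print(select_most_uncertain(probs, k=1))  # → ['mention_b']
```

In an incremental-clustering setting, a query like this could be issued after each document is read, trading off the cost of reading more text against the cost of labeling the selected mentions.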