We study learning named entity recognizers in the presence of missing entity annotations. We approach this setting as tagging with latent variables and propose a novel loss, the Expected Entity Ratio, to learn models in the presence of systematically missing tags. We show that our approach is both theoretically sound and empirically useful. Experimentally, we find that it meets or exceeds the performance of strong and state-of-the-art baselines across a variety of languages, annotation scenarios, and amounts of labeled data. In particular, in a challenging setting with only 1,000 biased annotations, it significantly outperforms the previous state-of-the-art methods of Mayhew et al. (2019) and Li et al. (2021) by 12.7 and 2.3 F1 points, respectively, averaged across 7 datasets. We also show that, when combined with our approach, a novel sparse annotation scheme outperforms exhaustive annotation for modest annotation budgets.
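The text above names the Expected Entity Ratio loss but does not give its form. As a minimal sketch, assuming the loss is a margin-based (hinge) penalty on how far the model's expected proportion of entity-tagged tokens strays from a target ratio, it could be computed from per-token marginals as below. The function names `expected_entity_ratio` and `eer_penalty` and the parameters `target_ratio` and `margin` are hypothetical; computing the marginals themselves (for example with a CRF forward-backward pass) is out of scope here.

```python
from typing import Sequence

# Hypothetical sketch: assumes a hinge penalty on the expected entity ratio.
# Each inner sequence holds one sentence's per-token marginal probabilities
# of being part of an entity, i.e. 1 - P(tag_i = "O") under the model.


def expected_entity_ratio(batch: Sequence[Sequence[float]]) -> float:
    """Expected fraction of tokens tagged as an entity across the batch."""
    total_tokens = sum(len(sentence) for sentence in batch)
    if total_tokens == 0:
        raise ValueError("batch must contain at least one token")
    expected_entity_tokens = sum(sum(sentence) for sentence in batch)
    return expected_entity_tokens / total_tokens


def eer_penalty(batch: Sequence[Sequence[float]],
                target_ratio: float,
                margin: float) -> float:
    """Zero while the expected ratio is within `margin` of `target_ratio`,
    growing linearly with the deviation beyond that band (assumed form)."""
    rho_hat = expected_entity_ratio(batch)
    return max(0.0, abs(rho_hat - target_ratio) - margin)


if __name__ == "__main__":
    marginals = [[0.5, 0.5], [0.0, 0.0]]  # toy marginals for two sentences
    print(expected_entity_ratio(marginals))  # 0.25
    print(eer_penalty(marginals, target_ratio=0.1, margin=0.05))
```

The margin lets the model's entity rate drift within a tolerated band around the assumed target before any penalty applies, which is one way such a term could discourage a model from predicting "O" for every unannotated token without forcing an exact rate.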