Distantly supervised (DS) ultra-fine entity typing has recently received significant attention. However, DS data is noisy: labels are often missing or wrong, which leads to low precision and low recall. This paper proposes a novel ultra-fine entity typing model with denoising capability. Specifically, we build a noise model to estimate the unknown labeling noise distribution over input contexts and noisy type labels. With the noise model, more trustworthy labels can be recovered by subtracting the estimated noise from the input. Furthermore, we propose an entity typing model that adopts a bi-encoder architecture and is trained on the denoised data. Finally, the noise model and the entity typing model are trained iteratively so that each improves the other. We conduct extensive experiments on the Ultra-Fine entity typing dataset as well as the OntoNotes dataset and demonstrate that our approach significantly outperforms the baseline methods.
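To make the label-recovery step concrete, here is a minimal sketch of what "subtracting the estimated noise" could look like on a single multi-hot label vector. It is an illustration under stated assumptions, not the paper's implementation: the function name `denoise_labels`, the type vocabulary, the noise values, the sign convention (positive for a spurious label, negative for a missing one), the clamping to [0, 1], and the 0.5 threshold are all hypothetical. In the paper the noise estimate comes from a learned noise model, which is not reproduced here.

```python
from typing import List


def denoise_labels(noisy: List[float], estimated_noise: List[float]) -> List[float]:
    """Subtract a per-type noise estimate from a noisy multi-hot label vector.

    Assumed convention (not from the paper): positive noise marks a spurious
    label, negative noise marks a missing one. Results are clamped to [0, 1].
    """
    if len(noisy) != len(estimated_noise):
        raise ValueError("label and noise vectors must have the same length")
    return [min(1.0, max(0.0, y - n)) for y, n in zip(noisy, estimated_noise)]


# Hypothetical type vocabulary and values, chosen only for illustration.
types = ["person", "artist", "musician", "location"]
noisy = [1.0, 0.0, 0.0, 1.0]   # DS labels: "location" is wrong, "musician" is missing
noise = [0.0, 0.0, -1.0, 1.0]  # stand-in for the noise model's output

denoised = denoise_labels(noisy, noise)
recovered = [t for t, v in zip(types, denoised) if v >= 0.5]
print(recovered)
```

In this toy example the spurious "location" label is dropped and the missing "musician" label is restored; the recovered label set is what an entity typing model would then be trained on.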