We study the problem of textual relation embedding with distant supervision. To combat the wrong labeling problem of distant supervision, we propose to embed textual relations with global statistics of relations, i.e., the co-occurrence statistics of textual and knowledge base relations collected from the entire corpus. This approach turns out to be more robust to the training noise introduced by distant supervision. On a popular relation extraction dataset, we show that the learned textual relation embedding can be used to augment existing relation extraction models and significantly improve their performance. Most remarkably, for the top 1,000 relational facts discovered by the best existing model, the precision can be improved from 83.9% to 89.3%.
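The global-statistics idea above can be sketched minimally: aggregate (textual relation, KB relation) co-occurrence counts over the whole corpus and row-normalize them into target distributions, so that individual wrong distant-supervision labels are dampened by the aggregate. This is an illustrative sketch, not the paper's implementation; the example patterns, relation names, and data are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical distantly supervised corpus: each item pairs a textual
# relation (e.g., the surface pattern between two entities) with the KB
# relation that distant supervision assigned to that entity pair.
labeled_pairs = [
    ("<subj> was born in <obj>", "place_of_birth"),
    ("<subj> was born in <obj>", "place_of_birth"),
    ("<subj> was born in <obj>", "place_lived"),   # a noisy distant label
    ("<subj> moved to <obj>", "place_lived"),
]

# Global co-occurrence counts of (textual relation, KB relation),
# collected over the entire corpus rather than per sentence.
cooc = Counter(labeled_pairs)

# Row-normalize to a distribution over KB relations for each textual
# relation; aggregate statistics dilute the effect of any single
# mislabeled instance.
targets = defaultdict(dict)
for (text_rel, kb_rel), n in cooc.items():
    total = sum(c for (t, _), c in cooc.items() if t == text_rel)
    targets[text_rel][kb_rel] = n / total

print(dict(targets["<subj> was born in <obj>"]))
```

Such normalized distributions could then serve as training targets when learning the textual relation embedding, which is the sense in which the embedding is supervised by corpus-level statistics rather than by individual (noisy) labels.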