We present a method for constructing taxonomic trees (e.g., WordNet) using pretrained language models. Our approach is composed of two modules, one that predicts parenthood relations and another that reconciles those predictions into trees. The parenthood prediction module produces likelihood scores for each potential parent-child pair, creating a graph of parent-child relation scores. The tree reconciliation module treats the task as a graph optimization problem and outputs the maximum spanning tree of this graph. We train our model on subtrees sampled from WordNet, and test on non-overlapping WordNet subtrees. We show that incorporating web-retrieved glosses can further improve performance. On the task of constructing subtrees of English WordNet, the model achieves 66.7 ancestor F1, a 20.0% relative increase over the previous best published result on this task. In addition, we convert the original English dataset into nine other languages using Open Multilingual WordNet and extend our results across these languages.
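The two-stage pipeline in the abstract can be sketched in a few lines of Python. The sketch below uses hypothetical scores standing in for the parenthood-prediction module's outputs; reconciliation is shown as a simplified best-parent greedy step, which recovers the maximum spanning arborescence only when no cycles arise (the general case needs an algorithm such as Chu-Liu/Edmonds). An `ancestor_f1` helper illustrates the evaluation metric as F1 over ancestor-descendant pairs, which is one common reading of the metric, not necessarily the paper's exact implementation.

```python
def reconcile(scores, root):
    """Greedy tree reconciliation over parenthood scores.

    scores: dict mapping (parent, child) -> likelihood score.
    Returns a dict mapping each child to its chosen parent.
    Assumes best-parent choices form no cycle; a full solution
    would run Chu-Liu/Edmonds on the directed score graph.
    """
    best = {}
    for (parent, child), s in scores.items():
        if child == root:
            continue  # the root has no parent
        if child not in best or s > best[child][1]:
            best[child] = (parent, s)
    return {child: parent for child, (parent, _) in best.items()}

def ancestor_pairs(tree):
    """All (ancestor, descendant) pairs implied by a child->parent map."""
    pairs = set()
    for node in tree:
        ancestor = tree.get(node)
        while ancestor is not None:
            pairs.add((ancestor, node))
            ancestor = tree.get(ancestor)
    return pairs

def ancestor_f1(pred, gold):
    """F1 over ancestor-descendant pairs of predicted vs. gold trees."""
    p, g = ancestor_pairs(pred), ancestor_pairs(gold)
    tp = len(p & g)
    if tp == 0:
        return 0.0
    prec, rec = tp / len(p), tp / len(g)
    return 2 * prec * rec / (prec + rec)

# Hypothetical parenthood scores for a toy subtree rooted at "animal".
scores = {
    ("animal", "dog"): 0.9,
    ("animal", "cat"): 0.8,
    ("dog", "cat"): 0.3,
    ("dog", "poodle"): 0.7,
    ("cat", "poodle"): 0.2,
}
tree = reconcile(scores, root="animal")
# tree == {"dog": "animal", "cat": "animal", "poodle": "dog"}

gold = {"dog": "animal", "cat": "animal", "poodle": "cat"}
# Predicted attaches "poodle" under "dog" instead of "cat":
# 3 of 4 ancestor pairs match on each side, so F1 = 0.75.
```
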