We present an instance-based nearest-neighbor approach to entity linking. In contrast to most prior entity retrieval systems, which represent each entity with a single vector, we build a contextualized mention encoder that learns to place mentions of the same entity closer in vector space than mentions of different entities. This approach allows all mentions of an entity to serve as "class prototypes": inference retrieves from the full set of labeled entity mentions in the training set and applies the nearest mention's entity label. Our model is trained on a large multilingual corpus of mention pairs derived from Wikipedia hyperlinks, and performs nearest-neighbor inference against an index of 700 million mentions. It is simpler to train, gives more interpretable predictions, and outperforms all other systems on two multilingual entity linking benchmarks.
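The inference procedure described above can be sketched as a simple nearest-neighbor lookup over mention embeddings. This is a minimal illustrative sketch, not the paper's actual implementation: the function and variable names are hypothetical, and a real system would use an approximate nearest-neighbor index rather than brute-force search over 700 million vectors.

```python
import numpy as np

def link_entity(query_vec, mention_index, mention_labels):
    """Return the entity label of the nearest labeled mention.

    query_vec: (d,) embedding of the query mention from the mention encoder
    mention_index: (n, d) embeddings of labeled training-set mentions
    mention_labels: length-n list of entity IDs, one per indexed mention
    """
    # Cosine similarity = dot product of L2-normalized vectors.
    q = query_vec / np.linalg.norm(query_vec)
    m = mention_index / np.linalg.norm(mention_index, axis=1, keepdims=True)
    scores = m @ q
    # Apply the nearest mention neighbor's entity label.
    return mention_labels[int(np.argmax(scores))]

# Toy index: two mentions of entity "Q1", one mention of "Q2".
index = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = ["Q1", "Q1", "Q2"]
print(link_entity(np.array([0.8, 0.2]), index, labels))  # → Q1
```

Because every labeled mention acts as a prototype, adding a new mention of an entity only requires appending its embedding and label to the index; no per-entity vector needs retraining.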