TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

Entity retrieval--retrieving information about entity mentions in a query--is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions due to biases towards popular entities. Incorporating knowledge graph types during training could help overcome popularity biases, but there are several challenges: (1) existing type-based retrieval methods require mention boundaries as input, but open-domain tasks run on unstructured text, (2) type-based methods should not compromise overall performance, and (3) type-based methods should be robust to noisy and missing types. In this work, we introduce TABi, a method to jointly train bi-encoders on knowledge graph types and unstructured text for entity retrieval for open-domain tasks. TABi leverages a type-enforced contrastive loss to encourage entities and queries of similar types to be close in the embedding space. TABi improves retrieval of rare entities on the Ambiguous Entity Retrieval (AmbER) sets, while maintaining strong overall retrieval performance on open-domain tasks in the KILT benchmark compared to state-of-the-art retrievers. TABi is also robust to incomplete type systems, improving rare entity retrieval over baselines with only 5% type coverage of the training dataset. We make our code publicly available at https://github.com/HazyResearch/tabi.
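The idea of a type-enforced contrastive loss can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a standard in-batch contrastive loss between queries and their gold entities, plus a supervised-contrastive-style term that pulls together queries whose gold entities share a type. The function names, the `temperature` and `alpha` parameters, the weighted-sum combination, and the convention that a type id of `-1` means "type missing" are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def entity_contrastive_loss(q, e, temperature=0.1):
    """In-batch contrastive loss: query i should score entity i highest."""
    logits = l2_normalize(q) @ l2_normalize(e).T / temperature
    return float(-np.mean(np.diag(log_softmax(logits))))


def type_contrastive_loss(q, type_ids, temperature=0.1):
    """Pull together queries that share a type id (assumed: -1 = missing)."""
    n = q.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    known = type_ids >= 0
    positives = (
        (type_ids[:, None] == type_ids[None, :])
        & known[:, None]
        & known[None, :]
        & not_self
    )
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    if not has_pos.any():
        # No query has a same-type partner in this batch: nothing to enforce.
        return 0.0
    qn = l2_normalize(q)
    # Mask self-similarity so a query is never its own positive or negative.
    logits = np.where(not_self, qn @ qn.T / temperature, -np.inf)
    log_prob = log_softmax(logits)
    summed = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_query = -summed[has_pos] / n_pos[has_pos]
    return float(per_query.mean())


def tabi_style_loss(q, e, type_ids, alpha=0.5, temperature=0.1):
    """Hypothetical combination: weighted sum of the type and entity terms."""
    l_type = type_contrastive_loss(q, type_ids, temperature)
    l_ent = entity_contrastive_loss(q, e, temperature)
    return alpha * l_type + (1.0 - alpha) * l_ent


# Toy batch: four query embeddings, their gold entity embeddings, and the
# type id of each gold entity (two types, two queries each).
queries = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
entities = queries.copy()
types = np.array([0, 0, 1, 1])
loss = tabi_style_loss(queries, entities, types)
```

In this sketch, queries with a missing type simply contribute nothing to the type term while still being trained through the entity term, which is one plausible way a method could tolerate incomplete type coverage.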
