An exciting frontier in natural language understanding (NLU) and generation (NLG) calls for (vision-and-) language models that can efficiently access external structured knowledge repositories. However, many existing knowledge bases cover only limited domains or suffer from noisy data, and, above all, they are typically hard to integrate into neural language pipelines. To fill this gap, we release VisualSem: a high-quality knowledge graph (KG) that includes nodes with multilingual glosses, multiple illustrative images, and visually relevant relations. We also release a neural multi-modal retrieval model that takes images or sentences as input and retrieves entities in the KG. This multi-modal retrieval model can be integrated into any (neural network) model pipeline. We encourage the research community to use VisualSem for data augmentation and/or as a source of grounding, among other possible uses. VisualSem and the multi-modal retrieval model are publicly available and can be downloaded at https://github.com/iacercalixto/visualsem
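To make the retrieval setup concrete, the following is a minimal sketch of sentence-to-entity retrieval over a KG such as VisualSem, assuming a CLIP-style joint text/image encoder (here via the sentence-transformers library) and a hypothetical node_glosses.json file holding one gloss per node; these names and the encoder choice are illustrative assumptions, not the released API.

```python
# Hypothetical sketch: retrieve KG entities whose glosses best match a query.
# The file name "node_glosses.json" and the encoder are assumptions for
# illustration; the released VisualSem retrieval model may differ.
import json

import numpy as np
from sentence_transformers import SentenceTransformer

# CLIP-style model that embeds sentences (and images) into a shared space.
model = SentenceTransformer("clip-ViT-B-32")

# Assumed format: [{"id": "...", "gloss": "..."}, ...], one entry per KG node.
with open("node_glosses.json") as f:
    nodes = json.load(f)

# Pre-compute unit-normalized gloss embeddings for all nodes.
node_emb = model.encode(
    [n["gloss"] for n in nodes], normalize_embeddings=True
)

def retrieve(query_text: str, k: int = 5):
    """Return the k KG nodes whose glosses are most similar to the query."""
    q = model.encode([query_text], normalize_embeddings=True)[0]
    scores = node_emb @ q          # cosine similarity (vectors are unit-norm)
    top = np.argsort(-scores)[:k]  # indices of the k highest scores
    return [(nodes[i]["id"], float(scores[i])) for i in top]

print(retrieve("a striped wild cat native to Asia"))
```

Because a CLIP-style encoder places images and text in the same embedding space, the same nearest-neighbor lookup works when the query is an image embedding instead of a sentence embedding, which is the sense in which such a retrieval component is multi-modal.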