Encoding facts as representations of entities and binary relationships between them, as learned by knowledge graph representation models, is useful for various tasks, including predicting new facts, question answering, fact checking and information retrieval. The focus of this thesis is on (i) improving knowledge graph representation with the aim of tackling the link prediction task; and (ii) devising a theory of how semantics can be captured in the geometry of relation representations. Most knowledge graphs are highly incomplete and manually adding new information is costly, which drives the development of methods that can automatically infer missing facts. The first contribution of this thesis is HypER, a convolutional model that simplifies and improves upon the link prediction performance of the existing convolutional state-of-the-art model ConvE and can be mathematically explained in terms of constrained tensor factorisation. The second contribution is TuckER, a relatively straightforward linear model, which, at the time of its introduction, obtained state-of-the-art link prediction performance across standard datasets. The third contribution is MuRP, the first multi-relational graph representation model embedded in hyperbolic space. MuRP outperforms all existing models and its Euclidean counterpart MuRE in link prediction on hierarchical knowledge graph relations whilst requiring far fewer dimensions. Despite the development of a large number of knowledge graph representation models with gradually increasing predictive performance, relatively little is known about the latent structure they learn. We generalise recent theoretical understanding of how semantic relations of similarity, paraphrase and analogy are encoded in the geometric interactions of word embeddings to how more general relations, as found in knowledge graphs, can be encoded in their representations.