Conventional representation learning algorithms for knowledge graphs (KG) map each entity to a unique embedding vector. Such a shallow lookup results in a linear growth of memory consumption for storing the embedding matrix and incurs high computational costs when working with real-world KGs. Drawing parallels with subword tokenization commonly used in NLP, we explore the landscape of more parameter-efficient node embedding strategies with possibly sublinear memory requirements. To this end, we propose NodePiece, an anchor-based approach to learn a fixed-size entity vocabulary. In NodePiece, a vocabulary of subword/sub-entity units is constructed from anchor nodes in a graph with known relation types. Given such a fixed-size vocabulary, it is possible to bootstrap an encoding and embedding for any entity, including those unseen during training. Experiments show that NodePiece performs competitively in node classification, link prediction, and relation prediction tasks while retaining less than 10% of explicit nodes in a graph as anchors and often having 10x fewer parameters.
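Below is a minimal, illustrative sketch of the anchor-based tokenization idea described above: an entity is represented by a small set of its nearest anchor nodes plus its incident relation types, and an embedding is bootstrapped by pooling the embeddings of those tokens. The toy graph, the mean-pooling encoder, and all hyperparameters here are hypothetical placeholders chosen for illustration, not the paper's actual configuration.

```python
# Sketch of NodePiece-style anchor tokenization (illustrative assumptions throughout).
import random
from collections import deque

random.seed(0)

# Toy KG as (head, relation, tail) triples -- purely illustrative data.
triples = [
    ("paris", "capital_of", "france"),
    ("france", "part_of", "europe"),
    ("berlin", "capital_of", "germany"),
    ("germany", "part_of", "europe"),
    ("paris", "located_in", "france"),
]

entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
relations = sorted({r for _, r, _ in triples})

# Undirected adjacency used only to measure graph distances to anchors.
adj = {e: set() for e in entities}
for h, r, t in triples:
    adj[h].add(t)
    adj[t].add(h)

def bfs_distances(source):
    """Shortest-path distance from `source` to every reachable entity."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

# 1) Fixed-size vocabulary: a handful of anchor entities plus all relation types.
num_anchors, k_nearest, dim = 2, 2, 4          # hypothetical hyperparameters
anchors = random.sample(entities, num_anchors)  # random anchor selection for the sketch
vocab = anchors + relations
emb = {tok: [random.gauss(0, 1) for _ in range(dim)] for tok in vocab}

def tokenize(entity):
    """Entity -> its k nearest anchors + the relation types incident to it."""
    dist = bfs_distances(entity)
    nearest = sorted((a for a in anchors if a in dist), key=lambda a: dist[a])[:k_nearest]
    context = {r for h, r, t in triples if entity in (h, t)}
    return nearest + sorted(context)

def encode(entity):
    """Pool token embeddings (mean) into an entity vector; stands in for a learned encoder."""
    tokens = tokenize(entity)
    return [sum(emb[t][i] for t in tokens) / len(tokens) for i in range(dim)]

print(tokenize("paris"), encode("paris"))
```

Because the vocabulary contains only anchors and relation types, any entity reachable in the graph, including one unseen during training, can be tokenized and encoded this way without its own dedicated embedding row.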