GIANT: Scalable Creation of a Web-scale Ontology

Understanding what online users may pay attention to is key to content recommendation and search services. These services will benefit from a highly structured, web-scale ontology of entities, concepts, events, topics and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events and topics in the language style of the online population. Nor do they maintain a logically structured ontology among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology containing a large number of natural language phrases that conform to user attention at various granularities, mined from a vast volume of web documents and search click graphs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present the graph-neural-network-based techniques used in GIANT and evaluate the proposed methods against a variety of baselines. GIANT has produced the Attention Ontology, which has been deployed in various Tencent applications involving over a billion users. Online A/B testing performed on Tencent QQ Browser shows that the Attention Ontology can significantly improve click-through rates in news recommendation.
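To make the structure described in the abstract concrete, the following is a minimal sketch of an ontology with typed nodes and labeled edges that form a hierarchy. It is not the authors' implementation: the class names `Node` and `Ontology`, the edge label `"isA"`, and the example phrases are all assumptions chosen for illustration. Only the five node types come from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five node types named in the abstract.
NODE_TYPES = {"category", "concept", "entity", "event", "topic"}


@dataclass(frozen=True)
class Node:
    """A natural language phrase together with its node type (hypothetical class)."""
    phrase: str
    node_type: str

    def __post_init__(self):
        if self.node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.node_type}")


class Ontology:
    """Hypothetical container: a directed graph with labeled child-to-parent edges."""

    def __init__(self):
        # parents[child] maps each parent node to the label of the connecting edge.
        self.parents = defaultdict(dict)

    def add_edge(self, child, parent, label):
        self.parents[child][parent] = label

    def ancestors(self, node):
        """Return every node reachable from `node` by following edges upward."""
        seen, stack = set(), [node]
        while stack:
            for parent in self.parents.get(stack.pop(), {}):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative phrases and the "isA" label are assumptions, not data from the paper.
onto = Ontology()
cars = Node("cars", "category")
fuel = Node("fuel-efficient cars", "concept")
civic = Node("Honda Civic", "entity")
onto.add_edge(fuel, cars, "isA")
onto.add_edge(civic, fuel, "isA")
```

Walking `ancestors` upward from an entity yields the coarser concepts and categories it belongs to, which is one plausible way a recommender could match content at several granularities.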

Related content

Attention mechanism

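The attention mechanism was first proposed in the field of computer vision, but it became widely popular with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly on a machine translation task; their work is generally regarded as the first to apply attention to NLP. Similar attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of attention research.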

[SIGMOD 2020, Tencent] Scalable Construction of a Web-Scale Ontology

April 12, 2020