Google and other search engines support entity search by presenting a knowledge card that summarizes related facts about the user-supplied entity. However, knowledge cards are limited to entities that have a Wikipedia page or an entry in an encyclopedia such as Freebase. Current encyclopedias cover only highly popular entities, which are far fewer than emerging entities. Although knowledge about emerging entities is available in search results, there are no approaches to capture, abstract, summarize, fuse, and validate these fragmented pieces of knowledge. Thus, in this paper, we develop approaches to capture two types of knowledge about emerging entities from a corpus built by extending the top-$n$ search snippets of a given emerging entity. The first type identifies the role(s) of the emerging entity, e.g., who is s/he? The second type captures the entities closely associated with the emerging entity. As a testbed, we use a ground-truth collection of 20 emerging entities and 20 popular entities. Our approach is unsupervised and based on text analysis and entity embeddings. Our experimental studies show promising results: an accuracy of more than $87\%$ for recognizing entities and $75\%$ for ranking them. In addition, $87\%$ of the entailed types were recognizable. Our testbed and source code are available on GitHub at https://github.com/sunnyUD/research_source_code.
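The abstract does not spell out how associated entities are ranked. Purely as an illustrative sketch, and not the authors' method, one unsupervised way to combine snippet text analysis with entity embeddings is to score each candidate by how often it co-occurs with the target entity in search snippets, weighted by the cosine similarity of their embeddings. The function names, the multiplicative scoring rule, the plain substring matching, and the toy snippets and two-dimensional embeddings below are all hypothetical; a real pipeline would use a named-entity recognizer and pretrained entity embeddings.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_associated_entities(target, snippets, candidates, embeddings):
    """Rank candidate entities that co-occur with `target` in search snippets.

    Hypothetical scoring: (number of snippets mentioning both target and
    candidate) * cosine(embedding(target), embedding(candidate)).
    Mentions are detected by case-insensitive substring matching.
    """
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        if target.lower() not in text:
            continue  # only snippets that mention the target are evidence
        for cand in candidates:
            if cand.lower() in text:
                counts[cand] += 1
    scores = {
        cand: counts[cand] * cosine(embeddings[target], embeddings[cand])
        for cand in counts
    }
    # Highest-scoring associated entities first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up data for illustration only.
snippets = [
    "Jane Doe, CEO of Acme Corp, spoke in Berlin.",
    "Acme Corp founder Jane Doe announces a new product.",
    "Jane Doe visited Berlin last week.",
    "Berlin hosts a tech conference.",
]
candidates = ["Acme Corp", "Berlin"]
embeddings = {
    "Jane Doe": [1.0, 0.0],
    "Acme Corp": [3.0, 4.0],
    "Berlin": [0.0, 1.0],
}

ranking = rank_associated_entities("Jane Doe", snippets, candidates, embeddings)
print(ranking)
```

In this toy setup both candidates co-occur with the target in two snippets, so the embedding term breaks the tie and Acme Corp ranks above Berlin. Multiplying the two signals is only one possible design choice; a weighted sum or a learned combination would be equally plausible.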