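This article introduces the automatic construction of open-domain taxonomies. It first describes the task, then covers the general steps of taxonomy construction (is-a relation pair extraction, followed by taxonomy induction), and finally presents some commonly used methods for these steps.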
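Most taxonomy construction approaches can be divided into two steps: extracting is-a relation pairs with pattern-based or distributional methods, and then using those is-a pairs to build a complete taxonomy.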
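As a rough illustration of the second step, the sketch below assembles already-extracted is-a pairs into a parent-to-children map and lists the concepts that have no parent. The function name `build_taxonomy` and the sample pairs are assumptions made for this example only; real systems also need to handle noise, cycles, and ambiguous terms, which this sketch ignores.

```python
from collections import defaultdict


def build_taxonomy(isa_pairs):
    """Group (hyponym, hypernym) pairs into a hypernym -> hyponyms map.

    Returns the map together with the root concepts, i.e. hypernyms that
    never appear as a hyponym. No cycle or noise handling is attempted.
    """
    children = defaultdict(set)
    has_parent = set()
    for hyponym, hypernym in isa_pairs:
        children[hypernym].add(hyponym)
        has_parent.add(hyponym)
    roots = sorted(c for c in children if c not in has_parent)
    return children, roots


# Hypothetical input: pairs as they might come out of the extraction step.
pairs = [("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")]
children, roots = build_taxonomy(pairs)
```

Here `roots` holds the top-level concepts and `children` can be walked from each root to print or traverse the hierarchy.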
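The earliest and most influential pattern-based method for extracting is-a pairs originates with Hearst (1992)[4]. In the paper "Automatic acquisition of hyponyms from large text corpora", the author hand-crafted a set of lexical patterns (also called templates, rules, or paths) to extract is-a pairs. A typical pattern has the form "[C] such as [E]", where [C] and [E] are noun placeholders for the hypernym y and the hyponym x, respectively. With these hand-crafted patterns, a system can automatically extract a large number of is-a pairs, and the approach has therefore been adopted by several systems. For example, Wu et al. (2012)[5] built the Probase system from Hearst patterns and a large volume of web text; it contains 2.65 million concepts and 20.76 million is-a pairs.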
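The "[C] such as [E]" pattern can be sketched with a regular expression. This is a simplified illustration, not Hearst's or Probase's implementation: it assumes the concept is a single word, that the entity list runs to the next sentence punctuation mark, and that entities are separated by commas, "and", or "or". Practical systems typically rely on noun-phrase chunking instead. The names `SUCH_AS`, `SEPARATOR`, and `extract_isa_pairs` are hypothetical.

```python
import re

# "[C] such as [E]": one word for the concept, then everything up to the
# next sentence punctuation mark as the raw entity list (an assumption).
SUCH_AS = re.compile(r"(\w+),? such as ([^.;:!?]+)")
# Entities are assumed to be separated by commas, "and", or "or".
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_isa_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the "such as" pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1).lower()
        for chunk in SEPARATOR.split(match.group(2)):
            hyponym = chunk.strip().lower()
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs


pairs = extract_isa_pairs(
    "He studies animals such as cats, dogs, and grizzly bears."
)
```

Because the entity list is cut only at punctuation, a sentence that continues after the list (for example "such as cats and dogs are common") would yield a noisy last entity, which is one reason real extractors use syntactic chunking.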
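Using hypernym inference to improve recall. Pattern-based methods require both terms of an is-a pair to co-occur in the same sentence, which limits the recall of extraction. Ritter et al. (2009)[12] proposed the following idea: if y is a hypernym of x, and x and x' are highly similar, then y is very likely a hypernym of x' as well. They also trained an HMM to learn a similarity measure that performs better than vector-based ones. In addition, some methods generate extra is-a pairs by considering the modifiers of a hyponym. For example, we can easily infer that a "grizzly bear" is a "bear", because its head word is "bear". The same idea applies to Chinese: the head word "大学" (university) of "哈尔滨工业大学" (Harbin Institute of Technology) is its hypernym.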
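Both inference ideas can be sketched in a few lines. Note the assumptions: plain cosine similarity over toy vectors stands in for the HMM-based similarity that Ritter et al. learn, the 0.9 threshold is arbitrary, and the head word of an English phrase is taken to be its last whitespace-separated token (Chinese terms would need word segmentation first, which is not shown). The names `cosine`, `infer_hypernyms`, and `head_word_pair` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def infer_hypernyms(known_pairs, vectors, threshold=0.9):
    """If y is a hypernym of x and x' is similar to x, propose (x', y)."""
    inferred = set()
    for hyponym, hypernym in known_pairs:
        if hyponym not in vectors:
            continue
        for candidate, vec in vectors.items():
            if candidate in (hyponym, hypernym):
                continue
            if (candidate, hypernym) in known_pairs:
                continue
            if cosine(vectors[hyponym], vec) >= threshold:
                inferred.add((candidate, hypernym))
    return inferred


def head_word_pair(term):
    """Propose (term, head word) for a multi-word English noun phrase."""
    tokens = term.split()
    return (term, tokens[-1]) if len(tokens) > 1 else None


# Toy two-dimensional vectors, invented for illustration only.
vectors = {"cat": [1.0, 0.1], "kitten": [0.9, 0.2], "car": [0.1, 1.0]}
inferred = infer_hypernyms({("cat", "animal")}, vectors)
```

With these toy vectors, "kitten" is close to "cat" and inherits its hypernym, while "car" is not. Pairs proposed this way are candidates rather than facts and would normally be filtered or scored before being added to a taxonomy.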
[1]Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. 2017. Understand short texts by harvesting and analyzing semantic knowledge. IEEE Trans. Knowl. Data Eng. 29(3):499–512.
[2]Yuchen Zhang, Amr Ahmed, Vanja Josifovski, and Alexander J. Smola. 2014. Taxonomy discovery for personalized recommendation. In Proceedings of the Seventh ACM International Conference on Web Search and Data Mining. pages 243–252.
[3]Shuo Yang, Lei Zou, Zhongyuan Wang, Jun Yan, and Ji-Rong Wen. 2017. Efficiently answering technical questions - A knowledge graph approach. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. pages 3111–3118.
[4]Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics. pages 539–545.
[5]Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Qili Zhu. 2012. Probase: a probabilistic taxonomy for text understanding. In Proceedings of the ACM SIGMOD International Conference on Management of Data. pages 481–492.
[6]Alan Ritter, Stephen Soderland, and Oren Etzioni. 2009. What is this, anyway: Automatic hypernym discovery. In Learning by Reading and Learning to Read, Proceedings of the 2009 AAAI Spring Symposium. pages 88–93.
[7]Anh Tuan Luu, Jung-jae Kim, and See-Kiong Ng. 2014. Taxonomy construction using syntactic contextual evidence. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. pages 810–819.
[8]Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2004. Learning syntactic patterns for automatic hypernym discovery. In Proceedings of the 17th Annual Conference on Neural Information Processing Systems. pages 1297–1304.
[9]Ndapandula Nakashole, Gerhard Weikum, and Fabian M. Suchanek. 2012. PATTY: A taxonomy of relational patterns with semantic types. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. pages 1135–1145.
[10]Andrew Carlson, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Coupled semi-supervised learning for information extraction. In Proceedings of the Third International Conference on Web Search and Web Data Mining. pages 101–110.
[11]Zornitsa Kozareva, Ellen Riloff, and Eduard H. Hovy. 2008. Semantic class learning from the web with hyponym pattern linkage graphs. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics. pages 1048–1056.
[12]Alan Ritter, Stephen Soderland, and Oren Etzioni. 2009. What is this, anyway: Automatic hypernym discovery. In Learning by Reading and Learning to Read, Proceedings of the 2009 AAAI Spring Symposium. pages 88–93.
[13]Oren Etzioni, Michael J. Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2004. Web-scale information extraction in knowitall: (preliminary results). In Proceedings of the 13th international conference on World Wide Web. pages 100–110.
[14]Dekang Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the Fifteenth International Conference on Machine Learning. pages 296–304.
[15]Julie Weeds, David J. Weir, and Diana McCarthy. 2004. Characterising measures of lexical distributional similarity. In Proceedings of the 20th International Conference on Computational Linguistics.
[16]Enrico Santus, Alessandro Lenci, Qin Lu, and Sabine Schulte im Walde. 2014. Chasing hypernyms in vector spaces with entropy. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. pages 38–42.
[17]Stephen Roller, Katrin Erk, and Gemma Boleda. 2014. Inclusive yet selective: Supervised distributional hypernymy detection. In Proceedings of the 25th International Conference on Computational Linguistics. pages 1025–1036.
[18]Ruiji Fu, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. 2014. Learning semantic hierarchies via word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. pages 1199–1209.
[19]Wei Shen, Jianyong Wang, Ping Luo, and Min Wang. 2012. A graph-based approach for ontology population with named entities. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management. pages 345–354.
[20]Zornitsa Kozareva and Eduard H. Hovy. 2010. A semi-supervised method to learn and construct taxonomies using the web. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. pages 1110–1118.
[21]Daniele Alfarone and Jesse Davis. 2015. Unsupervised learning of an IS-A taxonomy from a limited domain-specific corpus. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence. pages 1434–1441.
[22]Luis Espinosa Anke, Horacio Saggion, Francesco Ronzano, and Roberto Navigli. 2016. Extasem! extending, taxonomizing and semantifying domain terminologies. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. pages 2594–2600.
[23]Paola Velardi, Stefano Faralli, and Roberto Navigli. 2013. Ontolearn reloaded: A graph-based algorithm for taxonomy induction. Computational Linguistics 39(3):665–707.
[24]Jiaqing Liang, Yanghua Xiao, Yi Zhang, Seung-won Hwang, and Haixun Wang. 2017. Graph-based wrong isa relation detection in a large-scale lexical taxonomy. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. pages 1178–1184.
[25]Chengyu Wang, Xiaofeng He, and Aoying Zhou. 2017. A short survey on taxonomy learning from text corpora: Issues, resources and recent advances. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.