With the increasing demand for intelligent systems capable of operating in different user contexts (e.g. users on the move), correctly interpreting the user's need has become crucial for such systems to give a consistent answer to the user query. The most effective techniques for addressing this task come from the fields of natural language processing and semantic expansion of terms. Such systems aim to estimate the actual meaning of input queries by capturing the concepts expressed by the words in the user's questions. The aim of this paper is to show which semantic relation has the greatest impact on semantic expansion-based retrieval systems, and to identify the best trade-off between accuracy and noise introduction when combining such relations. The evaluations are carried out by building a simple natural language processing system capable of querying any taxonomy-driven domain, using combinations of different semantic expansions as knowledge resources. The proposed evaluation employs a wide and varied taxonomy as a use case, exploiting its labels as the basis for the expansions. To build the knowledge resources, several corpora have been produced and integrated as gazetteers into the NLP infrastructure, with the purpose of estimating the pseudo-queries corresponding to the taxonomy labels, which are treated as the possible intents.
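The abstract does not detail how the expansions are built or matched, so the following is only a minimal sketch of the general idea, not the authors' system: each taxonomy label is expanded through a chosen set of semantic relations into a gazetteer of surface terms, and a query is mapped to the labels (intents) whose terms it contains. The toy `LEXICON`, the relation names `"synonym"` and `"hypernym"`, and the functions `build_gazetteer` and `match_intents` are hypothetical stand-ins; a real setup would draw related terms from a lexical resource such as WordNet and use a proper gazetteer component.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real lexical resource (e.g. WordNet):
# for each taxonomy label, related terms grouped by semantic relation.
LEXICON: dict[str, dict[str, set[str]]] = {
    "restaurant": {"synonym": {"eatery"}, "hypernym": {"building"}},
    "hotel": {"synonym": {"inn"}, "hypernym": {"building"}},
    "museum": {"synonym": {"gallery"}, "hypernym": {"building"}},
}


def build_gazetteer(labels, lexicon, relations):
    """Map each surface term to the taxonomy labels (intents) it may evoke.

    Only the semantic relations listed in `relations` are used for expansion.
    """
    gazetteer: dict[str, set[str]] = {}
    for label in labels:
        terms = {label}  # the label itself always acts as a pseudo-query term
        for relation in relations:
            terms |= lexicon.get(label, {}).get(relation, set())
        for term in terms:
            gazetteer.setdefault(term, set()).add(label)
    return gazetteer


def match_intents(query, gazetteer):
    """Rank candidate intents by how many query tokens hit their expanded terms."""
    votes = Counter()
    for token in query.lower().split():
        for label in gazetteer.get(token, ()):
            votes[label] += 1
    # Highest vote count first; ties broken alphabetically for determinism.
    return sorted(votes, key=lambda label: (-votes[label], label))


if __name__ == "__main__":
    labels = list(LEXICON)
    synonyms_only = build_gazetteer(labels, LEXICON, ["synonym"])
    with_hypernyms = build_gazetteer(labels, LEXICON, ["synonym", "hypernym"])
    query = "is there an eatery in this building"
    print(match_intents(query, synonyms_only))   # ['restaurant']
    print(match_intents(query, with_hypernyms))  # ['restaurant', 'hotel', 'museum']
```

In this toy setting, adding the hypernym relation lets the generic word "building" match every label, which illustrates the accuracy versus noise trade-off that arises when semantic relations are combined.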