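Project title: Research on a Chinese Semantic Concept Graph Representation for Web Retrieval Applications
Project number: No.60873135
Project type: General Program
Approval year: 2009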
Discipline: Metallurgy and Metal Processing
Principal investigator: Lu Ruzhan (陆汝占)
Institution: Shanghai Jiao Tong University
Funding: CNY 300,000
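Chinese abstract: This project studies Chinese conceptual semantic computation and applies it to web information retrieval and classification. Users can state their retrieval needs in natural language, expressing what they actually want freely and loosely. This matches the cognitive pattern in which human thought is closely tied to language, and it is intended to overcome the limitations of traditional retrieval models. When indexing queries and documents, the project exploits the direct-coupling property of Chinese concepts: concepts are extracted from documents, relations between them are established through concept compounding operations, and a well-grounded concept network is formed. Keywords are then no longer discrete, unordered, unrelated fragments, so the network better reflects the semantic information a document contains. Matching a user's need against a document thereby becomes a whole-network match between concept networks. The approach is expected to substantially improve the precision of web text retrieval and to shorten the time users need to find the information they want. The secondary-retrieval technology developed in the project could be used in next-generation web search engines and digital libraries and may advance industrial development in this field. If search engine technology with language-processing capability is successfully developed, it could become a new generation of search engine internationally, attracting interest from major international companies and, more importantly, possibly giving rise to a new industry.
Chinese keywords: Chinese semantic computation; concept compounding; information retrieval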
English abstract: This project studies Chinese conceptual semantic computation and applies it to information retrieval on the Internet, allowing users to express their retrieval needs in natural language and so state what they really want freely and easily. This approach is consistent with the cognitive view that the human mind and language are closely related, and it thus overcomes the limitations of the traditional information retrieval model. When indexing queries and documents, concepts are extracted from documents based on the property that Chinese concepts couple directly. The relations among concepts are then constructed by conceptual compound computation, forming a well-grounded conceptual network, so that keywords are no longer discrete, unrelated fragments and the semantic information of a document is expressed better. Matching queries with documents is thus transformed into matching conceptual networks. The approach is expected to improve the precision of information retrieval and reduce the time users need to find the information they want. The second-retrieval technology developed in this project can be applied to new-generation search engines and digital libraries and may help boost their development. Search engine technology with natural language processing capability is expected to become a new generation of search engine, one that may not only attract attention from international companies but also give rise to a new industry.
英文关键词: Chinese semantic computation;conceptual compound computation;information retrieval
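The abstract describes three steps (extract concepts, link them into a network by compounding, then match query and document networks as wholes) without giving an algorithm. The following is only a toy sketch under loudly stated assumptions, not the project's method: concepts are the tokens found in a hypothetical `lexicon`, "direct coupling" is approximated by linking each concept to the next one in sequence, and network matching is a blend of node and edge Jaccard overlap weighted by an arbitrary `edge_weight`. All names (`ConceptNetwork`, `build_network`, `network_similarity`) are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConceptNetwork:
    """Toy concept network: nodes are concepts, edges are (hypothetical) compound relations."""
    nodes: set[str] = field(default_factory=set)
    edges: set[tuple[str, str]] = field(default_factory=set)


def build_network(tokens: list[str], lexicon: set[str]) -> ConceptNetwork:
    # Assumption: a concept is any token present in the hypothetical lexicon.
    concepts = [t for t in tokens if t in lexicon]
    net = ConceptNetwork(nodes=set(concepts))
    # Assumption: approximate "direct coupling" by linking each concept
    # to the concept that follows it in the sequence (a directed edge).
    for left, right in zip(concepts, concepts[1:]):
        if left != right:
            net.edges.add((left, right))
    return net


def jaccard(a: set, b: set) -> float:
    # Overlap of two sets; defined as 0.0 when both are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def network_similarity(q: ConceptNetwork, d: ConceptNetwork,
                       edge_weight: float = 0.5) -> float:
    # Whole-network match: blend concept overlap with relation overlap.
    # The 0.5 default weight is an arbitrary illustrative choice.
    return ((1 - edge_weight) * jaccard(q.nodes, d.nodes)
            + edge_weight * jaccard(q.edges, d.edges))


# Hypothetical lexicon: 数字 (digital), 图书馆 (library), 检索 (retrieval).
lexicon = {"数字", "图书馆", "检索"}
query = build_network(["数字", "图书馆", "的", "检索"], lexicon)
doc_a = build_network(["数字", "图书馆", "检索", "系统"], lexicon)
doc_b = build_network(["图书馆", "数字", "检索"], lexicon)  # same concepts, different relations

print(network_similarity(query, doc_a))  # 1.0
print(network_similarity(query, doc_b))  # 0.5
```

A bag-of-keywords comparison would score `doc_a` and `doc_b` identically because they contain the same three concepts; the edge term is what lets the network match separate them, which is the intuition the abstract appeals to when it says keywords should no longer be "discrete and unrelated".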