A major problem in Natural Language Processing is the automatic analysis and representation of human language. Human language is ambiguous, and a deeper understanding of semantics and the creation of human-to-machine interaction have required effort in building schemes for the act of communication and common-sense knowledge bases for the 'meaning' in texts. This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts. Computational methods that extract semantic features are used to analyse the relations between the texts of messages and 'representations of situations' for a newly created large collection of scientific texts, the Leicester Scientific Corpus. The representation of science-specific meaning is standardised by replacing the situation representations, rather than psychological properties, with vectors of attributes: a list of the scientific subject categories that the text belongs to. First, this paper introduces the 'Meaning Space', in which the informational representation of meaning is extracted from the occurrence of a word in texts across the scientific categories; that is, the meaning of a word is represented by a vector of Relative Information Gain about the subject categories. Then, the meaning space is statistically analysed for the Leicester Scientific Dictionary-Core, and we investigate the 'Principal Components of the Meaning' to describe the adequate dimensions of the meaning. The research in this paper lays the foundation for the geometric representation of the meaning of texts.
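
To make the idea of a word's meaning vector concrete, the following is a minimal sketch of how a vector of Relative Information Gain (RIG) over subject categories might be computed. It assumes, as an illustration rather than as the paper's exact procedure, that for each category the RIG is the information gain about the binary attribute "text belongs to the category" given the binary attribute "word occurs in the text", normalised by the entropy of the category attribute. The toy corpus, the category names, and the helper names `entropy` and `rig_vector` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a binary variable with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rig_vector(word, docs, categories):
    """Return one RIG value per category for the given word.

    docs is a list of (set_of_words, set_of_categories) pairs.
    Assumed definition: RIG(c, w) = (H(c) - H(c | w)) / H(c).
    """
    n = len(docs)
    has_word = [word in words for words, _ in docs]
    n_with = sum(has_word)
    vector = []
    for c in categories:
        in_cat = [c in cats for _, cats in docs]
        h_c = entropy(sum(in_cat) / n)
        if h_c == 0.0:
            # Category is present in all texts or in none: nothing to gain.
            vector.append(0.0)
            continue
        # Conditional entropy H(c | w), averaged over word present / absent.
        h_cond = 0.0
        for flag, count in ((True, n_with), (False, n - n_with)):
            if count == 0:
                continue
            hits = sum(1 for hw, ic in zip(has_word, in_cat) if hw == flag and ic)
            h_cond += (count / n) * entropy(hits / count)
        vector.append((h_c - h_cond) / h_c)
    return vector


# Hypothetical toy corpus: each text is a set of words plus its categories.
docs = [
    ({"quantum", "energy"}, {"Physics"}),
    ({"quantum", "cell"}, {"Physics", "Biology"}),
    ({"cell", "protein"}, {"Biology"}),
    ({"protein", "energy"}, {"Biology"}),
]
categories = ["Physics", "Biology"]

quantum_vec = rig_vector("quantum", docs, categories)
energy_vec = rig_vector("energy", docs, categories)
```

In this toy corpus, "quantum" occurs exactly in the Physics texts, so its Physics component is 1, while "energy" is spread evenly across Physics and non-Physics texts, so its Physics component is 0. Stacking such vectors for every word in a dictionary gives a word-by-category matrix, which is the kind of object on which a principal component analysis could then be run.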