The chapter opens with an introduction to basic notions of quantitative linguistics, such as rank--frequency dependence, Zipf's law, and frequency spectra. Similarities between the distribution of words in texts and level occupation in quantum ensembles hint at a superficial analogy with statistical physics. This analogy makes it possible to define various parameters for texts, including "temperature", "chemical potential", entropy, and some others. Such parameters provide a set of variables for classifying texts, which serve as an example of complex systems. Moreover, texts are perhaps the easiest complex systems to collect and analyze. Similar approaches can be developed to study, for instance, genomes, owing to well-known linguistic analogies. We consider a couple of approaches to define nucleotide sequences in mitochondrial DNAs and viral RNAs and demonstrate their possible application as an auxiliary tool for comparative analysis of genomes. Finally, we discuss entropy as one of the parameters that can be easily computed from rank--frequency dependences. Although entropy is a discriminating parameter in some problems of classification of complex systems, it can be given a proper interpretation only in a limited class of problems. Its overall role and significance remain an open issue.
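As an illustration of how entropy can be obtained from a rank--frequency dependence, the following minimal sketch builds a rank--frequency list from a tokenized text and evaluates an entropy from it. The abstract does not state which entropy definition the chapter uses, so the Shannon form $S = -\sum_r p_r \ln p_r$ with relative frequencies $p_r = f_r / N$ is an assumption here, as are the function names `rank_frequency` and `shannon_entropy` and the toy sentence.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return absolute frequencies sorted in descending order (rank 1 first)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def shannon_entropy(frequencies):
    """Shannon entropy (natural log) of the relative frequencies.

    Assumption: this is one common definition; the chapter may use another.
    """
    total = sum(frequencies)
    return -sum((f / total) * math.log(f / total) for f in frequencies if f > 0)


# Toy example (hypothetical input, whitespace tokenization only).
tokens = "the cat and the dog and the bird".split()
freqs = rank_frequency(tokens)   # frequencies by rank
S = shannon_entropy(freqs)
print(freqs, round(S, 3))
```

The same two functions apply unchanged to any sequence of discrete units, for example fixed-length fragments of a nucleotide sequence, once a rule for splitting the sequence into "words" has been chosen.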