The past years have seen a drastic rise in studies devoted to the investigation of colexification patterns in individual language families in particular and in the languages of the world in general. Computational studies in particular have profited from the fact that colexification as a scientific construct is easy to operationalize, enabling scholars to infer colexification patterns for large collections of cross-linguistic data. Studies devoted to partial colexifications (colexification patterns that involve not entire words but only parts of words), however, have rarely been conducted so far. This is not surprising, since partial colexifications are harder to handle in computational approaches and are prone to various kinds of noise resulting from false positive matches. To address this problem, this study introduces new approaches to the handling of partial colexifications by (1) proposing new models with which partial colexification patterns can be represented, (2) developing new efficient methods and workflows which help to infer various types of partial colexification patterns from multilingual wordlists, and (3) illustrating how inferred patterns of partial colexifications can be computationally analyzed and interactively visualized.
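To make the contrast between full and partial colexification concrete, here is a minimal Python sketch. It is only an illustration and not the method proposed in this study. It assumes a toy wordlist of space-segmented forms and treats two concepts as fully colexified when a language expresses them with an identical form. It treats them as partially colexified when one form is a proper prefix or suffix of the other. The wordlist, the function names `is_affix` and `colexifications`, and the `min_len` threshold are hypothetical. The threshold shows one simple way that very short forms, a typical source of false positive matches, can be filtered out.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy wordlist: (language, concept, segmented form).
# The forms are invented for illustration and not taken from any real language.
WORDLIST = [
    ("A", "ARM", "k a l o"),
    ("A", "HAND", "k a l o"),
    ("A", "FINGER", "k a l o m i"),
    ("A", "NAIL", "s u"),
    ("B", "EYE", "t a n e"),
    ("B", "TEAR", "w u t a n e"),
    ("B", "SUN", "t a n e r i"),
    ("B", "WATER", "w u"),
]


def is_affix(short, long):
    """Return True if `short` is a proper prefix or suffix of `long` (segment-wise)."""
    n = len(short)
    return 0 < n < len(long) and (long[:n] == short or long[-n:] == short)


def colexifications(wordlist, min_len=2):
    """Collect full and (affix-based) partial colexifications per language.

    Full: (language, concept1, concept2) with concepts sorted alphabetically.
    Partial: (language, affix_concept, containing_concept), i.e. the form of
    the first concept is a prefix or suffix of the form of the second.
    Forms shorter than `min_len` segments are not used as affixes (assumed noise filter).
    """
    by_lang = defaultdict(list)
    for lang, concept, form in wordlist:
        by_lang[lang].append((concept, tuple(form.split())))

    full, partial = set(), set()
    for lang, entries in by_lang.items():
        # Compare every pair of entries within the same language.
        for (c1, f1), (c2, f2) in combinations(entries, 2):
            if c1 == c2:
                continue
            if f1 == f2:
                full.add((lang, *sorted((c1, c2))))
            elif len(f1) >= min_len and is_affix(f1, f2):
                partial.add((lang, c1, c2))
            elif len(f2) >= min_len and is_affix(f2, f1):
                partial.add((lang, c2, c1))
    return full, partial


if __name__ == "__main__":
    full, partial = colexifications(WORDLIST)
    print("full:", sorted(full))
    print("partial:", sorted(partial))
```

With the default `min_len=2`, the two-segment form for WATER in language B counts as a prefix of the form for TEAR. Raising `min_len` to 3 drops that match while keeping the longer affix matches. This shows how a length threshold trades recall for fewer spurious partial matches.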