In this paper we identify several serious problems that arise when syntactic data from the SSWL database are used for computational phylogenetic reconstruction. We show that the most naive approach fails to produce reliable linguistic phylogenetic trees. We identify some sources of the observed problems and discuss how they may be corrected, at least in part, by using additional information, such as a prior subdivision into language families and subfamilies, and by making better use of the information about ancient languages. We also describe how phylogenetic algebraic geometry can help estimate to what extent the probability distribution at the leaves of the phylogenetic tree obtained from the SSWL data can be considered reliable, by testing it on phylogenetic trees established by other forms of linguistic analysis. In simple examples, we find that, after restricting to smaller language subfamilies and considering only those SSWL parameters that are fully mapped for the whole subfamily, the SSWL data match reliable phylogenetic trees extremely well, according to the evaluation of phylogenetic invariants. This is a promising sign for the use of SSWL data in linguistic phylogenetics.
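
The restriction step mentioned above (keeping only parameters that are fully mapped for a whole subfamily, then reading off the distribution at the leaves) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the table layout (a dictionary from language to parameter values, with `None` or an absent key for an unmapped parameter), the function names, and the toy languages and parameters are all hypothetical, and the values are made up rather than taken from SSWL.

```python
from collections import Counter


def fully_mapped_parameters(table, languages):
    """Return the sorted names of parameters that have a recorded binary
    value (0 or 1) for every language in `languages`.

    Assumption: `table[lang]` is a dict {parameter_name: 0, 1 or None};
    a missing key or None means the parameter is not mapped.
    """
    candidates = set()
    for lang in languages:
        candidates.update(table[lang])
    return sorted(
        p for p in candidates
        if all(table[lang].get(p) in (0, 1) for lang in languages)
    )


def leaf_distribution(table, languages):
    """Empirical frequency of each binary pattern across the leaves.

    Each fully mapped parameter contributes one pattern, i.e. the tuple
    of its values on the languages, taken in the given order.
    """
    params = fully_mapped_parameters(table, languages)
    if not params:
        return {}
    counts = Counter(
        tuple(table[lang][p] for lang in languages) for p in params
    )
    return {pattern: n / len(params) for pattern, n in counts.items()}


# Hypothetical toy data: three placeholder languages, four placeholder
# parameters. These values are invented for illustration only.
toy = {
    "lang_A": {"p1": 1, "p2": 0, "p3": 1, "p4": None},
    "lang_B": {"p1": 1, "p2": 0, "p3": 0, "p4": 1},
    "lang_C": {"p1": 1, "p2": 1, "p3": 0},  # p4 is absent, so unmapped
}
subfamily = ["lang_A", "lang_B", "lang_C"]

kept = fully_mapped_parameters(toy, subfamily)
dist = leaf_distribution(toy, subfamily)
print(kept)
print(dist)
```

In this sketch `p4` is dropped because it is not mapped for every language, and the resulting frequencies are what one would then substitute into the phylogenetic invariants of a candidate tree; the invariants themselves depend on the chosen evolutionary model and are not reproduced here.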