The magnitude of the Pearson correlation between two scalar random variables can be judged visually from a two-dimensional scatter plot of an independent and identically distributed sample drawn from their joint distribution: the closer the points lie to a slanted straight line, the greater the correlation. To the best of our knowledge, no comparable graphical representation or geometric quantification of tree correlation exists in the literature, although tree-shaped datasets are frequently encountered in various fields, for example academic genealogy trees and embryonic development trees. In this paper, we introduce a geometric statistic that both represents tree correlation intuitively and quantifies its magnitude precisely. We provide the theoretical properties of the geometric statistic. Large-scale simulations based on various data distributions demonstrate that the statistic measures tree correlation precisely. An application to real mathematical genealogy trees also demonstrates its usefulness.
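The scatter-plot intuition for the scalar case can be checked numerically. The following is a minimal sketch, not part of the proposed method: it draws samples from an assumed model y = x + noise (the model, the helper names `pearson` and `simulate`, the sample size, and the noise levels are all illustrative choices) and shows that the sample Pearson correlation shrinks as the points scatter further from the line.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(noise_sd, n=2000, seed=0):
    """Draw an i.i.d. sample from the assumed model y = x + noise and return r."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Larger noise_sd moves the points further from the line y = x.
    ys = [x + rng.gauss(0.0, noise_sd) for x in xs]
    return pearson(xs, ys)


# Illustrative noise levels; under this model the population correlation
# is 1 / sqrt(1 + noise_sd**2).
results = {sd: simulate(sd) for sd in (0.1, 1.0, 3.0)}
for sd, r in results.items():
    print(f"noise sd = {sd}: r = {r:.3f}")
```

Under this assumed model the correlation decreases monotonically as the noise grows, which is the behaviour that the scatter plot makes visible for scalar variables and for which tree-shaped data lack an analogous picture.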