We develop a novel exploratory tool for non-Euclidean object data based on data depth, extending the celebrated Tukey depth for Euclidean data. The proposed metric halfspace depth, applicable to data objects in a general metric space, assigns each data point a depth value that characterizes its centrality with respect to the distribution, and thereby provides an interpretable center-outward ranking. We establish desirable theoretical properties for the metric halfspace depth that generalize the standard depth properties postulated for Euclidean data. The depth median, defined as the deepest point, is shown to be a highly robust location descriptor both in theory and in simulation. We propose an efficient algorithm to approximate the metric halfspace depth and illustrate its ability to adapt to the intrinsic data geometry. The metric halfspace depth was applied to an Alzheimer's disease study, revealing group differences in brain connectivity, modeled as covariance matrices, among subjects in different stages of dementia. Based on phylogenetic trees of seven pathogenic parasites, the proposed metric halfspace depth was also used to construct a meaningful consensus estimate of the evolutionary history and to identify potential outlier trees.
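To make the idea of a depth computed from distances alone more concrete, the following is a minimal Python sketch of an empirical approximation. It rests on assumptions that the abstract does not state: the depth of a point x is taken to be the smallest probability mass of any "metric halfspace" {y : d(y, x1) <= d(y, x2)} that contains x, and the anchor pairs (x1, x2) are restricted to the observed sample points. The function names `metric_halfspace_depth` and `depth_of_sample` are hypothetical, and this brute-force search is not claimed to be the efficient algorithm referred to above.

```python
import numpy as np


def metric_halfspace_depth(x_dists, pairwise):
    """Approximate depth of a query point x from distances only.

    x_dists  : (n,) array, distances d(x, X_i) from x to each sample point.
    pairwise : (n, n) array, distances d(X_k, X_i) among sample points.

    Assumption: anchors (x1, x2) are drawn from the sample itself, and a
    halfspace "contains" x when d(x, x1) <= d(x, x2).
    """
    n = len(x_dists)
    depth = 1.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # identical anchors give the whole space
            if x_dists[i] <= x_dists[j]:
                # Fraction of sample points at least as close to anchor i as to anchor j.
                mass = np.mean(pairwise[:, i] <= pairwise[:, j])
                depth = min(depth, mass)
    return depth


def depth_of_sample(points, dist):
    """Depth of every sample point, given a user-supplied metric `dist`."""
    D = np.array([[dist(p, q) for q in points] for p in points])
    return np.array([metric_halfspace_depth(D[i], D) for i in range(len(points))])


# Toy example on the real line with the absolute-difference metric.
points = [0.0, 1.0, 2.0, 3.0, 4.0]
depths = depth_of_sample(points, lambda a, b: abs(a - b))
median_index = int(np.argmax(depths))  # index of the deepest sample point
```

On this toy sample the deepest point is the middle observation, which matches the usual center-outward ranking on the line. Only a distance function is needed, so the same sketch could in principle be run on covariance matrices or trees once a suitable metric is supplied; the cost is cubic in the sample size, so it is meant for illustration only.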