Classical multivariate statistics measures the outlyingness of a point by its Mahalanobis distance from the mean, which is based on the mean and the covariance matrix of the data. A multivariate depth function is a function which, given a point and a distribution in d-space, measures centrality by a number between 0 and 1, while satisfying certain postulates regarding invariance, monotonicity, convexity and continuity. Numerous notions of multivariate depth have been proposed in the literature, some of which are also robust against extremely outlying data. The departure from the classical Mahalanobis distance does not come without cost: there is a trade-off between invariance, robustness and computational feasibility. In the last few years, efficient exact algorithms as well as approximate ones have been constructed and made available in R packages. Consequently, in practical applications the choice of a depth statistic is no longer restricted to one or two notions by computational limits; often several notions are feasible, among which the researcher has to decide. The article discusses theoretical and practical aspects of this choice, including invariance and uniqueness, robustness and computational feasibility. The complexity and speed of exact algorithms are compared. The accuracy of approximate approaches such as the random Tukey depth is discussed, as is their application to large and high-dimensional data. Extensions to local and functional depths and connections to regression depth are briefly addressed.
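To make the idea of an approximate depth concrete, the following is a minimal sketch of a random Tukey (halfspace) depth in plain Python. It is an illustration under stated assumptions, not the article's algorithm or the interface of any R package: the function name `random_tukey_depth` and the parameters `n_directions` and `seed` are hypothetical. The sketch projects the data onto randomly drawn directions and takes the smallest one-sided fraction of points, which yields an upper bound on the exact Tukey depth.

```python
import math
import random


def random_tukey_depth(point, data, n_directions=200, seed=0):
    """Approximate the Tukey (halfspace) depth of `point` w.r.t. `data`.

    For each random direction u, project the data and the point onto u and
    take the smaller of the two fractions of data points lying on either
    side of the projected point (closed halfspaces). The minimum over all
    sampled directions is an upper bound on the exact Tukey depth.
    Names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    d = len(point)
    n = len(data)
    depth = 1.0
    for _ in range(n_directions):
        # Uniform direction on the unit sphere: normalise a Gaussian vector.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        if norm == 0.0:
            continue  # degenerate draw, skip it
        u = [c / norm for c in u]
        p = sum(a * b for a, b in zip(point, u))
        proj = [sum(a * b for a, b in zip(x, u)) for x in data]
        below = sum(1 for v in proj if v <= p)
        above = sum(1 for v in proj if v >= p)
        depth = min(depth, below / n, above / n)
    return depth


if __name__ == "__main__":
    # A 5 x 5 grid that is symmetric about the origin.
    grid = [(float(i), float(j)) for i in range(-2, 3) for j in range(-2, 3)]
    print(random_tukey_depth((0.0, 0.0), grid))      # central point
    print(random_tukey_depth((2.0, 2.0), grid))      # corner of the grid
    print(random_tukey_depth((100.0, 100.0), grid))  # far outside the data
```

Because only finitely many directions are sampled, the result can overestimate the exact depth; increasing `n_directions` tightens the bound at a cost that grows linearly in the number of directions, the sample size and the dimension.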