Data depth, introduced by Tukey (1975), is an important tool in data science, robust statistics, and computational geometry. One chief barrier to its broader practical utility is that many common measures of depth are computationally intensive, requiring on the order of $n^d$ operations to exactly compute the depth of a single point within a data set of $n$ points in $d$-dimensional space. Often, however, we are not directly interested in the absolute depths of the points, but rather in their \textit{relative ordering}. For example, we may want to find the most central point in a data set (a generalized median), or to identify and remove all outliers (points on the fringe of the data set with low depth). Building on this observation, we develop a novel, instance-adaptive algorithm for data depth computation by reducing the problem of exactly computing $n$ depths to an $n$-armed stochastic multi-armed bandit problem, which we can solve efficiently. We focus our exposition on simplicial depth, developed by \citet{liu1990notion}, which has emerged as a promising notion of depth due to its interpretability and asymptotic properties. We provide general instance-dependent theoretical guarantees for our proposed algorithms, which readily extend to many other common measures of data depth, including majority depth, Oja depth, and likelihood depth. When specialized to the case where the gaps in the data follow a power law distribution with parameter $\alpha<2$, we show that the complexity of identifying the deepest point in the data set (the simplicial median) can be reduced from $O(n^d)$ to $\tilde{O}(n^{d-(d-1)\alpha/2})$, where $\tilde{O}$ suppresses logarithmic factors. We corroborate our theoretical results with numerical experiments on synthetic data, demonstrating the practical utility of our proposed methods.
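To make the reduction concrete, the following is a minimal illustrative sketch, not the algorithm analyzed in the paper. It assumes $d=2$, treats each data point as an arm, and defines one pull as checking whether a uniformly random triangle of data points contains that point, so that the arm's mean is its simplicial depth. It then runs a generic successive-elimination procedure with Hoeffding confidence intervals. All function names and the parameters `delta`, `batch`, and `max_rounds` are hypothetical choices made for illustration. Triangles are taken to be closed and may include the query point as a vertex; this adds the same offset to every depth and so leaves the ordering unchanged.

```python
import math
import random


def in_triangle(p, a, b, c):
    """Return True if 2D point p lies in the closed triangle abc."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside (or on the boundary) iff the signs do not disagree.
    return not (has_neg and has_pos)


def pull(points, i, rng):
    """One arm pull: does a uniformly random data triangle contain points[i]?"""
    a, b, c = rng.sample(points, 3)
    return 1.0 if in_triangle(points[i], a, b, c) else 0.0


def approx_simplicial_median(points, delta=0.05, batch=50, max_rounds=200, seed=0):
    """Successive elimination over the n arms (assumed sketch, 2D only).

    Returns the index of the point with the highest estimated simplicial depth.
    """
    rng = random.Random(seed)
    n = len(points)
    active = list(range(n))
    sums = [0.0] * n

    for r in range(1, max_rounds + 1):
        # Pull every surviving arm `batch` more times.
        for i in active:
            sums[i] += sum(pull(points, i, rng) for _ in range(batch))
        t = r * batch  # pulls so far for each surviving arm

        # Hoeffding radius with a crude union bound over arms and rounds.
        rad = math.sqrt(math.log(4 * n * r * r / delta) / (2 * t))
        best = max(sums[i] / t for i in active)

        # Keep only arms whose upper bound reaches the leader's lower bound.
        active = [i for i in active if sums[i] / t + rad >= best - rad]
        if len(active) == 1:
            break

    # All surviving arms have the same pull count, so compare raw sums.
    return max(active, key=lambda i: sums[i])
```

Under these assumptions, points that are clearly shallow are discarded after a few batches, and most samples are spent on the points whose depths are close to the maximum. That is the instance-adaptive behavior the abstract describes, although the paper's guarantees apply to its own algorithms, not to this sketch.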