The Tukey (or halfspace) depth extends nonparametric methods toward multivariate data. The multivariate analogues of the quantiles are the central regions of the Tukey depth, defined as sets of points in the $d$-dimensional space whose Tukey depth exceeds given thresholds $k$. We address the problem of fast and exact computation of those central regions. First, we analyse an efficient Algorithm A from Liu et al. (2019), and prove that it yields exact results in dimension $d=2$, or for a low threshold $k$ in arbitrary dimension. We provide examples where Algorithm A fails to recover the exact Tukey depth region for $d>2$, and propose a modification that is guaranteed to be exact. We express the problem of computing the exact central region in its dual formulation, and use that viewpoint to demonstrate that further substantial improvements to our algorithm are unlikely. An efficient C++ implementation of our exact algorithm is freely available in the R package TukeyRegion.
翻译:Tukey (或半空) 深度将非参数性方法延伸至多变量数据。 量化的多变量类比是 Tukey 深度的中心区域, 定义为 $d$- 维空间的一组点, Tukey 深度超过给定的阈值 $k$。 我们处理这些中心区域的快速和精确计算问题。 首先, 我们从刘等人( 2019年) 分析高效的 Algorithm A, 并证明它在维度方面产生准确的结果 $d=2$, 或者在任意性的低阈值 $k$。 我们举例说明了 Algorithm A 未能以 $>2 来回收准确的 Tukey 深度区域, 并提出了可以保证精确的修改建议。 我们用其二元配方表示计算准确的中央区域的问题, 并用这一观点来证明我们算法不可能得到进一步的大幅度改进。 在 R 包 TukeyRegion 中, 可以自由使用我们精确的算法的 C+ 。