We present a novel 2D convex hull peeling algorithm for outlier detection, which repeatedly removes the point on the hull whose removal decreases the hull's area the most. To find $k$ outliers among $n$ points, one simply peels $k$ points. The algorithm is an efficient heuristic alternative to exact methods, which find the $k$ points whose joint removal results in the smallest convex hull. Our algorithm runs in $O(n \log n)$ time using $O(n)$ space for any choice of $k$. This is a significant speedup compared to the fastest exact algorithms, which run in $O(n^2 \log n + (n-k)^3)$ time using $O(n \log n + (n-k)^3)$ space (Eppstein et al.) and in $O(n \log n + \binom{4k}{2k} (3k)^k n)$ time (Atanassov et al.). Existing heuristic peeling approaches are not area-based. Instead, an approach by Harsh et al. repeatedly removes the point furthest from the mean under various distance metrics and runs in $O(n \log n + kn)$ time. Other approaches greedily peel one convex layer at a time, which is efficient when the convex layers are computed with the $O(n \log n)$ time algorithm by Chazelle. However, in many cases layer peeling fails to recover the outliers. For most values of $n$ and $k$, our approach is the fastest, and the first practical, choice for finding outliers based on minimizing the area of the convex hull. Our algorithm also generalizes to other objectives such as perimeter.
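To make the peeling rule concrete, the following is a minimal, deliberately naive sketch of area-based peeling. It is not the $O(n \log n)$ algorithm described above: it recomputes the hull from scratch for every candidate, so it is far slower and serves only as a reference for what "peel the point that decreases the area the most" means. The function names (`convex_hull`, `polygon_area`, `peel_outliers`) are illustrative and not taken from the paper, and ties are broken arbitrarily by hull order.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain (it repeats the start of the other).
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_without(points: List[Point], v: Point) -> float:
    """Hull area after removing one occurrence of v from points."""
    rest = list(points)
    rest.remove(v)
    return polygon_area(convex_hull(rest))


def peel_outliers(points: List[Point], k: int) -> Tuple[List[Point], List[Point]]:
    """Naive greedy peeling: k times, remove the hull vertex whose removal
    leaves the smallest hull area. Returns (removed, remaining)."""
    remaining = list(points)
    removed: List[Point] = []
    for _ in range(k):
        hull = convex_hull(remaining)
        if not hull:
            break
        best = min(hull, key=lambda v: area_without(remaining, v))
        remaining.remove(best)
        removed.append(best)
    return removed, remaining


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5), (10, 10)]
    removed, remaining = peel_outliers(pts, 1)
    print(removed, polygon_area(convex_hull(remaining)))
```

Only hull vertices are considered as candidates, since removing an interior point leaves the hull unchanged. Swapping `polygon_area` for a perimeter function would give the perimeter variant of the same greedy rule.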