We present an algorithm for computing $\epsilon$-coresets for $(k, \ell)$-median clustering of polygonal curves in $\mathbb{R}^d$ under the Fr\'echet distance. This type of clustering is an adaptation of Euclidean $k$-median clustering: we are given a set of $n$ polygonal curves in $\mathbb{R}^d$, each of complexity (number of vertices) at most $m$, and want to compute $k$ median curves such that the sum of distances from the given curves to their closest median curve is minimal. Additionally, we restrict the complexity of each median curve to at most $\ell$ to suppress overfitting, a problem specific to sequential data. Our algorithm has running time linear in $n$, sub-quartic in $m$ and quadratic in $\epsilon^{-1}$. With high probability it returns $\epsilon$-coresets of size quadratic in $\epsilon^{-1}$ and logarithmic in $n$ and $m$. We achieve this result by applying the improved $\epsilon$-coreset framework by Langberg and Feldman to a generalized $k$-median problem over an arbitrary metric space. We then combine this result with the recent result by Driemel et al. on the VC dimension of metric balls under the Fr\'echet distance. Furthermore, our framework yields $\epsilon$-coresets for any generalized $k$-median problem where the range space induced by the open metric balls of the underlying space has bounded VC dimension, which is of independent interest. Finally, we show that our $\epsilon$-coresets can be used to improve the running time of an existing approximation algorithm for $(1,\ell)$-median clustering.
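To make the $(k, \ell)$-median objective concrete, the following minimal Python sketch evaluates the clustering cost of a candidate set of median curves, optionally with per-curve weights as a coreset would carry. It is an illustration under stated assumptions, not the algorithm of the paper: it substitutes the discrete Fr\'echet distance (a simpler, vertex-only variant) for the continuous Fr\'echet distance, and the names `discrete_frechet`, `k_ell_median_cost` and `ell` are hypothetical.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Frechet distance between two vertex sequences.

    Assumption: used here as a stand-in for the continuous Frechet
    distance, which requires a more involved computation.
    """
    n, m = len(p), len(q)
    # dp[i][j] = discrete Frechet distance between p[:i+1] and q[:j+1]
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]


def k_ell_median_cost(curves, medians, ell, weights=None):
    """Weighted sum of distances from each curve to its closest median.

    Each median must have at most `ell` vertices (the complexity bound).
    With weights=None every curve counts once (the unweighted objective).
    """
    if any(len(c) > ell for c in medians):
        raise ValueError("median curve exceeds complexity bound ell")
    if weights is None:
        weights = [1.0] * len(curves)
    return sum(
        w * min(discrete_frechet(curve, c) for c in medians)
        for curve, w in zip(curves, weights)
    )


if __name__ == "__main__":
    curves = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]
    medians = [[(0, 0), (1, 0)]]
    print(k_ell_median_cost(curves, medians, ell=2))
```

In this picture, a weighted subset of the input would serve as an $\epsilon$-coreset if, for every admissible set of medians, its weighted cost stays within a factor of $(1 \pm \epsilon)$ of the cost on the full input.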