We present an algorithm for computing $\epsilon$-coresets for $(k, \ell)$-median clustering of polygonal curves under the Fr\'echet distance. This type of clustering, which has recently gained popularity due to a growing number of applications, is an adaptation of Euclidean $k$-median clustering: given a set of polygonal curves, we want to compute $k$ center curves such that the sum of distances from the given curves to their nearest center curve is minimal. Additionally, we restrict the complexity, i.e., the number of vertices, of each center curve to be at most $\ell$, to suppress overfitting. We obtain this result by applying the improved $\epsilon$-coreset framework by Braverman et al. to a generalized $k$-median problem over an arbitrary metric space. We then combine this result with the recent result by Driemel et al. on the VC dimension of metric balls under the Fr\'echet distance. Finally, we show that $\epsilon$-coresets can be used to improve an existing approximation algorithm for $(1,\ell)$-median clustering under the Fr\'echet distance, taking a further step in the direction of practical $(k,\ell)$-median clustering algorithms.
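
For concreteness, the objective and the coreset guarantee described above can be written out as follows. This is a sketch in notation assumed here and not fixed by the abstract: $T$ is the set of input curves, $C$ is a set of $k$ center curves with at most $\ell$ vertices each, $d_F$ is the Fr\'echet distance, and $S$ is a set of curves with weights $w$, as $\epsilon$-coresets are commonly defined.
\[
  \mathrm{cost}(T, C) = \sum_{\tau \in T} \min_{c \in C} d_F(\tau, c),
  \qquad
  \mathrm{cost}_w(S, C) = \sum_{\sigma \in S} w(\sigma) \min_{c \in C} d_F(\sigma, c).
\]
Under these assumptions, $(S, w)$ is an $\epsilon$-coreset for $T$ if, for every admissible center set $C$,
\[
  (1 - \epsilon)\, \mathrm{cost}(T, C) \;\le\; \mathrm{cost}_w(S, C) \;\le\; (1 + \epsilon)\, \mathrm{cost}(T, C).
\]
A center set that is near-optimal for the smaller weighted set $S$ is then also near-optimal for $T$, which is what would allow a coreset to speed up an existing approximation algorithm.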