A time series of complexity $m$ is a sequence of $m$ real-valued measurements. The discrete Fr\'echet distance $d_{dF}(x,y)$ is a distance measure between two time series $x$ and $y$ of possibly different complexity. Given a set $P$ of $n$ time series, each represented as an $m$-dimensional vector over the reals, the $(k,\ell)$-median problem under discrete Fr\'echet distance asks for a set $C$ of $k$ time series, each of complexity $\ell$, such that $$\sum_{x\in P} \min_{c\in C} d_{dF}(x,c)$$ is minimized. In this paper, we give the first near-linear time $(1+\varepsilon)$-approximation algorithm for this problem when $\ell$ and $\varepsilon$ are constants but $k$ can be as large as $\Omega(n)$. We obtain our result by introducing a new dimension reduction technique for discrete Fr\'echet distance and then adapting an algorithm of Cohen-Addad et al. (J. ACM 2021) to work on the dimension-reduced input. As a byproduct, we also improve the best coreset construction for $(k,\ell)$-median under discrete Fr\'echet distance (Cohen-Addad et al., SODA 2025) and show that its size can be independent of both the number of input time series \emph{and} their complexity.
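To make the objective concrete, the following minimal sketch evaluates the $(k,\ell)$-median cost of a given candidate set $C$ on an input $P$. It assumes the standard dynamic-programming definition of the discrete Fr\'echet distance for one-dimensional time series, which the abstract does not spell out. The function names `discrete_frechet` and `kl_median_cost` are hypothetical and chosen only for illustration. The sketch merely evaluates the objective by brute force; it is not the approximation algorithm, the dimension reduction, or the coreset construction of the paper.

```python
from typing import List, Sequence


def discrete_frechet(x: Sequence[float], y: Sequence[float]) -> float:
    """Discrete Frechet distance between two real-valued time series.

    Uses the textbook O(|x| * |y|) dynamic program: a traversal moves
    forward in x, in y, or in both, and its cost is the largest
    pointwise distance |x[i] - y[j]| it encounters.
    """
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both time series must be non-empty")

    prev: List[float] = []  # DP row for the previous index of x
    for i, xi in enumerate(x):
        cur: List[float] = []  # DP row for the current index of x
        for j, yj in enumerate(y):
            d = abs(xi - yj)
            if i == 0 and j == 0:
                best = d
            elif i == 0:
                best = max(cur[j - 1], d)   # only y can have advanced
            elif j == 0:
                best = max(prev[0], d)      # only x can have advanced
            else:
                # cheapest of the three predecessor cells, then pay d
                best = max(min(prev[j], prev[j - 1], cur[j - 1]), d)
            cur.append(best)
        prev = cur
    return prev[-1]


def kl_median_cost(P: Sequence[Sequence[float]],
                   C: Sequence[Sequence[float]]) -> float:
    """Sum over x in P of the distance from x to its nearest center in C."""
    return sum(min(discrete_frechet(x, c) for c in C) for x in P)


if __name__ == "__main__":
    # Four input series of complexity 3 or 4, two centers of complexity 2.
    P = [[0, 0, 1, 1], [0, 1, 1], [5, 6, 6], [5, 5, 6]]
    C = [[0, 1], [5, 6]]
    print(kl_median_cost(P, C))
```

In this toy instance every input series collapses onto one of the two centers at zero cost, which illustrates why centers of small complexity $\ell$ can summarize inputs of larger complexity $m$.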