Subtrajectory clustering is an important variant of the trajectory clustering problem, in which the start and end points of trajectory patterns within the collected trajectory data are not known in advance. We study this problem in the form of a set cover problem for a given polygonal curve: find the smallest number $k$ of representative curves such that every point on the input curve is contained in a subcurve that has Fr\'echet distance at most a given $\Delta$ to a representative curve. We focus on the case where the representative curves are line segments and approach this NP-hard problem with classical techniques from the area of geometric set cover: we use a variant of the multiplicative weights update method, which was first suggested by Br\"onnimann and Goodrich for set cover instances with small VC-dimension. We obtain a bicriteria approximation algorithm that computes a set of $O(k\log(k))$ line segments that cover a given polygonal curve of $n$ vertices under Fr\'echet distance $O(\Delta)$. We show that the algorithm runs in $\widetilde{O}(k^2 n + k n^3)$ time in expectation and uses $\widetilde{O}(k n + n^3)$ space. For two-dimensional input curves that are $c$-packed, we bound the expected running time by $\widetilde{O}(k^2 c^2 n)$ and the space by $\widetilde{O}(kn + c^2 n)$. In $\mathbb{R}^d$, the dependency on $n$ is instead quadratic. In addition, we present a variant of the algorithm that uses implicit weight updates on the candidate set and thereby achieves near-linear running time in $n$ without any assumptions on the input curve, while keeping the same approximation bounds. This comes at the expense of a small (polylogarithmic) dependency on the relative arclength.
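To illustrate the multiplicative weights update idea mentioned above, the following is a minimal sketch on an abstract, finite set cover instance. It is not the algorithm of this paper: there are no curves, no Fr\'echet distance computations, and no implicit weight updates. The function name `mwu_set_cover`, the sample size, the round limit, and the doubling schedule for the guess of $k$ are all illustrative assumptions.

```python
import math
import random


def mwu_set_cover(universe, candidates, seed=0):
    """Reweighting heuristic in the spirit of Bronnimann and Goodrich.

    universe:   set of elements that must be covered.
    candidates: dict mapping a candidate name to the set of elements it covers.
    Returns a set of candidate names whose union contains the universe.
    All constants below are assumptions chosen for illustration only.
    """
    names = list(candidates)
    everything = set().union(*candidates.values()) if names else set()
    if not set(universe) <= everything:
        raise ValueError("the candidates do not cover the universe")

    rng = random.Random(seed)
    k = 1  # current guess for the size of an optimal cover
    while k < len(names):
        weights = {c: 1.0 for c in names}
        # Assumed sample size of order k log k, mirroring the O(k log k) bound.
        sample_size = math.ceil(4 * k * math.log(k + 2))
        # Assumed round limit before the guess k is considered too small.
        max_rounds = math.ceil(4 * k * math.log2(len(names) / k + 1)) + 1
        for _ in range(max_rounds):
            # Draw candidates with probability proportional to their weight.
            drawn = rng.choices(names, weights=[weights[c] for c in names],
                                k=sample_size)
            chosen = set(drawn)
            covered = set().union(*(candidates[c] for c in chosen))
            missing = [p for p in universe if p not in covered]
            if not missing:
                return chosen
            # Take one uncovered element and the candidates that would cover it.
            p = missing[0]
            hitters = [c for c in names if p in candidates[c]]
            total = sum(weights.values())
            # Double their weights only if they are currently "light".
            if sum(weights[c] for c in hitters) <= total / (2 * k):
                for c in hitters:
                    weights[c] *= 2.0
        k *= 2  # no cover found within the round limit: double the guess
    # Fallback: taking all candidates is a cover by the check at the top.
    return set(names)
```

In this toy setting, each element of `universe` plays the role of a point on the input curve, and each candidate plays the role of a line segment together with the part of the curve it covers within the distance threshold. The returned cover may be larger than optimal, which matches the spirit (but not the analysis) of the $O(k\log(k))$ guarantee.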