We study variants of the mean problem under the $p$-Dynamic Time Warping ($p$-DTW) distance, a popular and robust distance measure for sequential data. In our setting, we are given a set of finite point sequences over an arbitrary metric space, and we want to compute a mean point sequence of a given length that minimizes the sum of the $p$-DTW distances between the input sequences and the mean sequence, each raised to the $q$th power. In general, the problem is $\mathrm{NP}$-hard and is known not to be fixed-parameter tractable in the number of sequences. We show that it is even hard to approximate within any constant factor unless $\mathrm{P} = \mathrm{NP}$. Moreover, we show that if there exists a $\delta>0$ such that DTW mean admits a $(\log n)^{\delta}$-approximation algorithm, then $\mathrm{NP} \subseteq \mathrm{QP}$. On the positive side, we show that restricting the length of the mean sequence significantly reduces the hardness of the problem. We give an exact polynomial-time algorithm for constant-length means. We also explore several approximation algorithms that trade off the approximation factor against the running time. The running time of our approximation algorithms depends only linearly on the number of input sequences. In addition, we use our mean algorithms to obtain clustering algorithms with theoretical guarantees.
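To make the objective concrete, here is a minimal sketch. It assumes the standard definition $\mathrm{dtw}_p(x,y) = \min_{W} \big(\sum_{(i,j)\in W} d(x_i,y_j)^p\big)^{1/p}$, where $W$ ranges over monotone warping paths. The objective for a mean $c$ is then $\sum_{s} \mathrm{dtw}_p(s,c)^q$. The sketch also assumes points are coordinate tuples with Euclidean `math.dist` as the metric, although any metric can be passed in. The helper names `dtw_p`, `mean_cost`, and `brute_force_mean` are hypothetical and chosen for illustration. The brute-force search is not the exact algorithm referred to above. It only finds the best mean whose points are drawn from a user-supplied finite candidate set $C$. Still, it shows why enumerating all $|C|^{\ell}$ candidate means is polynomial when the mean length $\ell$ is a constant.

```python
import itertools
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def dtw_p(x: Sequence[Point], y: Sequence[Point], dist: Metric, p: float = 1.0) -> float:
    """p-DTW distance: minimum over warping paths of (sum of dist^p)^(1/p)."""
    n, m = len(x), len(y)
    inf = math.inf
    # D[i][j]: smallest sum of dist^p over warping paths from (0, 0) to (i, j).
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = dist(x[i], y[j]) ** p
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            # A warping path advances in x, in y, or in both at each step.
            prev = min(
                D[i - 1][j] if i > 0 else inf,
                D[i][j - 1] if j > 0 else inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            D[i][j] = cost + prev
    return D[n - 1][m - 1] ** (1.0 / p)


def mean_cost(sequences: Sequence[Sequence[Point]], center: Sequence[Point],
              dist: Metric, p: float = 1.0, q: float = 1.0) -> float:
    """Mean objective: sum over input sequences of dtw_p(s, center) ** q."""
    return sum(dtw_p(s, center, dist, p) ** q for s in sequences)


def brute_force_mean(sequences: Sequence[Sequence[Point]], candidates: Sequence[Point],
                     length: int, dist: Metric, p: float = 1.0,
                     q: float = 1.0) -> Tuple[Optional[List[Point]], float]:
    """Best mean of the given length whose points all come from `candidates`.

    Hypothetical helper: it enumerates len(candidates) ** length sequences,
    so it runs in polynomial time only when `length` is a constant. It is
    exact only relative to the supplied candidate set.
    """
    best_center: Optional[List[Point]] = None
    best_cost = math.inf
    for center in itertools.product(candidates, repeat=length):
        cost = mean_cost(sequences, center, dist, p, q)
        if cost < best_cost:
            best_center, best_cost = list(center), cost
    return best_center, best_cost
```

For example, with `seqs = [[(0.0,), (1.0,), (2.0,)], [(0.0,), (2.0,)]]`, calling `brute_force_mean(seqs, sorted({pt for s in seqs for pt in s}), 2, math.dist)` searches all length-2 means built from the input points. Using the input points as candidates is a convenient choice for this example. In an arbitrary metric space, the optimal mean need not be restricted to them.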