Modern time series analysis requires the ability to handle datasets that are inherently high-dimensional; examples include applications in climatology, where measurements from numerous sensors must be taken into account, or inventory tracking of large shops, where the dimension is defined by the number of tracked items. The standard way to mitigate computational issues arising from the high-dimensionality of the data is by applying some dimension reduction technique that preserves the structural properties of the ambient space. The dissimilarity between two time series is often measured by ``discrete'' notions of distance, e.g. the dynamic time warping, or the discrete Fr\'echet distance, or simply the Euclidean distance. Since all these distance functions are computed directly on the points of a time series, they are sensitive to different sampling rates or gaps. The continuous Fr\'echet distance offers a popular alternative which aims to alleviate this by taking into account all points on the polygonal curve obtained by linearly interpolating between any two consecutive points in a sequence. We study the ability of random projections \`a la Johnson and Lindenstrauss to preserve the continuous Fr\'echet distance of polygonal curves by effectively reducing the dimension. In particular, we show that one can reduce the dimension to $O(\epsilon^{-2} \log N)$, where $N$ is the total number of input points while preserving the continuous Fr\'echet distance between any two determined polygonal curves within a factor of $1\pm \epsilon$. We conclude with applications on clustering.
翻译:现代时间序列分析要求有能力处理本质上是高维的数据集; 示例包括气候学应用,其中必须考虑到来自众多传感器的测量结果,或大型商店的库存跟踪,其尺寸由跟踪的物品数量来界定。 减轻数据高度产生的计算问题的标准方法是应用某种维度减少技术来保持环境空间的结构特性。 两个时间序列之间的差异往往通过“分解”的距离概念来测量,例如动态时间扭曲,或离散的Fr\'echet距离,或仅仅是Euclidean距离。由于所有这些距离函数都是在时间序列点上直接计算出来的,因此对不同的取样率或差距十分敏感。 连续的Fr\'echet距离提供了一种受欢迎的替代方法,目的是通过考虑到通过线性间插在任何序列中的任何两个连续点之间测得的多边曲线上的所有点。 我们研究在一个连续的硬朗和Lindenstra$的距离上随机预测“ la car $ ” 和“ linderensteal” 等值的能力, 从而有效地保持一个连续的Fr\\\\ roqelus 的磁度, 我们的连续的直径, 。