Existing measures and representations for trajectories have two longstanding fundamental shortcomings: they are computationally expensive, and they cannot guarantee the `uniqueness' property of a distance function, i.e., $\mathrm{dist}(X,Y) = 0$ if and only if $X = Y$, where $X$ and $Y$ are two trajectories. To address these shortcomings, this paper proposes a simple yet powerful way to represent trajectories and to measure the similarity between two trajectories using a distributional kernel. It is a principled approach based on kernel mean embedding, which has a strong theoretical underpinning. It has three distinctive features in comparison with existing approaches. (1) A distributional kernel is used for the very first time for trajectory representation and similarity measurement. (2) It does not rely on the point-to-point distances used in most existing trajectory distance measures. (3) It requires no learning, unlike existing learning and deep learning approaches. We show the generality of this new approach in three applications: (a) trajectory anomaly detection, (b) anomalous sub-trajectory detection, and (c) trajectory pattern mining. We identify that the distributional kernel has (i) a unique data-dependent property and the above uniqueness property, which are the key factors that lead to its superior task-specific performance; and (ii) a runtime that is orders of magnitude faster than that of existing distance measures.
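To make the idea of a distributional kernel built on kernel mean embedding concrete, the following is a minimal sketch, not the method of this paper. It assumes each trajectory is treated as an unordered sample of points and uses a Gaussian point kernel as a stand-in; the Gaussian kernel is not data-dependent, so the sketch illustrates only the uniqueness property. The function names (`gaussian_kernel`, `distributional_kernel`, `embedding_distance`) and the bandwidth `sigma` are hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian point kernel between two points given as coordinate tuples.

    Assumption: a stand-in point kernel chosen for illustration only.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def distributional_kernel(X, Y, sigma=1.0):
    """Inner product of the empirical kernel mean embeddings of X and Y,
    i.e. the average point kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, sigma) for x in X for y in Y)
    return total / (len(X) * len(Y))


def embedding_distance(X, Y, sigma=1.0):
    """Distance between the kernel mean embeddings of X and Y.

    Computed as sqrt(K(X,X) - 2 K(X,Y) + K(Y,Y)); clamped at zero to
    guard against tiny negative values from floating-point rounding.
    """
    sq = (distributional_kernel(X, X, sigma)
          - 2.0 * distributional_kernel(X, Y, sigma)
          + distributional_kernel(Y, Y, sigma))
    return math.sqrt(max(sq, 0.0))


# Two hypothetical 2-D trajectories: the same path shifted by one unit.
X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Y = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

print(distributional_kernel(X, Y))  # similarity between the two trajectories
print(embedding_distance(X, X))     # identical trajectories
print(embedding_distance(X, Y))     # distinct trajectories
```

In this sketch the similarity is an average of kernel values over point pairs, so no alignment of the two trajectories and no training step is needed, and the distance is zero when the two point samples coincide. Because the points are treated as an unordered sample, two trajectories that visit the same points in a different order are not distinguished here.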