This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields, and other forms of structured data. The measure is based on the idea of matching distributions that overlap only over a portion of their support. kdiff is inspired by MPdist, which was previously proposed for such datasets and is constructed using Euclidean metrics; kdiff instead uses non-linear kernel distances. In addition, kdiff accounts for both self- and cross-similarities across the instances and is defined using a lower quantile of the distance distribution. Comparing cross-similarity to self-similarity yields a measure that is more robust to noise and partial occlusion of the relevant signals. kdiff is a more general form of the well-known kernel-based Maximum Mean Discrepancy (MMD) distance estimated over the embeddings. Theoretical results are provided on separability conditions when kdiff is used as a distance measure for clustering and classification problems in which the embedding distributions can be modeled as two-component mixtures. Applications are demonstrated for clustering of synthetic and real-world time series and image data, and the performance of kdiff is compared to competing distance measures for clustering.
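The abstract does not spell out the full definition of kdiff, but the core ingredients it names (subsequence embeddings, non-linear kernel distances, a lower quantile of the distance distribution, and a comparison of cross- to self-similarity) can be sketched as follows. This is a minimal illustrative sketch only, not the paper's exact construction: the sliding-window embedding, the Gaussian kernel, and the parameter names `w`, `q`, and `gamma` are all assumptions made for the example.

```python
import numpy as np

def subsequences(x, w):
    """All length-w sliding windows of a 1-D series (the embedding set)."""
    return np.lib.stride_tricks.sliding_window_view(np.asarray(x, float), w)

def kernel_dist(A, B, gamma=1.0):
    """Gaussian-kernel distance between every pair of rows of A and B:
    d(a, b)^2 = k(a,a) + k(b,b) - 2 k(a,b) = 2 - 2 exp(-gamma * ||a-b||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(np.maximum(2.0 - 2.0 * np.exp(-gamma * sq), 0.0))

def kdiff(x, y, w=8, q=0.1, gamma=1.0):
    """Hypothetical sketch: the lower q-quantile of the cross-distance
    distribution, offset by the same quantile of the two self-distance
    distributions (diagonal zeros excluded from the self terms)."""
    X, Y = subsequences(x, w), subsequences(y, w)
    cross = np.quantile(kernel_dist(X, Y, gamma), q)
    off = lambda D: D[~np.eye(D.shape[0], dtype=bool)]  # drop d(a, a) = 0
    self_x = np.quantile(off(kernel_dist(X, X, gamma)), q)
    self_y = np.quantile(off(kernel_dist(Y, Y, gamma)), q)
    return cross - 0.5 * (self_x + self_y)
```

Using a low quantile rather than a mean focuses the comparison on the best-matching subsequences, which is what allows distributions that overlap only partially to still be matched (the role MPdist plays with Euclidean metrics); subtracting the self-similarity quantiles is what distinguishes this sketch from a plain MMD-style cross-distance summary.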