Measures of similarity (or dissimilarity) are a key ingredient of many machine learning algorithms. We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces, which leverages the data's internal structure to be invariant to diffeomorphisms. We prove that DID enjoys properties that make it relevant for both theoretical study and practical use. By representing each datum as a function, DID is defined as the solution to an optimization problem in a Reproducing Kernel Hilbert Space and can be expressed in closed form. In practice, it can be efficiently approximated via Nyström sampling. Empirical experiments support the merits of DID.
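The abstract does not spell out how Nyström sampling enters the computation of DID, so the following is only a generic sketch of the Nyström idea applied to a kernel matrix, not the authors' procedure. The Gaussian kernel, the uniform landmark sampling, the `jitter` regularizer, and all function names are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def nystrom_approximation(X, num_landmarks, bandwidth=1.0, jitter=1e-8, seed=0):
    """Low-rank approximation K ~ K_nm (K_mm + jitter * I)^{-1} K_nm^T.

    Landmarks are drawn uniformly without replacement (an assumption);
    `jitter` is a small ridge term added only for numerical stability.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=num_landmarks, replace=False)
    landmarks = X[idx]
    K_nm = gaussian_kernel(X, landmarks, bandwidth)          # shape (n, m)
    K_mm = gaussian_kernel(landmarks, landmarks, bandwidth)  # shape (m, m)
    K_mm_reg = K_mm + jitter * np.eye(num_landmarks)
    return K_nm @ np.linalg.solve(K_mm_reg, K_nm.T)


# Synthetic data, used only to exercise the sketch.
X = np.random.default_rng(1).normal(size=(200, 3))
K_exact = gaussian_kernel(X, X)
K_approx = nystrom_approximation(X, num_landmarks=50)
rel_error = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
print(f"relative Frobenius error: {rel_error:.3e}")
```

With m landmarks among n points, the approximation needs only an n-by-m and an m-by-m kernel block rather than the full n-by-n matrix, which is the usual motivation for this kind of sampling.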