This paper is concerned with matching feature vectors in a one-to-one fashion across large collections of datasets. Formulating this task as a multidimensional assignment problem with decomposable costs (MDADC), we develop extremely fast algorithms with time complexity linear in the number $n$ of datasets and space complexity a small fraction of the data size. These remarkable properties hinge on using the squared Euclidean distance as dissimilarity function, which can reduce ${n \choose 2}$ matching problems between pairs of datasets to $n$ problems and enable calculating assignment costs on the fly. To our knowledge, no other method applicable to the MDADC possesses these linear scaling and low-storage properties necessary to large-scale applications. In numerical experiments, the novel algorithms outperform competing methods and show excellent computational and optimization performances. An application of feature matching to a large neuroimaging database is presented. The algorithms of this paper are implemented in the R package matchFeat available at https://github.com/ddegras/matchFeat.
翻译:本文关注在大型数据集集中以一对一的方式匹配特性矢量。 将此任务描述为一个多维任务分配问题, 且具有可分解的成本( MDADC), 我们开发了极快的算法, 其时间复杂线性在数据集数量和空间复杂度方面是数据大小的一小部分。 这些显著的特性取决于将平方的欧几里德距离作为差异功能, 这可以将数据集对对齐的匹配问题减为1美元, 并使得能够计算苍蝇上的分配成本。 据我们所知, MDADC没有其它适用于大型应用程序所需的直线缩缩缩缩缩和低存储属性的方法。 在数字实验中, 新的算法超越了竞争方法, 并展示了出色的计算和优化性能。 演示了与大型神经成像数据库相匹配的特性应用程序。 此文件的算法在https://github. com/ddegras/matchFeat的R 软件包中实施。