We consider the problem of finding the matching map between two sets of $d$-dimensional vectors from noisy observations, where the second set contains outliers. The matching map is then an injection, which can be consistently estimated only if the vectors of the second set are well separated. The main result shows that, in the high-dimensional setting, a detection region of the unknown injection can be characterized by the sets of vectors for which the inlier-inlier distance is of order at least $d^{1/4}$ and the inlier-outlier distance is of order at least $d^{1/2}$. These rates are achieved by the estimated matching that minimizes the sum of logarithms of distances between matched pairs of points. We also prove lower bounds establishing the optimality of these rates. Finally, we report results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the estimators studied in this work.
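To make the estimator concrete, the following is a minimal sketch of a matching that minimizes the sum of logarithms of distances between matched pairs. It is an illustration under stated assumptions, not the implementation used in this work: the names `estimate_matching`, `X`, and `X_sharp` are hypothetical, the toy data are made up, and the exhaustive search over injections is a stand-in that is feasible only for very small sets (a linear assignment solver would be the natural choice at scale). The sketch also assumes that no observed pair coincides exactly, so every logarithm is finite.

```python
import itertools
import math


def estimate_matching(X, X_sharp):
    """Return the injection pi, as a tuple with pi[i] the index in X_sharp
    matched to X[i], that minimizes sum_i log ||X[i] - X_sharp[pi[i]]||.

    Exhaustive search over all injections: for illustration only.
    """
    n, m = len(X), len(X_sharp)
    if n > m:
        raise ValueError("the second set must be at least as large as the first")

    # log_dist[i][j] = log of the Euclidean distance between X[i] and X_sharp[j].
    # Assumes all distances are strictly positive (math.log(0) would raise).
    log_dist = [[math.log(math.dist(x, y)) for y in X_sharp] for x in X]

    best_pi, best_cost = None, math.inf
    # Each length-n permutation of range(m) is one injection {0..n-1} -> {0..m-1}.
    for pi in itertools.permutations(range(m), n):
        cost = sum(log_dist[i][pi[i]] for i in range(n))
        if cost < best_cost:
            best_pi, best_cost = pi, cost
    return best_pi


# Hypothetical toy data: three well-separated vectors in dimension d = 3.
X = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]

# Second set: slightly perturbed, shuffled copies of X plus two outliers.
X_sharp = [
    (0.1, 10.0, 0.0),      # perturbed copy of X[2]
    (50.0, 50.0, 50.0),    # outlier
    (0.0, 0.1, 0.0),       # perturbed copy of X[0]
    (10.0, 0.0, 0.1),      # perturbed copy of X[1]
    (-40.0, -40.0, 40.0),  # outlier
]

pi_hat = estimate_matching(X, X_sharp)
print(pi_hat)  # (2, 3, 0)
```

In this toy example the inliers are far apart relative to the perturbation and the outliers are far from every inlier, so the minimizer recovers the true injection and leaves both outliers unmatched.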