We consider the problem of finding the matching map between two sets of $d$-dimensional noisy feature vectors. The distinctive feature of our setting is that we do not assume that every vector of the first set has a corresponding vector in the second set. If $n$ and $m$ are the sizes of these two sets, we assume that the matching map to be recovered is defined on a subset of unknown cardinality $k^*\le \min(n,m)$. We show that, in the high-dimensional setting, if the signal-to-noise ratio is larger than $5(d\log(4nm/\alpha))^{1/4}$, then the true matching map can be recovered with probability $1-\alpha$. Interestingly, this threshold does not depend on $k^*$ and is the same as the one obtained in prior work in the case of $k^* = \min(n,m)$. The procedure for which this property is proved is obtained by a data-driven selection among candidate mappings $\{\hat\pi_k:k\in[\min(n,m)]\}$. Each $\hat\pi_k$ minimizes the sum of squared distances between two sets of size $k$. The resulting optimization problem can be formulated as a minimum-cost flow problem, and thus solved efficiently. Finally, we report the results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the algorithms studied in this work.
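To make the definition of a candidate mapping $\hat\pi_k$ concrete, the following is a minimal sketch that computes a size-$k$ matching minimizing the sum of squared distances. It is not the authors' implementation: instead of the minimum-cost flow formulation mentioned above, it pads the $n\times m$ cost matrix with zero-cost dummy rows and columns and calls SciPy's `linear_sum_assignment`, which yields the same minimizer for this objective. The function name `k_matching` and the toy data are hypothetical, and the data-driven selection of $k$ is not shown.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def k_matching(X, Y, k):
    """Return (pairs, cost): k pairs (i, j) minimizing sum ||X[i] - Y[j]||^2.

    Hypothetical helper for illustration; uses a padded assignment problem
    rather than an explicit minimum-cost flow solver.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape[0], Y.shape[0]
    if not 1 <= k <= min(n, m):
        raise ValueError("k must satisfy 1 <= k <= min(n, m)")

    # Squared Euclidean distances between every X[i] and Y[j].
    diff = X[:, None, :] - Y[None, :, :]
    C = np.einsum("ijk,ijk->ij", diff, diff)

    # Pad to a square matrix of side n + m - k:
    #   m - k dummy rows absorb the unmatched columns,
    #   n - k dummy columns absorb the unmatched rows.
    size = n + m - k
    big = C.sum() + 1.0          # exceeds the cost of any real k-matching
    P = np.full((size, size), big)
    P[:n, :m] = C                # real row matched to real column
    P[:n, m:] = 0.0              # real row left unmatched
    P[n:, :m] = 0.0              # real column left unmatched
    # The dummy-dummy block keeps the cost `big`, so it is never used.

    rows, cols = linear_sum_assignment(P)
    pairs = [(int(i), int(j)) for i, j in zip(rows, cols) if i < n and j < m]
    cost = float(sum(C[i, j] for i, j in pairs))
    return pairs, cost


# Toy data (made up): the last two rows of Y have no counterpart in X,
# and X[2] has no counterpart in Y.
X_demo = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
Y_demo = [[0.1, 0.0], [10.0, 0.2], [50.0, 50.0], [100.0, 100.0]]
pairs_demo, cost_demo = k_matching(X_demo, Y_demo, k=2)
```

On this toy input, the size-2 matching pairs `X_demo[0]` with `Y_demo[0]` and `X_demo[1]` with `Y_demo[1]`. Forcing `k=3` would also have to match `X_demo[2]` to a far-away point, which sharply increases the cost; this jump is the kind of signal a data-driven choice of $k$ can exploit.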