In this work, we propose an efficient two-stage algorithm solving a joint problem of correlation detection and permutation recovery between two Gaussian databases. Correlation detection is an hypothesis testing problem; under the null hypothesis, the databases are independent, and under the alternate hypothesis, they are correlated, under an unknown row permutation. We develop relatively tight bounds on the type-I and type-II error probabilities, and show that the analyzed detector performs better than a recently proposed detector, at least for some specific parameter choices. Since the proposed detector relies on a statistic, which is a sum of dependent indicator random variables, then in order to bound the type-I probability of error, we develop a novel graph-theoretic technique for bounding the $k$-th order moments of such statistics. When the databases are accepted as correlated, the algorithm also outputs an estimation for the underlying row permutation. By comparing to known converse results for this problem, we prove that the alignment error probability converges to zero under the asymptotically lowest possible correlation coefficient.
翻译:在这项工作中,我们提出一个高效的两阶段算法,解决两个Gaussian数据库之间相互关联的探测和变异恢复的共同问题。关联检测是一个假设测试问题;在无效假设下,数据库是独立的,在另一个假设下,它们是相互关联的,在未知的行内变异。我们在I型和II型误差概率上开发了相对紧凑的界限,并显示分析的探测器比最近提议的探测器表现得更好,至少在某些特定的参数选择方面是如此。由于拟议的探测器依赖于统计数据,这是依赖指标随机变量的总和,然后为了约束类型I的误差概率,我们开发了一种新的图形理论技术,以约束此类统计数据的美元顺序。当数据库被接受为关联时,算法还输出了对基本行的偏差的估计值。通过比较已知的反差结果,我们证明调整误差概率在尽可能最低的关联系数下会达到零。